* [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU
@ 2025-06-06 6:28 Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
` (11 more replies)
0 siblings, 12 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Jeff Hugo, Krzysztof Kozlowski
This series adds a new driver for the NPU that Rockchip includes in its
newer SoCs, developed by them on the NVDLA base.
In its current form, it supports the specific NPU in the RK3588 SoC.
The userspace driver is part of Mesa and an initial draft can be found at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
Changes in v7:
- Actually enable process isolation by allocating its own IOMMU domain
to each DRM client.
- Link to v6: https://lore.kernel.org/r/20250604-6-10-rocket-v6-0-237ac75ddb5e@tomeuvizoso.net
Changes in v6:
- Make all cores depend on pclk and npu clocks
- Fix BO sync direction logic
- Misc. cleanups
- Link to v5: https://lore.kernel.org/r/20250520-6-10-rocket-v5-0-18c9ca0fcb3c@tomeuvizoso.net
Changes in v5:
- Use bulk clk API
- Rename bindings file
- Syntax improvement to bindings
- Link to v4: https://lore.kernel.org/r/20250519-6-10-rocket-v4-0-d6dff6b4c0ae@tomeuvizoso.net
Changes in v4:
- Several fixes to DT bindings.
- Link to v3: https://lore.kernel.org/r/20250516-6-10-rocket-v3-0-7051ac9225db@tomeuvizoso.net
Changes in v3:
- Reference in the device tree only the register blocks that are
actually used.
- Several style and robustness fixes suggested in the mailing list.
- Added patches from Nicolas Frattaroli that add support to the NPU for
the Rock 5B board.
- Link to v2: https://lore.kernel.org/r/20250225-6-10-rocket-v2-0-d4dbcfafc141@tomeuvizoso.net
Changes in v2:
- Drop patch adding the rk3588 compatible to rockchip-iommu (Sebastian Reichel)
- Drop patch adding support for multiple power domains to rockchip-iommu (Sebastian Reichel)
- Link to v1: https://lore.kernel.org/r/20240612-6-10-rocket-v1-0-060e48eea250@tomeuvizoso.net
---
Nicolas Frattaroli (2):
arm64: dts: rockchip: add pd_npu label for RK3588 power domains
arm64: dts: rockchip: enable NPU on ROCK 5B
Tomeu Vizoso (8):
accel/rocket: Add registers header
accel/rocket: Add a new driver for Rockchip's NPU
accel/rocket: Add IOCTL for BO creation
accel/rocket: Add job submission IOCTL
accel/rocket: Add IOCTLs for synchronizing memory accesses
dt-bindings: npu: rockchip,rknn: Add bindings
arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base
arm64: dts: rockchip: Enable the NPU on quartzpro64
Documentation/accel/index.rst | 1 +
Documentation/accel/rocket/index.rst | 19 +
.../bindings/npu/rockchip,rk3588-rknn-core.yaml | 118 +
MAINTAINERS | 10 +
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 89 +-
.../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 30 +
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/rocket/Kconfig | 25 +
drivers/accel/rocket/Makefile | 10 +
drivers/accel/rocket/rocket_core.c | 80 +
drivers/accel/rocket/rocket_core.h | 59 +
drivers/accel/rocket/rocket_device.c | 31 +
drivers/accel/rocket/rocket_device.h | 30 +
drivers/accel/rocket/rocket_drv.c | 301 ++
drivers/accel/rocket/rocket_drv.h | 19 +
drivers/accel/rocket/rocket_gem.c | 197 +
drivers/accel/rocket/rocket_gem.h | 32 +
drivers/accel/rocket/rocket_job.c | 700 ++++
drivers/accel/rocket/rocket_job.h | 52 +
drivers/accel/rocket/rocket_registers.h | 4425 ++++++++++++++++++++
include/uapi/drm/rocket_accel.h | 145 +
23 files changed, 6430 insertions(+), 1 deletion(-)
---
base-commit: a6d708809579ed3d902465785666085ff07a1d7c
change-id: 20240612-6-10-rocket-9316defc14c7
Best regards,
--
Tomeu Vizoso <tomeu@tomeuvizoso.net>
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-19 8:59 ` Robert Foss
2025-07-11 17:38 ` Andrew Davis
2025-06-06 6:28 ` [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
` (10 subsequent siblings)
11 siblings, 2 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This initial version supports the NPU as shipped in the RK3588 SoC and
described in the first part of its TRM, in Chapter 36.
This NPU contains 3 independent cores that the driver can submit jobs
to.
This commit adds just hardware initialization and power management.
v2:
- Split cores and IOMMUs as independent devices (Sebastian Reichel)
- Add some documentation (Jeffrey Hugo)
- Be more explicit in the Kconfig documentation (Jeffrey Hugo)
- Remove resets, as these haven't been found useful so far (Zenghui Yu)
- Repack structs (Jeffrey Hugo)
- Use DEFINE_DRM_ACCEL_FOPS (Jeffrey Hugo)
- Use devm_drm_dev_alloc (Jeffrey Hugo)
- Use probe log helper (Jeffrey Hugo)
- Introduce UABI header in a later patch (Jeffrey Hugo)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
- Move registers header to its own commit (Thomas Zimmermann)
- Misc. cleanups (Thomas Zimmermann and Jeff Hugo)
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
- PM improvements (Nicolas Frattaroli)
v4:
- Use bulk clk API (Krzysztof Kozlowski)
v6:
- Remove mention to NVDLA, as the hardware is only incidentally related
(Kever Yang)
- Use calloc instead of GFP_ZERO (Jeff Hugo)
- Explicitly include linux/container_of.h (Jeff Hugo)
- pclk and npu clocks are now needed by all cores (Rob Herring)
v7:
- Assign its own IOMMU domain to each client, for isolation (Daniel
Stone and Robin Murphy)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
Documentation/accel/index.rst | 1 +
Documentation/accel/rocket/index.rst | 19 +++
MAINTAINERS | 10 ++
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/rocket/Kconfig | 25 ++++
drivers/accel/rocket/Makefile | 8 +
drivers/accel/rocket/rocket_core.c | 70 +++++++++
drivers/accel/rocket/rocket_core.h | 45 ++++++
drivers/accel/rocket/rocket_device.c | 25 ++++
drivers/accel/rocket/rocket_device.h | 26 ++++
drivers/accel/rocket/rocket_drv.c | 279 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_drv.h | 15 ++
13 files changed, 525 insertions(+)
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index bc85f26533d88891dde482f91e26c99991b22869..d8fa332d60a890dbb617454d2a26d9b6f9b196aa 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -10,6 +10,7 @@ Compute Accelerators
introduction
amdxdna/index
qaic/index
+ rocket/index
.. only:: subproject and html
diff --git a/Documentation/accel/rocket/index.rst b/Documentation/accel/rocket/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..300eb3aeab1d8c6514c65af4d216b2d5a1669131
--- /dev/null
+++ b/Documentation/accel/rocket/index.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+ accel/rocket Rockchip NPU driver
+=====================================
+
+The accel/rocket driver supports the Neural Processing Units (NPUs) inside some
+Rockchip SoCs such as the RK3588. Rockchip calls it RKNN and sometimes RKNPU.
+
+The hardware is described in chapter 36 in the RK3588 TRM.
+
+This driver just powers the hardware on and off, allocates and maps buffers to
+the device and submits jobs to the frontend unit. Everything else is done in
+userspace, as a Gallium driver (also called rocket) that is part of the Mesa3D
+project.
+
+Hardware currently supported:
+
+* RK3588
\ No newline at end of file
diff --git a/MAINTAINERS b/MAINTAINERS
index 96b82704950184bd71623ff41fc4df31e4c7fe87..2d8833bf1f2db06ca624d703f19066adab2f9fde 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7263,6 +7263,16 @@ T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
F: drivers/accel/ivpu/
F: include/uapi/drm/ivpu_accel.h
+DRM ACCEL DRIVER FOR ROCKCHIP NPU
+M: Tomeu Vizoso <tomeu@tomeuvizoso.net>
+L: dri-devel@lists.freedesktop.org
+S: Supported
+T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
+F: Documentation/accel/rocket/
+F: Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
+F: drivers/accel/rocket/
+F: include/uapi/drm/rocket_accel.h
+
DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
M: Oded Gabbay <ogabbay@kernel.org>
L: dri-devel@lists.freedesktop.org
diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
index 5b9490367a39fd12d35a8d9021768aa186c09308..bb01cebc42bf16ebf02e938040f339ff94869e33 100644
--- a/drivers/accel/Kconfig
+++ b/drivers/accel/Kconfig
@@ -28,5 +28,6 @@ source "drivers/accel/amdxdna/Kconfig"
source "drivers/accel/habanalabs/Kconfig"
source "drivers/accel/ivpu/Kconfig"
source "drivers/accel/qaic/Kconfig"
+source "drivers/accel/rocket/Kconfig"
endif
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
index a301fb6089d4c515430175c5e2ba9190f6dc9158..ffc3fa58866616d933184a7659573cd4d4780a8d 100644
--- a/drivers/accel/Makefile
+++ b/drivers/accel/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/
obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
+obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/
\ No newline at end of file
diff --git a/drivers/accel/rocket/Kconfig b/drivers/accel/rocket/Kconfig
new file mode 100644
index 0000000000000000000000000000000000000000..9a59c6c61bf4d6460d8008b16331f001c97de67d
--- /dev/null
+++ b/drivers/accel/rocket/Kconfig
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config DRM_ACCEL_ROCKET
+ tristate "Rocket (support for Rockchip NPUs)"
+ depends on DRM
+ depends on ARM64 || COMPILE_TEST
+ depends on MMU
+ select DRM_SCHED
+ select IOMMU_SUPPORT
+ select IOMMU_IO_PGTABLE_LPAE
+ select DRM_GEM_SHMEM_HELPER
+ help
+ Choose this option if you have a Rockchip SoC that contains a
+ compatible Neural Processing Unit (NPU), such as the RK3588. Called by
+ Rockchip either RKNN or RKNPU, it accelerates inference of neural
+ networks.
+
+ The interface exposed to userspace is described in
+ include/uapi/drm/rocket_accel.h and is used by the Rocket userspace
+ driver in Mesa3D.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called rocket.
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96
--- /dev/null
+++ b/drivers/accel/rocket/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
+
+rocket-y := \
+ rocket_core.o \
+ rocket_device.o \
+ rocket_drv.o
diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
new file mode 100644
index 0000000000000000000000000000000000000000..3a6f25f2b4103075102739588bcdad96510e2a4e
--- /dev/null
+++ b/drivers/accel/rocket/rocket_core.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <linux/clk.h>
+#include <linux/dev_printk.h>
+#include <linux/err.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_core.h"
+
+int rocket_core_init(struct rocket_core *core)
+{
+ struct device *dev = core->dev;
+ struct platform_device *pdev = to_platform_device(dev);
+ u32 version;
+ int err = 0;
+
+ err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
+ if (err)
+ return dev_err_probe(dev, err, "failed to get clocks for core %d\n", core->index);
+
+ core->pc_iomem = devm_platform_ioremap_resource_byname(pdev, "pc");
+ if (IS_ERR(core->pc_iomem)) {
+ dev_err(dev, "couldn't find PC registers %ld\n", PTR_ERR(core->pc_iomem));
+ return PTR_ERR(core->pc_iomem);
+ }
+
+ core->cna_iomem = devm_platform_ioremap_resource_byname(pdev, "cna");
+ if (IS_ERR(core->cna_iomem)) {
+ dev_err(dev, "couldn't find CNA registers %ld\n", PTR_ERR(core->cna_iomem));
+ return PTR_ERR(core->cna_iomem);
+ }
+
+ core->core_iomem = devm_platform_ioremap_resource_byname(pdev, "core");
+ if (IS_ERR(core->core_iomem)) {
+ dev_err(dev, "couldn't find CORE registers %ld\n", PTR_ERR(core->core_iomem));
+ return PTR_ERR(core->core_iomem);
+ }
+
+ pm_runtime_use_autosuspend(dev);
+
+ /*
+ * As this NPU will be most often used as part of a media pipeline that
+ * ends presenting in a display, choose 50 ms (~3 frames at 60Hz) as an
+ * autosuspend delay as that will keep the device powered up while the
+ * pipeline is running.
+ */
+ pm_runtime_set_autosuspend_delay(dev, 50);
+
+ pm_runtime_enable(dev);
+
+ err = pm_runtime_get_sync(dev);
+
+ version = rocket_pc_readl(core, VERSION);
+ version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
+
+ pm_runtime_mark_last_busy(dev);
+ pm_runtime_put_autosuspend(dev);
+
+ dev_info(dev, "Rockchip NPU core %d version: %d\n", core->index, version);
+
+ return 0;
+}
+
+void rocket_core_fini(struct rocket_core *core)
+{
+ pm_runtime_dont_use_autosuspend(core->dev);
+ pm_runtime_disable(core->dev);
+}
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
new file mode 100644
index 0000000000000000000000000000000000000000..1b1beb9798f03ec2ca325496a4d894674d0b798d
--- /dev/null
+++ b/drivers/accel/rocket/rocket_core.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_CORE_H__
+#define __ROCKET_CORE_H__
+
+#include <drm/gpu_scheduler.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+#include <linux/mutex_types.h>
+
+#include "rocket_registers.h"
+
+#define rocket_pc_readl(core, reg) \
+ readl((core)->pc_iomem + (REG_PC_##reg))
+#define rocket_pc_writel(core, reg, value) \
+ writel(value, (core)->pc_iomem + (REG_PC_##reg))
+
+#define rocket_cna_readl(core, reg) \
+ readl((core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
+#define rocket_cna_writel(core, reg, value) \
+ writel(value, (core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
+
+#define rocket_core_readl(core, reg) \
+ readl((core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
+#define rocket_core_writel(core, reg, value) \
+ writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
+
+struct rocket_core {
+ struct device *dev;
+ struct rocket_device *rdev;
+ struct device_link *link;
+ unsigned int index;
+
+ int irq;
+ void __iomem *pc_iomem;
+ void __iomem *cna_iomem;
+ void __iomem *core_iomem;
+ struct clk_bulk_data clks[4];
+};
+
+int rocket_core_init(struct rocket_core *core);
+void rocket_core_fini(struct rocket_core *core);
+
+#endif
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
new file mode 100644
index 0000000000000000000000000000000000000000..a05c103e117e3eaa6439884b7acb6e3483296edb
--- /dev/null
+++ b/drivers/accel/rocket/rocket_device.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <linux/array_size.h>
+#include <linux/clk.h>
+#include <linux/dev_printk.h>
+
+#include "rocket_device.h"
+
+int rocket_device_init(struct rocket_device *rdev)
+{
+ int err;
+
+ /* Initialize core 0 (top) */
+ err = rocket_core_init(&rdev->cores[0]);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+void rocket_device_fini(struct rocket_device *rdev)
+{
+ rocket_core_fini(&rdev->cores[0]);
+}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
new file mode 100644
index 0000000000000000000000000000000000000000..b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d
--- /dev/null
+++ b/drivers/accel/rocket/rocket_device.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_DEVICE_H__
+#define __ROCKET_DEVICE_H__
+
+#include <drm/drm_device.h>
+#include <linux/clk.h>
+#include <linux/container_of.h>
+
+#include "rocket_core.h"
+
+struct rocket_device {
+ struct drm_device ddev;
+
+ struct rocket_core *cores;
+ unsigned int num_cores;
+};
+
+int rocket_device_init(struct rocket_device *rdev);
+void rocket_device_fini(struct rocket_device *rdev);
+
+#define to_rocket_device(drm_dev) \
+ ((struct rocket_device *)container_of(drm_dev, struct rocket_device, ddev))
+
+#endif
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
new file mode 100644
index 0000000000000000000000000000000000000000..b38a5c6264cb4e74d5e381adaeba1426e576fa56
--- /dev/null
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_accel.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_ioctl.h>
+#include <drm/drm_of.h>
+#include <linux/array_size.h>
+#include <linux/clk.h>
+#include <linux/component.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_drv.h"
+
+static int
+rocket_open(struct drm_device *dev, struct drm_file *file)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct rocket_file_priv *rocket_priv;
+
+ rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
+ if (!rocket_priv)
+ return -ENOMEM;
+
+ rocket_priv->rdev = rdev;
+ rocket_priv->domain = iommu_paging_domain_alloc(dev->dev);
+ file->driver_priv = rocket_priv;
+
+ return 0;
+}
+
+static void
+rocket_postclose(struct drm_device *dev, struct drm_file *file)
+{
+ struct rocket_file_priv *rocket_priv = file->driver_priv;
+
+ iommu_domain_free(rocket_priv->domain);
+ kfree(rocket_priv);
+}
+
+static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
+#define ROCKET_IOCTL(n, func) \
+ DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
+};
+
+DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
+
+/*
+ * Rocket driver version:
+ * - 1.0 - initial interface
+ */
+static const struct drm_driver rocket_drm_driver = {
+ .driver_features = DRIVER_COMPUTE_ACCEL,
+ .open = rocket_open,
+ .postclose = rocket_postclose,
+ .ioctls = rocket_drm_driver_ioctls,
+ .num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
+ .fops = &rocket_accel_driver_fops,
+ .name = "rocket",
+ .desc = "rocket DRM",
+};
+
+static int rocket_drm_bind(struct device *dev)
+{
+ struct device_node *core_node;
+ struct rocket_device *rdev;
+ struct drm_device *ddev;
+ unsigned int num_cores = 1;
+ int err;
+
+ rdev = devm_drm_dev_alloc(dev, &rocket_drm_driver, struct rocket_device, ddev);
+ if (IS_ERR(rdev))
+ return PTR_ERR(rdev);
+
+ ddev = &rdev->ddev;
+ dev_set_drvdata(dev, rdev);
+
+ for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
+ if (of_device_is_available(core_node))
+ num_cores++;
+
+ rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
+ if (IS_ERR(rdev->cores))
+ return PTR_ERR(rdev->cores);
+
+ /* Add core 0, any other cores will be added later when they are bound */
+ rdev->cores[0].rdev = rdev;
+ rdev->cores[0].dev = dev;
+ rdev->cores[0].index = 0;
+ rdev->num_cores = 1;
+
+ err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+ if (err)
+ return err;
+
+ err = rocket_device_init(rdev);
+ if (err) {
+ dev_err_probe(dev, err, "Fatal error during NPU init\n");
+ goto err_device_fini;
+ }
+
+ err = component_bind_all(dev, rdev);
+ if (err)
+ goto err_device_fini;
+
+ err = drm_dev_register(ddev, 0);
+ if (err < 0)
+ goto err_unbind;
+
+ return 0;
+
+err_unbind:
+ component_unbind_all(dev, rdev);
+err_device_fini:
+ rocket_device_fini(rdev);
+ return err;
+}
+
+static void rocket_drm_unbind(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ struct drm_device *ddev = &rdev->ddev;
+
+ drm_dev_unregister(ddev);
+
+ component_unbind_all(dev, rdev);
+
+ rocket_device_fini(rdev);
+}
+
+const struct component_master_ops rocket_drm_ops = {
+ .bind = rocket_drm_bind,
+ .unbind = rocket_drm_unbind,
+};
+
+static int rocket_core_bind(struct device *dev, struct device *master, void *data)
+{
+ struct rocket_device *rdev = data;
+ unsigned int core = rdev->num_cores;
+ int err;
+
+ dev_set_drvdata(dev, rdev);
+
+ rdev->cores[core].rdev = rdev;
+ rdev->cores[core].dev = dev;
+ rdev->cores[core].index = core;
+ rdev->cores[core].link = device_link_add(dev, rdev->cores[0].dev,
+ DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
+
+ rdev->num_cores++;
+
+ err = rocket_core_init(&rdev->cores[core]);
+ if (err) {
+ rocket_device_fini(rdev);
+ return err;
+ }
+
+ return 0;
+}
+
+static void rocket_core_unbind(struct device *dev, struct device *master, void *data)
+{
+ struct rocket_device *rdev = data;
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ if (rdev->cores[core].dev == dev) {
+ rocket_core_fini(&rdev->cores[core]);
+ device_link_del(rdev->cores[core].link);
+ break;
+ }
+ }
+}
+
+const struct component_ops rocket_core_ops = {
+ .bind = rocket_core_bind,
+ .unbind = rocket_core_unbind,
+};
+
+static int rocket_probe(struct platform_device *pdev)
+{
+ struct component_match *match = NULL;
+ struct device_node *core_node;
+
+ if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
+ return component_add(&pdev->dev, &rocket_core_ops);
+
+ for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core") {
+ if (!of_device_is_available(core_node))
+ continue;
+
+ drm_of_component_match_add(&pdev->dev, &match,
+ component_compare_of, core_node);
+ }
+
+ return component_master_add_with_match(&pdev->dev, &rocket_drm_ops, match);
+}
+
+static void rocket_remove(struct platform_device *pdev)
+{
+ if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core-top"))
+ component_master_del(&pdev->dev, &rocket_drm_ops);
+ else if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
+ component_del(&pdev->dev, &rocket_core_ops);
+}
+
+static const struct of_device_id dt_match[] = {
+ { .compatible = "rockchip,rk3588-rknn-core-top" },
+ { .compatible = "rockchip,rk3588-rknn-core" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, dt_match);
+
+static int find_core_for_dev(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+
+ for (unsigned int core = 0; core < rdev->num_cores; core++) {
+ if (dev == rdev->cores[core].dev)
+ return core;
+ }
+
+ return -1;
+}
+
+static int rocket_device_runtime_resume(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ int core = find_core_for_dev(dev);
+ int err = 0;
+
+ if (core < 0)
+ return -ENODEV;
+
+ err = clk_bulk_prepare_enable(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
+ if (err) {
+ dev_err(dev, "failed to enable (%d) clocks for core %d\n", err, core);
+ return err;
+ }
+
+ return 0;
+}
+
+static int rocket_device_runtime_suspend(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ int core = find_core_for_dev(dev);
+
+ if (core < 0)
+ return -ENODEV;
+
+ clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
+
+ return 0;
+}
+
+EXPORT_GPL_DEV_PM_OPS(rocket_pm_ops) = {
+ RUNTIME_PM_OPS(rocket_device_runtime_suspend, rocket_device_runtime_resume, NULL)
+ SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
+};
+
+static struct platform_driver rocket_driver = {
+ .probe = rocket_probe,
+ .remove = rocket_remove,
+ .driver = {
+ .name = "rocket",
+ .pm = pm_ptr(&rocket_pm_ops),
+ .of_match_table = dt_match,
+ },
+};
+module_platform_driver(rocket_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("DRM driver for the Rockchip NPU IP");
+MODULE_AUTHOR("Tomeu Vizoso");
diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
new file mode 100644
index 0000000000000000000000000000000000000000..3219621afb72acdfa915c110e2ec3aacb66bd940
--- /dev/null
+++ b/drivers/accel/rocket/rocket_drv.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_DRV_H__
+#define __ROCKET_DRV_H__
+
+#include "rocket_device.h"
+
+struct rocket_file_priv {
+ struct rocket_device *rdev;
+
+ struct iommu_domain *domain;
+};
+
+#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-24 10:17 ` Robin Murphy
` (2 more replies)
2025-06-06 6:28 ` [PATCH v7 04/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
` (9 subsequent siblings)
11 siblings, 3 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This uses the SHMEM DRM helpers and we map right away to the CPU and NPU
sides, as all buffers are expected to be accessed from both.
v2:
- Sync the IOMMUs for the other cores when mapping and unmapping.
v3:
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
v6:
- Use mutexes guard (Markus Elfring)
v7:
- Assign its own IOMMU domain to each client, for isolation (Daniel
Stone and Robin Murphy)
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
drivers/accel/rocket/Makefile | 3 +-
drivers/accel/rocket/rocket_device.c | 4 ++
drivers/accel/rocket/rocket_device.h | 2 +
drivers/accel/rocket/rocket_drv.c | 7 ++-
drivers/accel/rocket/rocket_gem.c | 115 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_gem.h | 27 ++++++++
include/uapi/drm/rocket_accel.h | 44 ++++++++++++++
7 files changed, 200 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
index abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96..4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f 100644
--- a/drivers/accel/rocket/Makefile
+++ b/drivers/accel/rocket/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
rocket-y := \
rocket_core.o \
rocket_device.o \
- rocket_drv.o
+ rocket_drv.o \
+ rocket_gem.o
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index a05c103e117e3eaa6439884b7acb6e3483296edb..5e559104741af22c528914c96e44558323ab6c89 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -4,6 +4,7 @@
#include <linux/array_size.h>
#include <linux/clk.h>
#include <linux/dev_printk.h>
+#include <linux/mutex.h>
#include "rocket_device.h"
@@ -16,10 +17,13 @@ int rocket_device_init(struct rocket_device *rdev)
if (err)
return err;
+ mutex_init(&rdev->iommu_lock);
+
return 0;
}
void rocket_device_fini(struct rocket_device *rdev)
{
+ mutex_destroy(&rdev->iommu_lock);
rocket_core_fini(&rdev->cores[0]);
}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d..10acfe8534f00a7985d40a93f4b2f7f69d43caee 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -13,6 +13,8 @@
struct rocket_device {
struct drm_device ddev;
+ struct mutex iommu_lock;
+
struct rocket_core *cores;
unsigned int num_cores;
};
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index b38a5c6264cb4e74d5e381adaeba1426e576fa56..2b8a88db20c408f313f4f4fe36b051c9d5e4829b 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -6,6 +6,7 @@
#include <drm/drm_gem.h>
#include <drm/drm_ioctl.h>
#include <drm/drm_of.h>
+#include <drm/rocket_accel.h>
#include <linux/array_size.h>
#include <linux/clk.h>
#include <linux/component.h>
@@ -16,6 +17,7 @@
#include <linux/pm_runtime.h>
#include "rocket_drv.h"
+#include "rocket_gem.h"
static int
rocket_open(struct drm_device *dev, struct drm_file *file)
@@ -46,6 +48,8 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file)
static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
#define ROCKET_IOCTL(n, func) \
DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
+
+ ROCKET_IOCTL(CREATE_BO, create_bo),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
@@ -55,9 +59,10 @@ DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
* - 1.0 - initial interface
*/
static const struct drm_driver rocket_drm_driver = {
- .driver_features = DRIVER_COMPUTE_ACCEL,
+ .driver_features = DRIVER_COMPUTE_ACCEL | DRIVER_GEM,
.open = rocket_open,
.postclose = rocket_postclose,
+ .gem_create_object = rocket_gem_create_object,
.ioctls = rocket_drm_driver_ioctls,
.num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
.fops = &rocket_accel_driver_fops,
diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
new file mode 100644
index 0000000000000000000000000000000000000000..61b7f970a6885aa13784daa1222611a02aa10dee
--- /dev/null
+++ b/drivers/accel/rocket/rocket_gem.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_device.h>
+#include <drm/drm_utils.h>
+#include <drm/rocket_accel.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+
+#include "rocket_device.h"
+#include "rocket_drv.h"
+#include "rocket_gem.h"
+
+static void rocket_gem_bo_free(struct drm_gem_object *obj)
+{
+ struct rocket_device *rdev = to_rocket_device(obj->dev);
+ struct rocket_gem_object *bo = to_rocket_bo(obj);
+ size_t unmapped;
+
+ drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
+
+ guard(mutex)(&rdev->iommu_lock);
+
+ unmapped = iommu_unmap(bo->domain, bo->base.sgt->sgl->dma_address, bo->size);
+ drm_WARN_ON(obj->dev, unmapped != bo->size);
+
+ /* This will unmap the pages from the IOMMU linked to core 0 */
+ drm_gem_shmem_free(&bo->base);
+}
+
+static const struct drm_gem_object_funcs rocket_gem_funcs = {
+ .free = rocket_gem_bo_free,
+ .print_info = drm_gem_shmem_object_print_info,
+ .pin = drm_gem_shmem_object_pin,
+ .unpin = drm_gem_shmem_object_unpin,
+ .get_sg_table = drm_gem_shmem_object_get_sg_table,
+ .vmap = drm_gem_shmem_object_vmap,
+ .vunmap = drm_gem_shmem_object_vunmap,
+ .mmap = drm_gem_shmem_object_mmap,
+ .vm_ops = &drm_gem_shmem_vm_ops,
+};
+
+struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size)
+{
+ struct rocket_gem_object *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ obj->base.base.funcs = &rocket_gem_funcs;
+
+ return &obj->base.base;
+}
+
+int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct rocket_file_priv *rocket_priv = file->driver_priv;
+ struct drm_rocket_create_bo *args = data;
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_gem_shmem_object *shmem_obj;
+ struct rocket_gem_object *rkt_obj;
+ struct drm_gem_object *gem_obj;
+ struct sg_table *sgt;
+ int ret;
+
+ shmem_obj = drm_gem_shmem_create(dev, args->size);
+ if (IS_ERR(shmem_obj))
+ return PTR_ERR(shmem_obj);
+
+ gem_obj = &shmem_obj->base;
+ rkt_obj = to_rocket_bo(gem_obj);
+
+ rkt_obj->domain = rocket_priv->domain;
+ rkt_obj->size = args->size;
+ rkt_obj->offset = 0;
+
+ ret = drm_gem_handle_create(file, gem_obj, &args->handle);
+ drm_gem_object_put(gem_obj);
+
+ guard(mutex)(&rdev->iommu_lock);
+
+ if (ret)
+ goto err;
+
+ sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
+ if (IS_ERR(sgt)) {
+ ret = PTR_ERR(sgt);
+ goto err;
+ }
+
+ ret = iommu_map_sgtable(rocket_priv->domain,
+ shmem_obj->sgt->sgl->dma_address,
+ shmem_obj->sgt,
+ IOMMU_READ | IOMMU_WRITE);
+ if (ret < 0 || ret < args->size) {
+ drm_err(dev, "failed to map buffer: size=%d request_size=%u\n",
+ ret, args->size);
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ /* iommu_map_sgtable might have aligned the size */
+ rkt_obj->size = ret;
+ dma_sync_sgtable_for_device(dev->dev, shmem_obj->sgt, DMA_BIDIRECTIONAL);
+ args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
+ args->dma_address = sg_dma_address(shmem_obj->sgt->sgl);
+
+ return 0;
+
+err:
+ drm_gem_shmem_object_free(gem_obj);
+
+ return ret;
+}
diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
new file mode 100644
index 0000000000000000000000000000000000000000..e8a4d6213fd80419be2ec8af04583a67fb1a4b75
--- /dev/null
+++ b/drivers/accel/rocket/rocket_gem.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_GEM_H__
+#define __ROCKET_GEM_H__
+
+#include <drm/drm_gem_shmem_helper.h>
+
+struct rocket_gem_object {
+ struct drm_gem_shmem_object base;
+
+ struct iommu_domain *domain;
+ size_t size;
+ u32 offset;
+};
+
+struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
+
+int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
+static inline
+struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
+{
+ return container_of(to_drm_gem_shmem_obj(obj), struct rocket_gem_object, base);
+}
+
+#endif
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
new file mode 100644
index 0000000000000000000000000000000000000000..95720702b7c4413d72b89c1f0f59abb22dc8c6b3
--- /dev/null
+++ b/include/uapi/drm/rocket_accel.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Tomeu Vizoso
+ */
+#ifndef __DRM_UAPI_ROCKET_ACCEL_H__
+#define __DRM_UAPI_ROCKET_ACCEL_H__
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_ROCKET_CREATE_BO 0x00
+
+#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
+
+/**
+ * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
+ *
+ */
+struct drm_rocket_create_bo {
+ /** Input: Size of the requested BO. */
+ __u32 size;
+
+ /** Output: GEM handle for the BO. */
+ __u32 handle;
+
+ /**
+ * Output: DMA address for the BO in the NPU address space. This address
+ * is private to the DRM fd and is valid for the lifetime of the GEM
+ * handle.
+ */
+ __u64 dma_address;
+
+ /** Output: Offset into the drm node to use for subsequent mmap call. */
+ __u64 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* __DRM_UAPI_ROCKET_ACCEL_H__ */
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 04/10] accel/rocket: Add job submission IOCTL
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-24 13:50 ` Robin Murphy
2025-06-30 20:56 ` Rob Herring
2025-06-06 6:28 ` [PATCH v7 05/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
` (8 subsequent siblings)
11 siblings, 2 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Using the DRM GPU scheduler infrastructure, with a scheduler for each
core.
Userspace can decide for a series of tasks to be executed sequentially
in the same core, so SRAM locality can be taken advantage of.
The job submission code was initially based on Panfrost.
v2:
- Remove hardcoded number of cores
- Misc. style fixes (Jeffrey Hugo)
- Repack IOCTL struct (Jeffrey Hugo)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
- Use drm_* logging functions (Thomas Zimmermann)
- Rename reg i/o macros (Thomas Zimmermann)
- Add padding to ioctls and check for zero (Jeff Hugo)
- Improve error handling (Nicolas Frattaroli)
v6:
- Use mutexes guard (Markus Elfring)
- Use u64_to_user_ptr (Jeff Hugo)
- Drop rocket_fence (Rob Herring)
v7:
- Assign its own IOMMU domain to each client, for isolation (Daniel
Stone and Robin Murphy)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
drivers/accel/rocket/Makefile | 3 +-
drivers/accel/rocket/rocket_core.c | 10 +
drivers/accel/rocket/rocket_core.h | 14 +
drivers/accel/rocket/rocket_device.c | 2 +
drivers/accel/rocket/rocket_device.h | 2 +
drivers/accel/rocket/rocket_drv.c | 15 +
drivers/accel/rocket/rocket_drv.h | 4 +
drivers/accel/rocket/rocket_job.c | 700 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_job.h | 52 +++
include/uapi/drm/rocket_accel.h | 64 ++++
10 files changed, 865 insertions(+), 1 deletion(-)
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
index 4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f..3713dfe223d6ec6293ced3ef9291af2f3d144131 100644
--- a/drivers/accel/rocket/Makefile
+++ b/drivers/accel/rocket/Makefile
@@ -6,4 +6,5 @@ rocket-y := \
rocket_core.o \
rocket_device.o \
rocket_drv.o \
- rocket_gem.o
+ rocket_gem.o \
+ rocket_job.o
diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index 3a6f25f2b4103075102739588bcdad96510e2a4e..b57e10d9938c0f71d0107841244ec969ca9e30e1 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -8,6 +8,7 @@
#include <linux/pm_runtime.h>
#include "rocket_core.h"
+#include "rocket_job.h"
int rocket_core_init(struct rocket_core *core)
{
@@ -38,6 +39,10 @@ int rocket_core_init(struct rocket_core *core)
return PTR_ERR(core->core_iomem);
}
+ err = rocket_job_init(core);
+ if (err)
+ return err;
+
pm_runtime_use_autosuspend(dev);
/*
@@ -51,6 +56,10 @@ int rocket_core_init(struct rocket_core *core)
pm_runtime_enable(dev);
err = pm_runtime_get_sync(dev);
+ if (err) {
+ rocket_job_fini(core);
+ return err;
+ }
version = rocket_pc_readl(core, VERSION);
version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
@@ -67,4 +76,5 @@ void rocket_core_fini(struct rocket_core *core)
{
pm_runtime_dont_use_autosuspend(core->dev);
pm_runtime_disable(core->dev);
+ rocket_job_fini(core);
}
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 1b1beb9798f03ec2ca325496a4d894674d0b798d..de5fb4e26d4542bda8abf6ab8d4bd562755d547e 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -37,6 +37,20 @@ struct rocket_core {
void __iomem *cna_iomem;
void __iomem *core_iomem;
struct clk_bulk_data clks[4];
+
+ struct rocket_job *in_flight_job;
+
+ spinlock_t job_lock;
+
+ struct {
+ struct workqueue_struct *wq;
+ struct work_struct work;
+ atomic_t pending;
+ } reset;
+
+ struct drm_gpu_scheduler sched;
+ u64 fence_context;
+ u64 emit_seqno;
};
int rocket_core_init(struct rocket_core *core);
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index 5e559104741af22c528914c96e44558323ab6c89..8f5c99aeaa6118c406cc570f7d4747cd1cb1c082 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -18,12 +18,14 @@ int rocket_device_init(struct rocket_device *rdev)
return err;
mutex_init(&rdev->iommu_lock);
+ mutex_init(&rdev->sched_lock);
return 0;
}
void rocket_device_fini(struct rocket_device *rdev)
{
+ mutex_destroy(&rdev->sched_lock);
mutex_destroy(&rdev->iommu_lock);
rocket_core_fini(&rdev->cores[0]);
}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -13,6 +13,8 @@
struct rocket_device {
struct drm_device ddev;
+ struct mutex sched_lock;
+
struct mutex iommu_lock;
struct rocket_core *cores;
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 2b8a88db20c408f313f4f4fe36b051c9d5e4829b..4ab78193c186dfcfc3e323f16c588e85e6a8a334 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -18,12 +18,14 @@
#include "rocket_drv.h"
#include "rocket_gem.h"
+#include "rocket_job.h"
static int
rocket_open(struct drm_device *dev, struct drm_file *file)
{
struct rocket_device *rdev = to_rocket_device(dev);
struct rocket_file_priv *rocket_priv;
+ int ret;
rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
if (!rocket_priv)
@@ -33,7 +35,15 @@ rocket_open(struct drm_device *dev, struct drm_file *file)
rocket_priv->domain = iommu_paging_domain_alloc(dev->dev);
file->driver_priv = rocket_priv;
+ ret = rocket_job_open(rocket_priv);
+ if (ret)
+ goto err_free;
+
return 0;
+
+err_free:
+ kfree(rocket_priv);
+ return ret;
}
static void
@@ -42,6 +52,7 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file)
struct rocket_file_priv *rocket_priv = file->driver_priv;
iommu_domain_free(rocket_priv->domain);
+ rocket_job_close(rocket_priv);
kfree(rocket_priv);
}
@@ -50,6 +61,7 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
ROCKET_IOCTL(CREATE_BO, create_bo),
+ ROCKET_IOCTL(SUBMIT, submit),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
@@ -258,6 +270,9 @@ static int rocket_device_runtime_suspend(struct device *dev)
if (core < 0)
return -ENODEV;
+ if (!rocket_job_is_idle(&rdev->cores[core]))
+ return -EBUSY;
+
clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
return 0;
diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
index 3219621afb72acdfa915c110e2ec3aacb66bd940..d095e999a8a612fb0b1bf7b96b7c759082a62ab9 100644
--- a/drivers/accel/rocket/rocket_drv.h
+++ b/drivers/accel/rocket/rocket_drv.h
@@ -4,12 +4,16 @@
#ifndef __ROCKET_DRV_H__
#define __ROCKET_DRV_H__
+#include <drm/gpu_scheduler.h>
+
#include "rocket_device.h"
struct rocket_file_priv {
struct rocket_device *rdev;
struct iommu_domain *domain;
+
+ struct drm_sched_entity sched_entity;
};
#endif
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
new file mode 100644
index 0000000000000000000000000000000000000000..52246b6ae2e91cea0b3cc1de5c05fb502615bf46
--- /dev/null
+++ b/drivers/accel/rocket/rocket_job.c
@@ -0,0 +1,700 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+/* Copyright 2019 Collabora ltd. */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_print.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <drm/rocket_accel.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_core.h"
+#include "rocket_device.h"
+#include "rocket_drv.h"
+#include "rocket_job.h"
+#include "rocket_registers.h"
+
+#define JOB_TIMEOUT_MS 500
+
+static struct rocket_job *
+to_rocket_job(struct drm_sched_job *sched_job)
+{
+ return container_of(sched_job, struct rocket_job, base);
+}
+
+static const char *rocket_fence_get_driver_name(struct dma_fence *fence)
+{
+ return "rocket";
+}
+
+static const char *rocket_fence_get_timeline_name(struct dma_fence *fence)
+{
+ return "rockchip-npu";
+}
+
+static const struct dma_fence_ops rocket_fence_ops = {
+ .get_driver_name = rocket_fence_get_driver_name,
+ .get_timeline_name = rocket_fence_get_timeline_name,
+};
+
+static struct dma_fence *rocket_fence_create(struct rocket_core *core)
+{
+ struct dma_fence *fence;
+
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence)
+ return ERR_PTR(-ENOMEM);
+
+ dma_fence_init(fence, &rocket_fence_ops, &core->job_lock,
+ core->fence_context, ++core->emit_seqno);
+
+ return fence;
+}
+
+static int
+rocket_copy_tasks(struct drm_device *dev,
+ struct drm_file *file_priv,
+ struct drm_rocket_job *job,
+ struct rocket_job *rjob)
+{
+ struct drm_rocket_task *tasks;
+ int ret = 0;
+ int i;
+
+ rjob->task_count = job->task_count;
+
+ if (!rjob->task_count)
+ return 0;
+
+ tasks = kvmalloc_array(rjob->task_count, sizeof(*tasks), GFP_KERNEL);
+ if (!tasks) {
+ ret = -ENOMEM;
+ drm_dbg(dev, "Failed to allocate incoming tasks\n");
+ goto fail;
+ }
+
+ if (copy_from_user(tasks, u64_to_user_ptr(job->tasks), rjob->task_count * sizeof(*tasks))) {
+ ret = -EFAULT;
+ drm_dbg(dev, "Failed to copy incoming tasks\n");
+ goto fail;
+ }
+
+ rjob->tasks = kvmalloc_array(job->task_count, sizeof(*rjob->tasks), GFP_KERNEL);
+ if (!rjob->tasks) {
+ drm_dbg(dev, "Failed to allocate task array\n");
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ for (i = 0; i < rjob->task_count; i++) {
+ if (tasks[i].reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_task struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ if (tasks[i].regcmd_count == 0) {
+ ret = -EINVAL;
+ goto fail;
+ }
+ rjob->tasks[i].regcmd = tasks[i].regcmd;
+ rjob->tasks[i].regcmd_count = tasks[i].regcmd_count;
+ }
+
+fail:
+ kvfree(tasks);
+ return ret;
+}
+
+static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
+{
+ struct rocket_task *task;
+ bool task_pp_en = 1;
+ bool task_count = 1;
+
+ /* GO ! */
+
+ /* Don't queue the job if a reset is in progress */
+ if (atomic_read(&core->reset.pending))
+ return;
+
+ task = &job->tasks[job->next_task_idx];
+ job->next_task_idx++;
+
+ rocket_pc_writel(core, BASE_ADDRESS, 0x1);
+
+ rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
+ rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
+
+ rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
+ rocket_pc_writel(core, REGISTER_AMOUNTS, (task->regcmd_count + 1) / 2 - 1);
+
+ rocket_pc_writel(core, INTERRUPT_MASK, PC_INTERRUPT_MASK_DPU_0 | PC_INTERRUPT_MASK_DPU_1);
+ rocket_pc_writel(core, INTERRUPT_CLEAR, PC_INTERRUPT_CLEAR_DPU_0 | PC_INTERRUPT_CLEAR_DPU_1);
+
+ rocket_pc_writel(core, TASK_CON, ((0x6 | task_pp_en) << 12) | task_count);
+
+ rocket_pc_writel(core, TASK_DMA_BASE_ADDR, 0x0);
+
+ rocket_pc_writel(core, OPERATION_ENABLE, 0x1);
+
+ dev_dbg(core->dev, "Submitted regcmd at 0x%llx to core %d", task->regcmd, core->index);
+}
+
+static int rocket_acquire_object_fences(struct drm_gem_object **bos,
+ int bo_count,
+ struct drm_sched_job *job,
+ bool is_write)
+{
+ int i, ret;
+
+ for (i = 0; i < bo_count; i++) {
+ ret = dma_resv_reserve_fences(bos[i]->resv, 1);
+ if (ret)
+ return ret;
+
+ ret = drm_sched_job_add_implicit_dependencies(job, bos[i],
+ is_write);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void rocket_attach_object_fences(struct drm_gem_object **bos,
+ int bo_count,
+ struct dma_fence *fence)
+{
+ int i;
+
+ for (i = 0; i < bo_count; i++)
+ dma_resv_add_fence(bos[i]->resv, fence, DMA_RESV_USAGE_WRITE);
+}
+
+static int rocket_job_push(struct rocket_job *job)
+{
+ struct rocket_device *rdev = job->rdev;
+ struct drm_gem_object **bos;
+ struct ww_acquire_ctx acquire_ctx;
+ int ret = 0;
+
+ bos = kvmalloc_array(job->in_bo_count + job->out_bo_count, sizeof(void *),
+ GFP_KERNEL);
+ memcpy(bos, job->in_bos, job->in_bo_count * sizeof(void *));
+ memcpy(&bos[job->in_bo_count], job->out_bos, job->out_bo_count * sizeof(void *));
+
+ ret = drm_gem_lock_reservations(bos, job->in_bo_count + job->out_bo_count, &acquire_ctx);
+ if (ret)
+ goto err;
+
+ scoped_guard(mutex, &rdev->sched_lock) {
+ drm_sched_job_arm(&job->base);
+
+ job->inference_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+ ret = rocket_acquire_object_fences(job->in_bos, job->in_bo_count, &job->base, false);
+ if (ret)
+ goto err_unlock;
+
+ ret = rocket_acquire_object_fences(job->out_bos, job->out_bo_count, &job->base, true);
+ if (ret)
+ goto err_unlock;
+
+ kref_get(&job->refcount); /* put by scheduler job completion */
+
+ drm_sched_entity_push_job(&job->base);
+ }
+
+ rocket_attach_object_fences(job->out_bos, job->out_bo_count, job->inference_done_fence);
+
+err_unlock:
+ drm_gem_unlock_reservations(bos, job->in_bo_count + job->out_bo_count, &acquire_ctx);
+err:
+ kfree(bos);
+
+ return ret;
+}
+
+static void rocket_job_cleanup(struct kref *ref)
+{
+ struct rocket_job *job = container_of(ref, struct rocket_job,
+ refcount);
+ unsigned int i;
+
+ dma_fence_put(job->done_fence);
+ dma_fence_put(job->inference_done_fence);
+
+ if (job->in_bos) {
+ for (i = 0; i < job->in_bo_count; i++)
+ drm_gem_object_put(job->in_bos[i]);
+
+ kvfree(job->in_bos);
+ }
+
+ if (job->out_bos) {
+ for (i = 0; i < job->out_bo_count; i++)
+ drm_gem_object_put(job->out_bos[i]);
+
+ kvfree(job->out_bos);
+ }
+
+ kfree(job->tasks);
+
+ kfree(job);
+}
+
+static void rocket_job_put(struct rocket_job *job)
+{
+ kref_put(&job->refcount, rocket_job_cleanup);
+}
+
+static void rocket_job_free(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+
+ drm_sched_job_cleanup(sched_job);
+
+ rocket_job_put(job);
+}
+
+static struct rocket_core *sched_to_core(struct rocket_device *rdev,
+ struct drm_gpu_scheduler *sched)
+{
+ unsigned int core;
+
+ for (core = 0; core < rdev->num_cores; core++) {
+ if (&rdev->cores[core].sched == sched)
+ return &rdev->cores[core];
+ }
+
+ return NULL;
+}
+
+static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+ struct rocket_device *rdev = job->rdev;
+ struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
+ struct dma_fence *fence = NULL;
+ int ret;
+
+ if (unlikely(job->base.s_fence->finished.error))
+ return NULL;
+
+ /*
+ * Nothing to execute: can happen if the job has finished while
+ * we were resetting the GPU.
+ */
+ if (job->next_task_idx == job->task_count)
+ return NULL;
+
+ fence = rocket_fence_create(core);
+ if (IS_ERR(fence))
+ return fence;
+
+ if (job->done_fence)
+ dma_fence_put(job->done_fence);
+ job->done_fence = dma_fence_get(fence);
+
+ ret = pm_runtime_get_sync(core->dev);
+ if (ret < 0)
+ return fence;
+
+ ret = iommu_attach_group(job->domain, iommu_group_get(core->dev));
+ if (ret < 0)
+ return fence;
+
+ scoped_guard(spinlock, &core->job_lock) {
+ core->in_flight_job = job;
+ rocket_job_hw_submit(core, job);
+ }
+
+ return fence;
+}
+
+static void rocket_job_handle_done(struct rocket_core *core,
+ struct rocket_job *job)
+{
+ if (job->next_task_idx < job->task_count) {
+ rocket_job_hw_submit(core, job);
+ return;
+ }
+
+ core->in_flight_job = NULL;
+ iommu_detach_group(NULL, iommu_group_get(core->dev));
+ dma_fence_signal_locked(job->done_fence);
+ pm_runtime_put_autosuspend(core->dev);
+}
+
+static void rocket_job_handle_irq(struct rocket_core *core)
+{
+ u32 status, raw_status;
+
+ pm_runtime_mark_last_busy(core->dev);
+
+ status = rocket_pc_readl(core, INTERRUPT_STATUS);
+ raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
+
+ rocket_pc_writel(core, OPERATION_ENABLE, 0x0);
+ rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff);
+
+ scoped_guard(spinlock, &core->job_lock)
+ if (core->in_flight_job)
+ rocket_job_handle_done(core, core->in_flight_job);
+}
+
+static void
+rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
+{
+ bool cookie;
+
+ if (!atomic_read(&core->reset.pending))
+ return;
+
+ /*
+ * Stop the scheduler.
+ *
+ * FIXME: We temporarily get out of the dma_fence_signalling section
+ * because the cleanup path generate lockdep splats when taking locks
+ * to release job resources. We should rework the code to follow this
+ * pattern:
+ *
+ * try_lock
+ * if (locked)
+ * release
+ * else
+ * schedule_work_to_release_later
+ */
+ drm_sched_stop(&core->sched, bad);
+
+ cookie = dma_fence_begin_signalling();
+
+ if (bad)
+ drm_sched_increase_karma(bad);
+
+ /*
+ * Mask job interrupts and synchronize to make sure we won't be
+ * interrupted during our reset.
+ */
+ rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
+ synchronize_irq(core->irq);
+
+ /* Handle the remaining interrupts before we reset. */
+ rocket_job_handle_irq(core);
+
+ /*
+ * Remaining interrupts have been handled, but we might still have
+ * stuck jobs. Let's make sure the PM counters stay balanced by
+ * manually calling pm_runtime_put_noidle() and
+ * rocket_devfreq_record_idle() for each stuck job.
+ * Let's also make sure the cycle counting register's refcnt is
+ * kept balanced to prevent it from running forever
+ */
+ scoped_guard(spinlock, &core->job_lock) {
+ if (core->in_flight_job)
+ pm_runtime_put_noidle(core->dev);
+
+ core->in_flight_job = NULL;
+ }
+
+ /* Proceed with reset now. */
+ pm_runtime_force_suspend(core->dev);
+ pm_runtime_force_resume(core->dev);
+
+ /* GPU has been reset, we can clear the reset pending bit. */
+ atomic_set(&core->reset.pending, 0);
+
+ /*
+ * Now resubmit jobs that were previously queued but didn't have a
+ * chance to finish.
+ * FIXME: We temporarily get out of the DMA fence signalling section
+ * while resubmitting jobs because the job submission logic will
+ * allocate memory with the GFP_KERNEL flag which can trigger memory
+ * reclaim and exposes a lock ordering issue.
+ */
+ dma_fence_end_signalling(cookie);
+ drm_sched_resubmit_jobs(&core->sched);
+ cookie = dma_fence_begin_signalling();
+
+ /* Restart the scheduler */
+ drm_sched_start(&core->sched, 0);
+
+ dma_fence_end_signalling(cookie);
+}
+
+static enum drm_gpu_sched_stat rocket_job_timedout(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+ struct rocket_device *rdev = job->rdev;
+ struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
+
+ /*
+ * If the GPU managed to complete this jobs fence, the timeout is
+ * spurious. Bail out.
+ */
+ if (dma_fence_is_signaled(job->done_fence))
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+
+ /*
+ * Rocket IRQ handler may take a long time to process an interrupt
+ * if there is another IRQ handler hogging the processing.
+ * For example, the HDMI encoder driver might be stuck in the IRQ
+ * handler for a significant time in a case of bad cable connection.
+ * In order to catch such cases and not report spurious rocket
+ * job timeouts, synchronize the IRQ handler and re-check the fence
+ * status.
+ */
+ synchronize_irq(core->irq);
+
+ if (dma_fence_is_signaled(job->done_fence)) {
+ dev_warn(core->dev, "unexpectedly high interrupt latency\n");
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+ }
+
+ dev_err(core->dev, "gpu sched timeout");
+
+ atomic_set(&core->reset.pending, 1);
+ rocket_reset(core, sched_job);
+ iommu_detach_group(NULL, iommu_group_get(core->dev));
+
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+}
+
+static void rocket_reset_work(struct work_struct *work)
+{
+ struct rocket_core *core;
+
+ core = container_of(work, struct rocket_core, reset.work);
+ rocket_reset(core, NULL);
+}
+
+static const struct drm_sched_backend_ops rocket_sched_ops = {
+ .run_job = rocket_job_run,
+ .timedout_job = rocket_job_timedout,
+ .free_job = rocket_job_free
+};
+
+static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
+{
+ struct rocket_core *core = data;
+
+ rocket_job_handle_irq(core);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t rocket_job_irq_handler(int irq, void *data)
+{
+ struct rocket_core *core = data;
+ u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
+
+ WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
+ WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
+
+ if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
+ raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
+ return IRQ_NONE;
+
+ rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
+
+ return IRQ_WAKE_THREAD;
+}
+
+int rocket_job_init(struct rocket_core *core)
+{
+ struct drm_sched_init_args args = {
+ .ops = &rocket_sched_ops,
+ .num_rqs = DRM_SCHED_PRIORITY_COUNT,
+ .credit_limit = 1,
+ .timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
+ .name = dev_name(core->dev),
+ .dev = core->dev,
+ };
+ int ret;
+
+ INIT_WORK(&core->reset.work, rocket_reset_work);
+ spin_lock_init(&core->job_lock);
+
+ core->irq = platform_get_irq(to_platform_device(core->dev), 0);
+ if (core->irq < 0)
+ return core->irq;
+
+ ret = devm_request_threaded_irq(core->dev, core->irq,
+ rocket_job_irq_handler,
+ rocket_job_irq_handler_thread,
+ IRQF_SHARED, KBUILD_MODNAME "-job",
+ core);
+ if (ret) {
+ dev_err(core->dev, "failed to request job irq");
+ return ret;
+ }
+
+ core->reset.wq = alloc_ordered_workqueue("rocket-reset-%d", 0, core->index);
+ if (!core->reset.wq)
+ return -ENOMEM;
+
+ core->fence_context = dma_fence_context_alloc(1);
+
+ args.timeout_wq = core->reset.wq;
+ ret = drm_sched_init(&core->sched, &args);
+ if (ret) {
+ dev_err(core->dev, "Failed to create scheduler: %d.", ret);
+ goto err_sched;
+ }
+
+ return 0;
+
+err_sched:
+ drm_sched_fini(&core->sched);
+
+ destroy_workqueue(core->reset.wq);
+ return ret;
+}
+
+void rocket_job_fini(struct rocket_core *core)
+{
+ drm_sched_fini(&core->sched);
+
+ cancel_work_sync(&core->reset.work);
+ destroy_workqueue(core->reset.wq);
+}
+
+int rocket_job_open(struct rocket_file_priv *rocket_priv)
+{
+ struct rocket_device *rdev = rocket_priv->rdev;
+ struct drm_gpu_scheduler **scheds = kmalloc_array(rdev->num_cores, sizeof(scheds),
+ GFP_KERNEL);
+ unsigned int core;
+ int ret;
+
+ for (core = 0; core < rdev->num_cores; core++)
+ scheds[core] = &rdev->cores[core].sched;
+
+ ret = drm_sched_entity_init(&rocket_priv->sched_entity,
+ DRM_SCHED_PRIORITY_NORMAL,
+ scheds,
+ rdev->num_cores, NULL);
+ if (WARN_ON(ret))
+ return ret;
+
+ return 0;
+}
+
+void rocket_job_close(struct rocket_file_priv *rocket_priv)
+{
+ struct drm_sched_entity *entity = &rocket_priv->sched_entity;
+
+ kfree(entity->sched_list);
+ drm_sched_entity_destroy(entity);
+}
+
+int rocket_job_is_idle(struct rocket_core *core)
+{
+ /* If there are any jobs in this HW queue, we're not idle */
+ if (atomic_read(&core->sched.credit_count))
+ return false;
+
+ return true;
+}
+
+static int rocket_ioctl_submit_job(struct drm_device *dev, struct drm_file *file,
+ struct drm_rocket_job *job)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct rocket_file_priv *file_priv = file->driver_priv;
+ struct rocket_job *rjob = NULL;
+ int ret = 0;
+
+ if (job->task_count == 0)
+ return -EINVAL;
+
+ rjob = kzalloc(sizeof(*rjob), GFP_KERNEL);
+ if (!rjob)
+ return -ENOMEM;
+
+ kref_init(&rjob->refcount);
+
+ rjob->rdev = rdev;
+
+ ret = drm_sched_job_init(&rjob->base,
+ &file_priv->sched_entity,
+ 1, NULL);
+ if (ret)
+ goto out_put_job;
+
+ ret = rocket_copy_tasks(dev, file, job, rjob);
+ if (ret)
+ goto out_cleanup_job;
+
+ ret = drm_gem_objects_lookup(file, u64_to_user_ptr(job->in_bo_handles),
+ job->in_bo_handle_count, &rjob->in_bos);
+ if (ret)
+ goto out_cleanup_job;
+
+ rjob->in_bo_count = job->in_bo_handle_count;
+
+ ret = drm_gem_objects_lookup(file, u64_to_user_ptr(job->out_bo_handles),
+ job->out_bo_handle_count, &rjob->out_bos);
+ if (ret)
+ goto out_cleanup_job;
+
+ rjob->out_bo_count = job->out_bo_handle_count;
+
+ rjob->domain = file_priv->domain;
+
+ ret = rocket_job_push(rjob);
+ if (ret)
+ goto out_cleanup_job;
+
+out_cleanup_job:
+ if (ret)
+ drm_sched_job_cleanup(&rjob->base);
+out_put_job:
+ rocket_job_put(rjob);
+
+ return ret;
+}
+
+int rocket_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_submit *args = data;
+ struct drm_rocket_job *jobs;
+ int ret = 0;
+ unsigned int i = 0;
+
+ if (args->reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_submit struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ jobs = kvmalloc_array(args->job_count, sizeof(*jobs), GFP_KERNEL);
+ if (!jobs) {
+ drm_dbg(dev, "Failed to allocate incoming job array\n");
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(jobs, u64_to_user_ptr(args->jobs),
+ args->job_count * sizeof(*jobs))) {
+ ret = -EFAULT;
+ drm_dbg(dev, "Failed to copy incoming job array\n");
+ goto exit;
+ }
+
+ for (i = 0; i < args->job_count; i++) {
+ if (jobs[i].reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_job struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ rocket_ioctl_submit_job(dev, file, &jobs[i]);
+ }
+
+exit:
+ kfree(jobs);
+
+ return ret;
+}
diff --git a/drivers/accel/rocket/rocket_job.h b/drivers/accel/rocket/rocket_job.h
new file mode 100644
index 0000000000000000000000000000000000000000..e461de99906a1081f66410117b8e41e340790b2a
--- /dev/null
+++ b/drivers/accel/rocket/rocket_job.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_JOB_H__
+#define __ROCKET_JOB_H__
+
+#include <drm/drm_drv.h>
+#include <drm/gpu_scheduler.h>
+
+#include "rocket_core.h"
+#include "rocket_drv.h"
+
+struct rocket_task {
+ u64 regcmd;
+ u32 regcmd_count;
+};
+
+struct rocket_job {
+ struct drm_sched_job base;
+
+ struct rocket_device *rdev;
+
+ struct drm_gem_object **in_bos;
+ struct drm_gem_object **out_bos;
+
+ u32 in_bo_count;
+ u32 out_bo_count;
+
+ struct rocket_task *tasks;
+ u32 task_count;
+ u32 next_task_idx;
+
+ /* Fence to be signaled by drm-sched once its done with the job */
+ struct dma_fence *inference_done_fence;
+
+ /* Fence to be signaled by IRQ handler when the job is complete. */
+ struct dma_fence *done_fence;
+
+ struct iommu_domain *domain;
+
+ struct kref refcount;
+};
+
+int rocket_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file);
+
+int rocket_job_init(struct rocket_core *core);
+void rocket_job_fini(struct rocket_core *core);
+int rocket_job_open(struct rocket_file_priv *rocket_priv);
+void rocket_job_close(struct rocket_file_priv *rocket_priv);
+int rocket_job_is_idle(struct rocket_core *core);
+
+#endif
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
index 95720702b7c4413d72b89c1f0f59abb22dc8c6b3..cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b 100644
--- a/include/uapi/drm/rocket_accel.h
+++ b/include/uapi/drm/rocket_accel.h
@@ -12,8 +12,10 @@ extern "C" {
#endif
#define DRM_ROCKET_CREATE_BO 0x00
+#define DRM_ROCKET_SUBMIT 0x01
#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
+#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
/**
* struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
@@ -37,6 +39,68 @@ struct drm_rocket_create_bo {
__u64 offset;
};
+/**
+ * struct drm_rocket_task - A task to be run on the NPU
+ *
+ * A task is the smallest unit of work that can be run on the NPU.
+ */
+struct drm_rocket_task {
+ /** Input: DMA address to NPU mapping of register command buffer */
+ __u64 regcmd;
+
+ /** Input: Number of commands in the register command buffer */
+ __u32 regcmd_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
+/**
+ * struct drm_rocket_job - A job to be run on the NPU
+ *
+ * The kernel will schedule the execution of this job taking into account its
+ * dependencies with other jobs. All tasks in the same job will be executed
+ * sequentially on the same core, to benefit from memory residency in SRAM.
+ */
+struct drm_rocket_job {
+ /** Input: Pointer to an array of struct drm_rocket_task. */
+ __u64 tasks;
+
+ /** Input: Pointer to a u32 array of the BOs that are read by the job. */
+ __u64 in_bo_handles;
+
+ /** Input: Pointer to a u32 array of the BOs that are written to by the job. */
+ __u64 out_bo_handles;
+
+ /** Input: Number of tasks passed in. */
+ __u32 task_count;
+
+ /** Input: Number of input BO handles passed in (size is that times 4). */
+ __u32 in_bo_handle_count;
+
+ /** Input: Number of output BO handles passed in (size is that times 4). */
+ __u32 out_bo_handle_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
+/**
+ * struct drm_rocket_submit - ioctl argument for submitting commands to the NPU.
+ *
+ * The kernel will schedule the execution of these jobs in dependency order.
+ */
+struct drm_rocket_submit {
+ /** Input: Pointer to an array of struct drm_rocket_job. */
+ __u64 jobs;
+
+ /** Input: Number of jobs passed in. */
+ __u32 job_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
#if defined(__cplusplus)
}
#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 05/10] accel/rocket: Add IOCTLs for synchronizing memory accesses
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (2 preceding siblings ...)
2025-06-06 6:28 ` [PATCH v7 04/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-24 16:52 ` Robin Murphy
2025-06-06 6:28 ` [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
` (7 subsequent siblings)
11 siblings, 1 reply; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Jeff Hugo
The NPU cores have their own access to the memory bus, and this isn't
cache coherent with the CPUs.
Add IOCTLs so userspace can mark when the caches need to be flushed, and
also when a writer job needs to be waited for before the buffer can be
accessed from the CPU.
Initially based on the same IOCTLs from the Etnaviv driver.
v2:
- Don't break UABI by reordering the IOCTL IDs (Jeff Hugo)
v3:
- Check that padding fields in IOCTLs are zero (Jeff Hugo)
v6:
- Fix conversion logic to make sure we use DMA_BIDIRECTIONAL when needed
(Lucas Stach)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
---
drivers/accel/rocket/rocket_drv.c | 2 +
drivers/accel/rocket/rocket_gem.c | 82 +++++++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_gem.h | 5 +++
include/uapi/drm/rocket_accel.h | 37 ++++++++++++++++++
4 files changed, 126 insertions(+)
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 4ab78193c186dfcfc3e323f16c588e85e6a8a334..eb9284ee2511f730afe6a532225c2706ce0e2822 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -62,6 +62,8 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
ROCKET_IOCTL(CREATE_BO, create_bo),
ROCKET_IOCTL(SUBMIT, submit),
+ ROCKET_IOCTL(PREP_BO, prep_bo),
+ ROCKET_IOCTL(FINI_BO, fini_bo),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
index 61b7f970a6885aa13784daa1222611a02aa10dee..07024b6e71bf544dc7f00b008b9afb74b0c4e802 100644
--- a/drivers/accel/rocket/rocket_gem.c
+++ b/drivers/accel/rocket/rocket_gem.c
@@ -113,3 +113,85 @@ int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *
return ret;
}
+
+static inline enum dma_data_direction rocket_op_to_dma_dir(u32 op)
+{
+ op &= ROCKET_PREP_READ | ROCKET_PREP_WRITE;
+
+ if (op == ROCKET_PREP_READ)
+ return DMA_FROM_DEVICE;
+ else if (op == ROCKET_PREP_WRITE)
+ return DMA_TO_DEVICE;
+ else
+ return DMA_BIDIRECTIONAL;
+}
+
+int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_prep_bo *args = data;
+ unsigned long timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_gem_object *gem_obj;
+ struct drm_gem_shmem_object *shmem_obj;
+ bool write = !!(args->op & ROCKET_PREP_WRITE);
+ long ret = 0;
+
+ if (args->op & ~(ROCKET_PREP_READ | ROCKET_PREP_WRITE))
+ return -EINVAL;
+
+ gem_obj = drm_gem_object_lookup(file, args->handle);
+ if (!gem_obj)
+ return -ENOENT;
+
+ ret = dma_resv_wait_timeout(gem_obj->resv, dma_resv_usage_rw(write),
+ true, timeout);
+ if (!ret)
+ ret = timeout ? -ETIMEDOUT : -EBUSY;
+
+ shmem_obj = &to_rocket_bo(gem_obj)->base;
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ dma_sync_sgtable_for_cpu(rdev->cores[core].dev, shmem_obj->sgt,
+ rocket_op_to_dma_dir(args->op));
+ }
+
+ to_rocket_bo(gem_obj)->last_cpu_prep_op = args->op;
+
+ drm_gem_object_put(gem_obj);
+
+ return ret;
+}
+
+int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_rocket_fini_bo *args = data;
+ struct drm_gem_shmem_object *shmem_obj;
+ struct rocket_gem_object *rkt_obj;
+ struct drm_gem_object *gem_obj;
+
+ if (args->reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_fini_bo struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ gem_obj = drm_gem_object_lookup(file, args->handle);
+ if (!gem_obj)
+ return -ENOENT;
+
+ rkt_obj = to_rocket_bo(gem_obj);
+ shmem_obj = &rkt_obj->base;
+
+ WARN_ON(rkt_obj->last_cpu_prep_op == 0);
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ dma_sync_sgtable_for_device(rdev->cores[core].dev, shmem_obj->sgt,
+ rocket_op_to_dma_dir(rkt_obj->last_cpu_prep_op));
+ }
+
+ rkt_obj->last_cpu_prep_op = 0;
+
+ drm_gem_object_put(gem_obj);
+
+ return 0;
+}
diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
index e8a4d6213fd80419be2ec8af04583a67fb1a4b75..a52a63cd78339a6150b99592ab5f94feeeb51fde 100644
--- a/drivers/accel/rocket/rocket_gem.h
+++ b/drivers/accel/rocket/rocket_gem.h
@@ -12,12 +12,17 @@ struct rocket_gem_object {
struct iommu_domain *domain;
size_t size;
u32 offset;
+ u32 last_cpu_prep_op;
};
struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
+int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
+int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
static inline
struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
{
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
index cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b..b5c80dd767be56e9720b51e4a82617a425a881a1 100644
--- a/include/uapi/drm/rocket_accel.h
+++ b/include/uapi/drm/rocket_accel.h
@@ -13,9 +13,13 @@ extern "C" {
#define DRM_ROCKET_CREATE_BO 0x00
#define DRM_ROCKET_SUBMIT 0x01
+#define DRM_ROCKET_PREP_BO 0x02
+#define DRM_ROCKET_FINI_BO 0x03
#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
+#define DRM_IOCTL_ROCKET_PREP_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_PREP_BO, struct drm_rocket_prep_bo)
+#define DRM_IOCTL_ROCKET_FINI_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_FINI_BO, struct drm_rocket_fini_bo)
/**
* struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
@@ -39,6 +43,39 @@ struct drm_rocket_create_bo {
__u64 offset;
};
+#define ROCKET_PREP_READ 0x01
+#define ROCKET_PREP_WRITE 0x02
+
+/**
+ * struct drm_rocket_prep_bo - ioctl argument for starting CPU ownership of the BO.
+ *
+ * Takes care of waiting for any NPU jobs that might still use the NPU and performs cache
+ * synchronization.
+ */
+struct drm_rocket_prep_bo {
+ /** Input: GEM handle of the buffer object. */
+ __u32 handle;
+
+ /** Input: mask of ROCKET_PREP_x, direction of the access. */
+ __u32 op;
+
+ /** Input: Amount of time to wait for NPU jobs. */
+ __s64 timeout_ns;
+};
+
+/**
+ * struct drm_rocket_fini_bo - ioctl argument for finishing CPU ownership of the BO.
+ *
+ * Synchronize caches for NPU access.
+ */
+struct drm_rocket_fini_bo {
+ /** Input: GEM handle of the buffer object. */
+ __u32 handle;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
/**
* struct drm_rocket_task - A task to be run on the NPU
*
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (3 preceding siblings ...)
2025-06-06 6:28 ` [PATCH v7 05/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-24 13:27 ` Robin Murphy
2025-06-06 6:28 ` [PATCH v7 07/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
` (6 subsequent siblings)
11 siblings, 1 reply; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Krzysztof Kozlowski
Add the bindings for the Neural Processing Unit IP from Rockchip.
v2:
- Adapt to new node structure (one node per core, each with its own
IOMMU)
- Several misc. fixes from Sebastian Reichel
v3:
- Split register block in its constituent subblocks, and only require
the ones that the kernel would ever use (Nicolas Frattaroli)
- Group supplies (Rob Herring)
- Explain the way in which the top core is special (Rob Herring)
v4:
- Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
- Remove unneeded items: (Krzysztof Kozlowski)
- Fix use of minItems/maxItems (Krzysztof Kozlowski)
- Add reg-names to list of required properties (Krzysztof Kozlowski)
- Fix example (Krzysztof Kozlowski)
v5:
- Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski)
- Streamline compatible property (Krzysztof Kozlowski)
v6:
- Remove mention to NVDLA, as the hardware is only incidentally related
(Kever Yang)
- Mark pclk and npu clocks as required by all clocks (Rob Herring)
v7:
- Remove allOf section, not needed now that all nodes require 4 clocks
(Heiko Stübner)
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---
.../bindings/npu/rockchip,rk3588-rknn-core.yaml | 118 +++++++++++++++++++++
1 file changed, 118 insertions(+)
diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..0588c085a723a34f4fa30a9680ea948d960b092f
--- /dev/null
+++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
@@ -0,0 +1,118 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/npu/rockchip,rk3588-rknn-core.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Neural Processing Unit IP from Rockchip
+
+maintainers:
+ - Tomeu Vizoso <tomeu@tomeuvizoso.net>
+
+description:
+ Rockchip IP for accelerating inference of neural networks.
+
+ There is to be a node per each core in the NPU. In Rockchip's design there
+ will be one core that is special because it is able to redistribute work to
+ the other cores by forwarding register writes and sharing data. This special
+ core is called the top core and should have the compatible string that
+ corresponds to top cores.
+
+properties:
+ $nodename:
+ pattern: '^npu@[a-f0-9]+$'
+
+ compatible:
+ enum:
+ - rockchip,rk3588-rknn-core-top
+ - rockchip,rk3588-rknn-core
+
+ reg:
+ maxItems: 3
+
+ reg-names:
+ items:
+ - const: pc
+ - const: cna
+ - const: core
+
+ clocks:
+ maxItems: 4
+
+ clock-names:
+ items:
+ - const: aclk
+ - const: hclk
+ - const: npu
+ - const: pclk
+
+ interrupts:
+ maxItems: 1
+
+ iommus:
+ maxItems: 1
+
+ npu-supply: true
+
+ power-domains:
+ maxItems: 1
+
+ resets:
+ maxItems: 2
+
+ reset-names:
+ items:
+ - const: srst_a
+ - const: srst_h
+
+ sram-supply: true
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - clocks
+ - clock-names
+ - interrupts
+ - iommus
+ - power-domains
+ - resets
+ - reset-names
+ - npu-supply
+ - sram-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/rockchip,rk3588-cru.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/power/rk3588-power.h>
+ #include <dt-bindings/reset/rockchip,rk3588-cru.h>
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ npu@fdab0000 {
+ compatible = "rockchip,rk3588-rknn-core-top";
+ reg = <0x0 0xfdab0000 0x0 0x1000>,
+ <0x0 0xfdab1000 0x0 0x1000>,
+ <0x0 0xfdab3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+ assigned-clock-rates = <200000000>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ iommus = <&rknn_mmu_top>;
+ npu-supply = <&vdd_npu_s0>;
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
+ reset-names = "srst_a", "srst_h";
+ sram-supply = <&vdd_npu_mem_s0>;
+ };
+ };
+...
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 07/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (4 preceding siblings ...)
2025-06-06 6:28 ` [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 08/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base Tomeu Vizoso
` (5 subsequent siblings)
11 siblings, 0 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
The NPU of the RK3588 has an external supply. This supply also affects
the power domain of the NPU, not just the NPU device nodes themselves.
Since correctly modelled boards will want the power domain to be aware
of the regulator so that it doesn't always have to be on, add a label to
the NPU power domain node so board files can reference it.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
index 1e18ad93ba0ebdad31642b88ff0f90ef4e8dc76f..335b4489691b6299c4fe28342e827f07c920c665 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -837,7 +837,7 @@ power: power-controller {
status = "okay";
/* These power domains are grouped by VD_NPU */
- power-domain@RK3588_PD_NPU {
+ pd_npu: power-domain@RK3588_PD_NPU {
reg = <RK3588_PD_NPU>;
#power-domain-cells = <0>;
#address-cells = <1>;
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 08/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (5 preceding siblings ...)
2025-06-06 6:28 ` [PATCH v7 07/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 09/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
` (4 subsequent siblings)
11 siblings, 0 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
See Chapter 36 "RKNN" from the RK3588 TRM (Part 1).
The IP is divided in three cores, programmed independently. The first
core though is special, being able to delegate work to the other cores.
The IOMMU of the first core is also special in that it has two subunits
(read/write?) that need to be programmed in sync.
v2:
- Have one device for each NPU core (Sebastian Reichel)
- Have one device for each IOMMU (Sebastian Reichel)
- Correctly sort nodes (Diederik de Haas)
- Add rockchip,iommu compatible to IOMMU nodes (Sebastian Reichel)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
v4:
- Adapt to changes in bindings
v6:
- pclk and npu clocks are needed by all clocks (Rob Herring)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 87 +++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
index 335b4489691b6299c4fe28342e827f07c920c665..a44dfb376fdf4c29e1bd307d9a7d1621e11d8c59 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -1136,6 +1136,93 @@ power-domain@RK3588_PD_SDMMC {
};
};
+ rknn_core_top: npu@fdab0000 {
+ compatible = "rockchip,rk3588-rknn-core-top";
+ reg = <0x0 0xfdab0000 0x0 0x1000>,
+ <0x0 0xfdab1000 0x0 0x1000>,
+ <0x0 0xfdab3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+ assigned-clock-rates = <200000000>;
+ resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ iommus = <&rknn_mmu_top>;
+ status = "disabled";
+ };
+
+ rknn_mmu_top: iommu@fdab9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdab9000 0x0 0x100>,
+ <0x0 0xfdaba000 0x0 0x100>;
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ status = "disabled";
+ };
+
+ rknn_core_1: npu@fdac0000 {
+ compatible = "rockchip,rk3588-rknn-core";
+ reg = <0x0 0xfdac0000 0x0 0x1000>,
+ <0x0 0xfdac1000 0x0 0x1000>,
+ <0x0 0xfdac3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU1>, <&cru HCLK_NPU1>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ resets = <&cru SRST_A_RKNN1>, <&cru SRST_H_RKNN1>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPU1>;
+ iommus = <&rknn_mmu_1>;
+ status = "disabled";
+ };
+
+ rknn_mmu_1: iommu@fdac9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdaca000 0x0 0x100>;
+ interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU1>, <&cru HCLK_NPU1>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPU1>;
+ status = "disabled";
+ };
+
+ rknn_core_2: npu@fdad0000 {
+ compatible = "rockchip,rk3588-rknn-core";
+ reg = <0x0 0xfdad0000 0x0 0x1000>,
+ <0x0 0xfdad1000 0x0 0x1000>,
+ <0x0 0xfdad3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU2>, <&cru HCLK_NPU2>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ resets = <&cru SRST_A_RKNN2>, <&cru SRST_H_RKNN2>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPU2>;
+ iommus = <&rknn_mmu_2>;
+ status = "disabled";
+ };
+
+ rknn_mmu_2: iommu@fdad9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdada000 0x0 0x100>;
+ interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU2>, <&cru HCLK_NPU2>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPU2>;
+ status = "disabled";
+ };
+
vpu121: video-codec@fdb50000 {
compatible = "rockchip,rk3588-vpu121", "rockchip,rk3568-vpu";
reg = <0x0 0xfdb50000 0x0 0x800>;
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 09/10] arm64: dts: rockchip: Enable the NPU on quartzpro64
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (6 preceding siblings ...)
2025-06-06 6:28 ` [PATCH v7 08/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
` (3 subsequent siblings)
11 siblings, 0 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Enable the nodes added in a previous commit to the rk3588s device tree.
v2:
- Split nodes (Sebastian Reichel)
- Sort nodes (Sebastian Reichel)
- Add board regulators (Sebastian Reichel)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
.../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 30 ++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
index 78aaa6635b5d20a650aba8d8c2d0d4f498ff0d33..2e45b213c25b99571dd71ce90bc7970418f60276 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
@@ -415,6 +415,36 @@ &pcie3x4 {
status = "okay";
};
+&rknn_core_top {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_core_1 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_core_2 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_mmu_top {
+ status = "okay";
+};
+
+&rknn_mmu_1 {
+ status = "okay";
+};
+
+&rknn_mmu_2 {
+ status = "okay";
+};
+
&saradc {
vref-supply = <&vcc_1v8_s0>;
status = "okay";
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v7 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (7 preceding siblings ...)
2025-06-06 6:28 ` [PATCH v7 09/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
@ 2025-06-06 6:28 ` Tomeu Vizoso
2025-06-06 9:20 ` Nicolas Frattaroli
[not found] ` <20250606-6-10-rocket-v7-1-dc16cfe6fe4e@tomeuvizoso.net>
` (2 subsequent siblings)
11 siblings, 1 reply; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-06 6:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
The NPU on the ROCK5B uses the same regulator for both the sram-supply
and the npu's supply. Add this regulator, and enable all the NPU bits.
Also add the regulator as a domain-supply to the pd_npu power domain.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
index d22068475c5dc6cb885f878f3f527a66edf1ba70..49500f7cbcb14af4919a6c1997e9e53a01d84973 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
@@ -316,6 +316,28 @@ regulator-state-mem {
};
};
+&i2c1 {
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c1m2_xfer>;
+ status = "okay";
+
+ vdd_npu_s0: regulator@42 {
+ compatible = "rockchip,rk8602";
+ reg = <0x42>;
+ fcs,suspend-voltage-selector = <1>;
+ regulator-name = "vdd_npu_s0";
+ regulator-boot-on;
+ regulator-min-microvolt = <550000>;
+ regulator-max-microvolt = <950000>;
+ regulator-ramp-delay = <2300>;
+ vin-supply = <&vcc5v0_sys>;
+
+ regulator-state-mem {
+ regulator-off-in-suspend;
+ };
+ };
+};
+
&i2c6 {
status = "okay";
@@ -440,6 +462,10 @@ &pd_gpu {
domain-supply = <&vdd_gpu_s0>;
};
+&pd_npu {
+ domain-supply = <&vdd_npu_s0>;
+};
+
&pinctrl {
hdmirx {
hdmirx_hpd: hdmirx-5v-detection {
@@ -500,6 +526,36 @@ &pwm1 {
status = "okay";
};
+&rknn_core_top {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_1 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_2 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_mmu_top {
+ status = "okay";
+};
+
+&rknn_mmu_1 {
+ status = "okay";
+};
+
+&rknn_mmu_2 {
+ status = "okay";
+};
+
&saradc {
vref-supply = <&avcc_1v8_s0>;
status = "okay";
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v7 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B
2025-06-06 6:28 ` [PATCH v7 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
@ 2025-06-06 9:20 ` Nicolas Frattaroli
2025-06-07 12:24 ` Heiko Stübner
0 siblings, 1 reply; 30+ messages in thread
From: Nicolas Frattaroli @ 2025-06-06 9:20 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Kever Yang, Robin Murphy,
Daniel Stone, Da Xue, Jeff Hugo, Tomeu Vizoso
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Hi Tomeu,
On Friday, 6 June 2025 08:28:30 Central European Summer Time Tomeu Vizoso wrote:
> From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
>
> The NPU on the ROCK5B uses the same regulator for both the sram-supply
> and the npu's supply. Add this regulator, and enable all the NPU bits.
> Also add the regulator as a domain-supply to the pd_npu power domain.
>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +++++++++++++++++++++++++
> 1 file changed, 56 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> index d22068475c5dc6cb885f878f3f527a66edf1ba70..49500f7cbcb14af4919a6c1997e9e53a01d84973 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> @@ -316,6 +316,28 @@ regulator-state-mem {
> };
> };
>
> +&i2c1 {
> + pinctrl-names = "default";
> + pinctrl-0 = <&i2c1m2_xfer>;
> + status = "okay";
> +
> + vdd_npu_s0: regulator@42 {
> + compatible = "rockchip,rk8602";
> + reg = <0x42>;
> + fcs,suspend-voltage-selector = <1>;
> + regulator-name = "vdd_npu_s0";
> + regulator-boot-on;
> + regulator-min-microvolt = <550000>;
> + regulator-max-microvolt = <950000>;
> + regulator-ramp-delay = <2300>;
> + vin-supply = <&vcc5v0_sys>;
> +
> + regulator-state-mem {
> + regulator-off-in-suspend;
> + };
> + };
> +};
> +
> &i2c6 {
> status = "okay";
>
> @@ -440,6 +462,10 @@ &pd_gpu {
> domain-supply = <&vdd_gpu_s0>;
> };
>
> +&pd_npu {
> + domain-supply = <&vdd_npu_s0>;
> +};
> +
> &pinctrl {
> hdmirx {
> hdmirx_hpd: hdmirx-5v-detection {
> @@ -500,6 +526,36 @@ &pwm1 {
> status = "okay";
> };
>
> +&rknn_core_top {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_s0>;
> + status = "okay";
> +};
> +
> +&rknn_core_1 {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_s0>;
> + status = "okay";
> +};
> +
> +&rknn_core_2 {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_s0>;
> + status = "okay";
> +};
> +
> +&rknn_mmu_top {
> + status = "okay";
> +};
> +
> +&rknn_mmu_1 {
> + status = "okay";
> +};
> +
> +&rknn_mmu_2 {
> + status = "okay";
> +};
> +
> &saradc {
> vref-supply = <&avcc_1v8_s0>;
> status = "okay";
>
>
Feel free to replace this patch with the following, if your series is
based on linux-next or v6.16. It moves the enablement into the new
shared ROCK 5B/5B+ dtsi, and I've added a regulator-enable-ramp-delay
while I was at it because I've run into hard-to-reproduce problems
relating to it before that Heiko quickly identified and fixed in his
recent series[1] for basically all already present regulators. Remains
to be seen if the final patch lands in that form but this should make
it easier for people to try out as it means a bad luck roll for the
day won't make them run into as many weird issues.
[1]: https://lore.kernel.org/all/20250605185001.377055-1-heiko@sntech.de/
---
From ff1c370a158f4340aa5dfa4ed5034e815e5371be Mon Sep 17 00:00:00 2001
From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Date: Tue, 3 Jun 2025 17:03:10 +0200
Subject: [PATCH] arm64: dts: rockchip: enable NPU on ROCK 5B/+
The NPU on the ROCK5B uses the same regulator for both the sram-supply
and the npu's supply. Add this regulator, and enable all the NPU bits.
Also add the regulator as a domain-supply to the pd_npu power domain.
The 5B+'s regulator setup is identical to the 5B in this regard, so it
goes in the shared .dtsi.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
---
.../boot/dts/rockchip/rk3588-rock-5b.dtsi | 57 +++++++++++++++++++
1 file changed, 57 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi
index 51e83f0ed809..5a20cc2555fb 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi
@@ -332,6 +332,29 @@ regulator-state-mem {
};
};
+&i2c1 {
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c1m2_xfer>;
+ status = "okay";
+
+ vdd_npu_s0: regulator@42 {
+ compatible = "rockchip,rk8602";
+ reg = <0x42>;
+ fcs,suspend-voltage-selector = <1>;
+ regulator-name = "vdd_npu_s0";
+ regulator-boot-on;
+ regulator-enable-ramp-delay = <500>;
+ regulator-min-microvolt = <550000>;
+ regulator-max-microvolt = <950000>;
+ regulator-ramp-delay = <2300>;
+ vin-supply = <&vcc5v0_sys>;
+
+ regulator-state-mem {
+ regulator-off-in-suspend;
+ };
+ };
+};
+
&i2c3 {
status = "okay";
};
@@ -521,6 +544,10 @@ &pd_gpu {
domain-supply = <&vdd_gpu_s0>;
};
+&pd_npu {
+ domain-supply = <&vdd_npu_s0>;
+};
+
&pinctrl {
hdmirx {
hdmirx_hpd: hdmirx-5v-detection {
@@ -585,6 +612,36 @@ &pwm1 {
status = "okay";
};
+&rknn_core_top {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_1 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_2 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_mmu_top {
+ status = "okay";
+};
+
+&rknn_mmu_1 {
+ status = "okay";
+};
+
+&rknn_mmu_2 {
+ status = "okay";
+};
+
&saradc {
vref-supply = <&avcc_1v8_s0>;
status = "okay";
--
2.49.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v7 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B
2025-06-06 9:20 ` Nicolas Frattaroli
@ 2025-06-07 12:24 ` Heiko Stübner
0 siblings, 0 replies; 30+ messages in thread
From: Heiko Stübner @ 2025-06-07 12:24 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Kever Yang, Robin Murphy,
Daniel Stone, Da Xue, Jeff Hugo, Tomeu Vizoso, Nicolas Frattaroli
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Am Freitag, 6. Juni 2025, 11:20:32 Mitteleuropäische Sommerzeit schrieb Nicolas Frattaroli:
> Hi Tomeu,
>
> On Friday, 6 June 2025 08:28:30 Central European Summer Time Tomeu Vizoso wrote:
> > From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> >
> > The NPU on the ROCK5B uses the same regulator for both the sram-supply
> > and the npu's supply. Add this regulator, and enable all the NPU bits.
> > Also add the regulator as a domain-supply to the pd_npu power domain.
> >
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > ---
> > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +++++++++++++++++++++++++
> > 1 file changed, 56 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > index d22068475c5dc6cb885f878f3f527a66edf1ba70..49500f7cbcb14af4919a6c1997e9e53a01d84973 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > @@ -316,6 +316,28 @@ regulator-state-mem {
> > };
> > };
> >
> > +&i2c1 {
> > + pinctrl-names = "default";
> > + pinctrl-0 = <&i2c1m2_xfer>;
> > + status = "okay";
> > +
> > + vdd_npu_s0: regulator@42 {
> > + compatible = "rockchip,rk8602";
> > + reg = <0x42>;
> > + fcs,suspend-voltage-selector = <1>;
> > + regulator-name = "vdd_npu_s0";
> > + regulator-boot-on;
> > + regulator-min-microvolt = <550000>;
> > + regulator-max-microvolt = <950000>;
> > + regulator-ramp-delay = <2300>;
> > + vin-supply = <&vcc5v0_sys>;
> > +
> > + regulator-state-mem {
> > + regulator-off-in-suspend;
> > + };
> > + };
> > +};
> > +
> > &i2c6 {
> > status = "okay";
> >
> > @@ -440,6 +462,10 @@ &pd_gpu {
> > domain-supply = <&vdd_gpu_s0>;
> > };
> >
> > +&pd_npu {
> > + domain-supply = <&vdd_npu_s0>;
> > +};
> > +
> > &pinctrl {
> > hdmirx {
> > hdmirx_hpd: hdmirx-5v-detection {
> > @@ -500,6 +526,36 @@ &pwm1 {
> > status = "okay";
> > };
> >
> > +&rknn_core_top {
> > + npu-supply = <&vdd_npu_s0>;
> > + sram-supply = <&vdd_npu_s0>;
> > + status = "okay";
> > +};
> > +
> > +&rknn_core_1 {
> > + npu-supply = <&vdd_npu_s0>;
> > + sram-supply = <&vdd_npu_s0>;
> > + status = "okay";
> > +};
> > +
> > +&rknn_core_2 {
> > + npu-supply = <&vdd_npu_s0>;
> > + sram-supply = <&vdd_npu_s0>;
> > + status = "okay";
> > +};
> > +
> > +&rknn_mmu_top {
> > + status = "okay";
> > +};
> > +
> > +&rknn_mmu_1 {
> > + status = "okay";
> > +};
> > +
> > +&rknn_mmu_2 {
> > + status = "okay";
> > +};
> > +
> > &saradc {
> > vref-supply = <&avcc_1v8_s0>;
> > status = "okay";
> >
> >
>
> Feel free to replace this patch with the following, if your series is
> based on linux-next or v6.16. It moves the enablement into the new
> shared ROCK 5B/5B+ dtsi, and I've added a regulator-enable-ramp-delay
> while I was at it because I've run into hard-to-reproduce problems
> relating to it before that Heiko quickly identified and fixed in his
> recent series[1] for basically all already present regulators. Remains
> to be seen if the final patch lands in that form but this should make
> it easier for people to try out as it means a bad luck roll for the
> day won't make them run into as many weird issues.
>
> [1]: https://lore.kernel.org/all/20250605185001.377055-1-heiko@sntech.de/
Reading that just now reminds me to point to
https://lore.kernel.org/lkml/20250606190418.478633-1-heiko@sntech.de/
As Chen Yu pointed out in the reply to v2, this is more a property of the
regulator IC itself, so likely should go into the driver.
So with a bit of luck after 6.16-rc1 all the fan53555 clones should use
somewhat hardware-accurate enable-times.
> From ff1c370a158f4340aa5dfa4ed5034e815e5371be Mon Sep 17 00:00:00 2001
> From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> Date: Tue, 3 Jun 2025 17:03:10 +0200
> Subject: [PATCH] arm64: dts: rockchip: enable NPU on ROCK 5B/+
>
> The NPU on the ROCK5B uses the same regulator for both the sram-supply
> and the npu's supply. Add this regulator, and enable all the NPU bits.
> Also add the regulator as a domain-supply to the pd_npu power domain.
>
> The 5B+'s regulator setup is identical to the 5B in this regard, so it
> goes in the shared .dtsi.
>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> .../boot/dts/rockchip/rk3588-rock-5b.dtsi | 57 +++++++++++++++++++
> 1 file changed, 57 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi
> index 51e83f0ed809..5a20cc2555fb 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dtsi
> @@ -332,6 +332,29 @@ regulator-state-mem {
> };
> };
>
> +&i2c1 {
> + pinctrl-names = "default";
> + pinctrl-0 = <&i2c1m2_xfer>;
> + status = "okay";
> +
> + vdd_npu_s0: regulator@42 {
> + compatible = "rockchip,rk8602";
> + reg = <0x42>;
> + fcs,suspend-voltage-selector = <1>;
> + regulator-name = "vdd_npu_s0";
> + regulator-boot-on;
> + regulator-enable-ramp-delay = <500>;
> + regulator-min-microvolt = <550000>;
> + regulator-max-microvolt = <950000>;
> + regulator-ramp-delay = <2300>;
> + vin-supply = <&vcc5v0_sys>;
> +
> + regulator-state-mem {
> + regulator-off-in-suspend;
> + };
> + };
> +};
> +
> &i2c3 {
I think your patch here has some other dependency still?
Because in the rk3588-rock-5b.dtsi there is no i2c3 yet
In torvalds' branch from _just now_ ;-) .
Heiko
> status = "okay";
> };
> @@ -521,6 +544,10 @@ &pd_gpu {
> domain-supply = <&vdd_gpu_s0>;
> };
>
> +&pd_npu {
> + domain-supply = <&vdd_npu_s0>;
> +};
> +
> &pinctrl {
> hdmirx {
> hdmirx_hpd: hdmirx-5v-detection {
> @@ -585,6 +612,36 @@ &pwm1 {
> status = "okay";
> };
>
> +&rknn_core_top {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_s0>;
> + status = "okay";
> +};
> +
> +&rknn_core_1 {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_s0>;
> + status = "okay";
> +};
> +
> +&rknn_core_2 {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_s0>;
> + status = "okay";
> +};
> +
> +&rknn_mmu_top {
> + status = "okay";
> +};
> +
> +&rknn_mmu_1 {
> + status = "okay";
> +};
> +
> +&rknn_mmu_2 {
> + status = "okay";
> +};
> +
> &saradc {
> vref-supply = <&avcc_1v8_s0>;
> status = "okay";
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 01/10] accel/rocket: Add registers header
[not found] ` <20250606-6-10-rocket-v7-1-dc16cfe6fe4e@tomeuvizoso.net>
@ 2025-06-19 8:39 ` Robert Foss
0 siblings, 0 replies; 30+ messages in thread
From: Robert Foss @ 2025-06-19 8:39 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On Fri, Jun 6, 2025 at 8:29 AM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
>
> A XML file was generated with the data from the TRM, and then this
> header was generated from it.
>
> The canonical location for the XML file is the Mesa3D repository.
>
> v3:
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Reviewed-by: Robert Foss <rfoss@kernel.org>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-06-06 6:28 ` [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
@ 2025-06-19 8:59 ` Robert Foss
2025-07-11 17:38 ` Andrew Davis
1 sibling, 0 replies; 30+ messages in thread
From: Robert Foss @ 2025-06-19 8:59 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On Fri, Jun 6, 2025 at 8:29 AM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
>
> This initial version supports the NPU as shipped in the RK3588 SoC and
> described in the first part of its TRM, in Chapter 36.
>
> This NPU contains 3 independent cores that the driver can submit jobs
> to.
>
> This commit adds just hardware initialization and power management.
>
> v2:
> - Split cores and IOMMUs as independent devices (Sebastian Reichel)
> - Add some documentation (Jeffrey Hugo)
> - Be more explicit in the Kconfig documentation (Jeffrey Hugo)
> - Remove resets, as these haven't been found useful so far (Zenghui Yu)
> - Repack structs (Jeffrey Hugo)
> - Use DEFINE_DRM_ACCEL_FOPS (Jeffrey Hugo)
> - Use devm_drm_dev_alloc (Jeffrey Hugo)
> - Use probe log helper (Jeffrey Hugo)
> - Introduce UABI header in a later patch (Jeffrey Hugo)
>
> v3:
> - Adapt to a split of the register block in the DT bindings (Nicolas
> Frattaroli)
> - Move registers header to its own commit (Thomas Zimmermann)
> - Misc. cleanups (Thomas Zimmermann and Jeff Hugo)
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
> - PM improvements (Nicolas Frattaroli)
>
> v4:
> - Use bulk clk API (Krzysztof Kozlowski)
>
> v6:
> - Remove mention to NVDLA, as the hardware is only incidentally related
> (Kever Yang)
> - Use calloc instead of GFP_ZERO (Jeff Hugo)
> - Explicitly include linux/container_of.h (Jeff Hugo)
> - pclk and npu clocks are now needed by all cores (Rob Herring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Reviewed-by: Robert Foss <rfoss@kernel.org>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (9 preceding siblings ...)
[not found] ` <20250606-6-10-rocket-v7-1-dc16cfe6fe4e@tomeuvizoso.net>
@ 2025-06-19 9:01 ` Heiko Stuebner
2025-06-20 9:28 ` Heiko Stuebner
11 siblings, 0 replies; 30+ messages in thread
From: Heiko Stuebner @ 2025-06-19 9:01 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
Tomeu Vizoso
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Jeff Hugo, Krzysztof Kozlowski
Am Freitag, 6. Juni 2025, 08:28:20 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> This series adds a new driver for the NPU that Rockchip includes in its
> newer SoCs, developed by them on the NVDLA base.
>
> In its current form, it supports the specific NPU in the RK3588 SoC.
>
> The userspace driver is part of Mesa and an initial draft can be found at:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> Changes in v7:
> - Actually enable process isolation by allocating its own IOMMU domain
> to each DRM client.
> - Link to v6: https://lore.kernel.org/r/20250604-6-10-rocket-v6-0-237ac75ddb5e@tomeuvizoso.net
I was able to successfully run the SSDLite MobileDet model, detecting
elements correctly on that "Sounds of New York" youtube video all the
demos seem to be using ;-) - on a rk3588-tiger board.
NPU needed like 30ms per frame or so and also detected the expected
things, so
Tested-by: Heiko Stuebner <heiko@sntech.de>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (10 preceding siblings ...)
2025-06-19 9:01 ` [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Heiko Stuebner
@ 2025-06-20 9:28 ` Heiko Stuebner
2025-06-24 5:03 ` Tomeu Vizoso
11 siblings, 1 reply; 30+ messages in thread
From: Heiko Stuebner @ 2025-06-20 9:28 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
Tomeu Vizoso
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Jeff Hugo, Krzysztof Kozlowski
Am Freitag, 6. Juni 2025, 08:28:20 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> This series adds a new driver for the NPU that Rockchip includes in its
> newer SoCs, developed by them on the NVDLA base.
>
> In its current form, it supports the specific NPU in the RK3588 SoC.
>
> The userspace driver is part of Mesa and an initial draft can be found at:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> Nicolas Frattaroli (2):
> arm64: dts: rockchip: add pd_npu label for RK3588 power domains
> arm64: dts: rockchip: enable NPU on ROCK 5B
>
> Tomeu Vizoso (8):
> accel/rocket: Add registers header
> accel/rocket: Add a new driver for Rockchip's NPU
> accel/rocket: Add IOCTL for BO creation
> accel/rocket: Add job submission IOCTL
> accel/rocket: Add IOCTLs for synchronizing memory accesses
> dt-bindings: npu: rockchip,rknn: Add bindings
> arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base
> arm64: dts: rockchip: Enable the NPU on quartzpro64
from a handling point of view, I would expect patch 1 - 6
(driver code + dt-binding patch) to go through some driver tree
but have not clue which one that is.
And afterwards, I would pick up the arm64 devicetree additions
patches 7 - 10 .
Heiko
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU
2025-06-20 9:28 ` Heiko Stuebner
@ 2025-06-24 5:03 ` Tomeu Vizoso
0 siblings, 0 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-06-24 5:03 UTC (permalink / raw)
To: Heiko Stuebner
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig,
Krzysztof Kozlowski
On Fri, Jun 20, 2025 at 11:28 AM Heiko Stuebner <heiko@sntech.de> wrote:
>
> Am Freitag, 6. Juni 2025, 08:28:20 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> > This series adds a new driver for the NPU that Rockchip includes in its
> > newer SoCs, developed by them on the NVDLA base.
> >
> > In its current form, it supports the specific NPU in the RK3588 SoC.
> >
> > The userspace driver is part of Mesa and an initial draft can be found at:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
> >
> > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
>
> > ---
> > Nicolas Frattaroli (2):
> > arm64: dts: rockchip: add pd_npu label for RK3588 power domains
> > arm64: dts: rockchip: enable NPU on ROCK 5B
> >
> > Tomeu Vizoso (8):
> > accel/rocket: Add registers header
> > accel/rocket: Add a new driver for Rockchip's NPU
> > accel/rocket: Add IOCTL for BO creation
> > accel/rocket: Add job submission IOCTL
> > accel/rocket: Add IOCTLs for synchronizing memory accesses
> > dt-bindings: npu: rockchip,rknn: Add bindings
> > arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base
> > arm64: dts: rockchip: Enable the NPU on quartzpro64
>
> from a handling point of view, I would expect patch 1 - 6
> (driver code + dt-binding patch) to go through some driver tree
> but have not clue which one that is.
I think Jeff Hugo would be pulling it into drm-misc? We still need a
R-b for the job submission patch, and also have the userspace driver
ready for merging.
> And afterwards, I would pick up the arm64 devicetree additions
> patches 7 - 10 .
Sounds great.
Thanks!
Tomeu
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation
2025-06-06 6:28 ` [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
@ 2025-06-24 10:17 ` Robin Murphy
2025-07-01 19:31 ` Heiko Stübner
2025-07-11 17:40 ` Andrew Davis
2 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2025-06-24 10:17 UTC (permalink / raw)
To: Tomeu Vizoso, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner, Oded Gabbay, Jonathan Corbet, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König, Sebastian Reichel,
Nicolas Frattaroli, Kever Yang, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
> This uses the SHMEM DRM helpers and we map right away to the CPU and NPU
> sides, as all buffers are expected to be accessed from both.
>
> v2:
> - Sync the IOMMUs for the other cores when mapping and unmapping.
>
> v3:
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
>
> v6:
> - Use mutexes guard (Markus Elfring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> drivers/accel/rocket/Makefile | 3 +-
> drivers/accel/rocket/rocket_device.c | 4 ++
> drivers/accel/rocket/rocket_device.h | 2 +
> drivers/accel/rocket/rocket_drv.c | 7 ++-
> drivers/accel/rocket/rocket_gem.c | 115 +++++++++++++++++++++++++++++++++++
> drivers/accel/rocket/rocket_gem.h | 27 ++++++++
> include/uapi/drm/rocket_accel.h | 44 ++++++++++++++
> 7 files changed, 200 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
> index abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96..4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f 100644
> --- a/drivers/accel/rocket/Makefile
> +++ b/drivers/accel/rocket/Makefile
> @@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
> rocket-y := \
> rocket_core.o \
> rocket_device.o \
> - rocket_drv.o
> + rocket_drv.o \
> + rocket_gem.o
> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index a05c103e117e3eaa6439884b7acb6e3483296edb..5e559104741af22c528914c96e44558323ab6c89 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
> @@ -4,6 +4,7 @@
> #include <linux/array_size.h>
> #include <linux/clk.h>
> #include <linux/dev_printk.h>
> +#include <linux/mutex.h>
>
> #include "rocket_device.h"
>
> @@ -16,10 +17,13 @@ int rocket_device_init(struct rocket_device *rdev)
> if (err)
> return err;
>
> + mutex_init(&rdev->iommu_lock);
> +
> return 0;
> }
>
> void rocket_device_fini(struct rocket_device *rdev)
> {
> + mutex_destroy(&rdev->iommu_lock);
> rocket_core_fini(&rdev->cores[0]);
> }
> diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
> index b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d..10acfe8534f00a7985d40a93f4b2f7f69d43caee 100644
> --- a/drivers/accel/rocket/rocket_device.h
> +++ b/drivers/accel/rocket/rocket_device.h
> @@ -13,6 +13,8 @@
> struct rocket_device {
> struct drm_device ddev;
>
> + struct mutex iommu_lock;
> +
> struct rocket_core *cores;
> unsigned int num_cores;
> };
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index b38a5c6264cb4e74d5e381adaeba1426e576fa56..2b8a88db20c408f313f4f4fe36b051c9d5e4829b 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -6,6 +6,7 @@
> #include <drm/drm_gem.h>
> #include <drm/drm_ioctl.h>
> #include <drm/drm_of.h>
> +#include <drm/rocket_accel.h>
> #include <linux/array_size.h>
> #include <linux/clk.h>
> #include <linux/component.h>
> @@ -16,6 +17,7 @@
> #include <linux/pm_runtime.h>
>
> #include "rocket_drv.h"
> +#include "rocket_gem.h"
>
> static int
> rocket_open(struct drm_device *dev, struct drm_file *file)
> @@ -46,6 +48,8 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file)
> static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
> #define ROCKET_IOCTL(n, func) \
> DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
> +
> + ROCKET_IOCTL(CREATE_BO, create_bo),
> };
>
> DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
> @@ -55,9 +59,10 @@ DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
> * - 1.0 - initial interface
> */
> static const struct drm_driver rocket_drm_driver = {
> - .driver_features = DRIVER_COMPUTE_ACCEL,
> + .driver_features = DRIVER_COMPUTE_ACCEL | DRIVER_GEM,
> .open = rocket_open,
> .postclose = rocket_postclose,
> + .gem_create_object = rocket_gem_create_object,
> .ioctls = rocket_drm_driver_ioctls,
> .num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
> .fops = &rocket_accel_driver_fops,
> diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..61b7f970a6885aa13784daa1222611a02aa10dee
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_gem.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#include <drm/drm_device.h>
> +#include <drm/drm_utils.h>
> +#include <drm/rocket_accel.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +
> +#include "rocket_device.h"
> +#include "rocket_drv.h"
> +#include "rocket_gem.h"
> +
> +static void rocket_gem_bo_free(struct drm_gem_object *obj)
> +{
> + struct rocket_device *rdev = to_rocket_device(obj->dev);
> + struct rocket_gem_object *bo = to_rocket_bo(obj);
> + size_t unmapped;
> +
> + drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
> +
> + guard(mutex)(&rdev->iommu_lock);
> +
> + unmapped = iommu_unmap(bo->domain, bo->base.sgt->sgl->dma_address, bo->size);
> + drm_WARN_ON(obj->dev, unmapped != bo->size);
> +
> + /* This will unmap the pages from the IOMMU linked to core 0 */
This means "DMA-unmap the pages", right? If things have been done
correctly then the iommu_unmap() above will already have removed the
actual translation all cores' IOMMUs were using.
> + drm_gem_shmem_free(&bo->base);
> +}
> +
> +static const struct drm_gem_object_funcs rocket_gem_funcs = {
> + .free = rocket_gem_bo_free,
> + .print_info = drm_gem_shmem_object_print_info,
> + .pin = drm_gem_shmem_object_pin,
> + .unpin = drm_gem_shmem_object_unpin,
> + .get_sg_table = drm_gem_shmem_object_get_sg_table,
> + .vmap = drm_gem_shmem_object_vmap,
> + .vunmap = drm_gem_shmem_object_vunmap,
> + .mmap = drm_gem_shmem_object_mmap,
> + .vm_ops = &drm_gem_shmem_vm_ops,
> +};
> +
> +struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size)
> +{
> + struct rocket_gem_object *obj;
> +
> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> + if (!obj)
> + return ERR_PTR(-ENOMEM);
> +
> + obj->base.base.funcs = &rocket_gem_funcs;
> +
> + return &obj->base.base;
> +}
> +
> +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
> +{
> + struct rocket_file_priv *rocket_priv = file->driver_priv;
> + struct drm_rocket_create_bo *args = data;
> + struct rocket_device *rdev = to_rocket_device(dev);
> + struct drm_gem_shmem_object *shmem_obj;
> + struct rocket_gem_object *rkt_obj;
> + struct drm_gem_object *gem_obj;
> + struct sg_table *sgt;
> + int ret;
> +
> + shmem_obj = drm_gem_shmem_create(dev, args->size);
> + if (IS_ERR(shmem_obj))
> + return PTR_ERR(shmem_obj);
> +
> + gem_obj = &shmem_obj->base;
> + rkt_obj = to_rocket_bo(gem_obj);
> +
> + rkt_obj->domain = rocket_priv->domain;
> + rkt_obj->size = args->size;
> + rkt_obj->offset = 0;
> +
> + ret = drm_gem_handle_create(file, gem_obj, &args->handle);
> + drm_gem_object_put(gem_obj);
> +
> + guard(mutex)(&rdev->iommu_lock);
> +
> + if (ret)
> + goto err;
> +
> + sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
> + if (IS_ERR(sgt)) {
> + ret = PTR_ERR(sgt);
> + goto err;
> + }
> +
> + ret = iommu_map_sgtable(rocket_priv->domain,
> + shmem_obj->sgt->sgl->dma_address,
Is this expected to be a DMA address implicitly generated by the
dma_map_sg() in drm_gem_shmem_get_pages_sgt()? I would strongly
recommend against relying on that - at the moment it happens that
iommu-dma still does complete dma_map_* operations in the unattached DMA
ops domain, mostly redundantly, but I've long been meaning to optimise
that so that it only performs any necessary cache maintenance on the
underlying memory when the caller is already using their own IOMMU
domain. At that point the returned DMA address is likely to just be the
PA, and this tactic probably won't work.
> + shmem_obj->sgt,
> + IOMMU_READ | IOMMU_WRITE);
> + if (ret < 0 || ret < args->size) {
> + drm_err(dev, "failed to map buffer: size=%d request_size=%u\n",
> + ret, args->size);
> + ret = -ENOMEM;
> + goto err;
> + }
> +
> + /* iommu_map_sgtable might have aligned the size */
> + rkt_obj->size = ret;
> + dma_sync_sgtable_for_device(dev->dev, shmem_obj->sgt, DMA_BIDIRECTIONAL);
What's this for? The buffer is already in for_device state when it
initially comes back from get_pages_sgt, and hasn't even been touched
yet anyway.
Thanks,
Robin.
> + args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
> + args->dma_address = sg_dma_address(shmem_obj->sgt->sgl);
> +
> + return 0;
> +
> +err:
> + drm_gem_shmem_object_free(gem_obj);
> +
> + return ret;
> +}
> diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..e8a4d6213fd80419be2ec8af04583a67fb1a4b75
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_gem.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#ifndef __ROCKET_GEM_H__
> +#define __ROCKET_GEM_H__
> +
> +#include <drm/drm_gem_shmem_helper.h>
> +
> +struct rocket_gem_object {
> + struct drm_gem_shmem_object base;
> +
> + struct iommu_domain *domain;
> + size_t size;
> + u32 offset;
> +};
> +
> +struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
> +
> +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
> +
> +static inline
> +struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
> +{
> + return container_of(to_drm_gem_shmem_obj(obj), struct rocket_gem_object, base);
> +}
> +
> +#endif
> diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..95720702b7c4413d72b89c1f0f59abb22dc8c6b3
> --- /dev/null
> +++ b/include/uapi/drm/rocket_accel.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2024 Tomeu Vizoso
> + */
> +#ifndef __DRM_UAPI_ROCKET_ACCEL_H__
> +#define __DRM_UAPI_ROCKET_ACCEL_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +#define DRM_ROCKET_CREATE_BO 0x00
> +
> +#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
> +
> +/**
> + * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
> + *
> + */
> +struct drm_rocket_create_bo {
> + /** Input: Size of the requested BO. */
> + __u32 size;
> +
> + /** Output: GEM handle for the BO. */
> + __u32 handle;
> +
> + /**
> + * Output: DMA address for the BO in the NPU address space. This address
> + * is private to the DRM fd and is valid for the lifetime of the GEM
> + * handle.
> + */
> + __u64 dma_address;
> +
> + /** Output: Offset into the drm node to use for subsequent mmap call. */
> + __u64 offset;
> +};
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif /* __DRM_UAPI_ROCKET_ACCEL_H__ */
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-06-06 6:28 ` [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
@ 2025-06-24 13:27 ` Robin Murphy
2025-07-11 16:02 ` Tomeu Vizoso
0 siblings, 1 reply; 30+ messages in thread
From: Robin Murphy @ 2025-06-24 13:27 UTC (permalink / raw)
To: Tomeu Vizoso, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner, Oded Gabbay, Jonathan Corbet, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König, Sebastian Reichel,
Nicolas Frattaroli, Kever Yang, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig,
Krzysztof Kozlowski
On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
> Add the bindings for the Neural Processing Unit IP from Rockchip.
>
> v2:
> - Adapt to new node structure (one node per core, each with its own
> IOMMU)
> - Several misc. fixes from Sebastian Reichel
>
> v3:
> - Split register block in its constituent subblocks, and only require
> the ones that the kernel would ever use (Nicolas Frattaroli)
> - Group supplies (Rob Herring)
> - Explain the way in which the top core is special (Rob Herring)
>
> v4:
> - Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
> - Remove unneeded items: (Krzysztof Kozlowski)
> - Fix use of minItems/maxItems (Krzysztof Kozlowski)
> - Add reg-names to list of required properties (Krzysztof Kozlowski)
> - Fix example (Krzysztof Kozlowski)
>
> v5:
> - Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski)
> - Streamline compatible property (Krzysztof Kozlowski)
>
> v6:
> - Remove mention to NVDLA, as the hardware is only incidentally related
> (Kever Yang)
> - Mark pclk and npu clocks as required by all clocks (Rob Herring)
>
> v7:
> - Remove allOf section, not needed now that all nodes require 4 clocks
> (Heiko Stübner)
>
> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
> .../bindings/npu/rockchip,rk3588-rknn-core.yaml | 118 +++++++++++++++++++++
> 1 file changed, 118 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> new file mode 100644
> index 0000000000000000000000000000000000000000..0588c085a723a34f4fa30a9680ea948d960b092f
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> @@ -0,0 +1,118 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/npu/rockchip,rk3588-rknn-core.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Neural Processing Unit IP from Rockchip
> +
> +maintainers:
> + - Tomeu Vizoso <tomeu@tomeuvizoso.net>
> +
> +description:
> + Rockchip IP for accelerating inference of neural networks.
> +
> + There is to be a node per each core in the NPU. In Rockchip's design there
> + will be one core that is special because it is able to redistribute work to
> + the other cores by forwarding register writes and sharing data. This special
> + core is called the top core and should have the compatible string that
> + corresponds to top cores.
Say a future SoC, for scaling reasons, puts down two or more whole NPUs
rather than just increasing the number of sub-cores in one? How is a DT
consumer then going to know which "cores" are associated with which "top
cores"? I think at the very least they want phandles in one direction or
the other, but if there is a real functional hierarchy then I'd be
strongly tempted to have the "core" nodes as children of their "top
core", particularly since "forwarding register writes" sounds absolutely
like something which could justify being represented as a "bus" in the
DT sense.
Thanks,
Robin.
> +
> +properties:
> + $nodename:
> + pattern: '^npu@[a-f0-9]+$'
> +
> + compatible:
> + enum:
> + - rockchip,rk3588-rknn-core-top
> + - rockchip,rk3588-rknn-core
> +
> + reg:
> + maxItems: 3
> +
> + reg-names:
> + items:
> + - const: pc
> + - const: cna
> + - const: core
> +
> + clocks:
> + maxItems: 4
> +
> + clock-names:
> + items:
> + - const: aclk
> + - const: hclk
> + - const: npu
> + - const: pclk
> +
> + interrupts:
> + maxItems: 1
> +
> + iommus:
> + maxItems: 1
> +
> + npu-supply: true
> +
> + power-domains:
> + maxItems: 1
> +
> + resets:
> + maxItems: 2
> +
> + reset-names:
> + items:
> + - const: srst_a
> + - const: srst_h
> +
> + sram-supply: true
> +
> +required:
> + - compatible
> + - reg
> + - reg-names
> + - clocks
> + - clock-names
> + - interrupts
> + - iommus
> + - power-domains
> + - resets
> + - reset-names
> + - npu-supply
> + - sram-supply
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + #include <dt-bindings/clock/rockchip,rk3588-cru.h>
> + #include <dt-bindings/interrupt-controller/irq.h>
> + #include <dt-bindings/interrupt-controller/arm-gic.h>
> + #include <dt-bindings/power/rk3588-power.h>
> + #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> +
> + bus {
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + npu@fdab0000 {
> + compatible = "rockchip,rk3588-rknn-core-top";
> + reg = <0x0 0xfdab0000 0x0 0x1000>,
> + <0x0 0xfdab1000 0x0 0x1000>,
> + <0x0 0xfdab3000 0x0 0x1000>;
> + reg-names = "pc", "cna", "core";
> + assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
> + assigned-clock-rates = <200000000>;
> + clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>,
> + <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
> + clock-names = "aclk", "hclk", "npu", "pclk";
> + interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
> + iommus = <&rknn_mmu_top>;
> + npu-supply = <&vdd_npu_s0>;
> + power-domains = <&power RK3588_PD_NPUTOP>;
> + resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
> + reset-names = "srst_a", "srst_h";
> + sram-supply = <&vdd_npu_mem_s0>;
> + };
> + };
> +...
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 04/10] accel/rocket: Add job submission IOCTL
2025-06-06 6:28 ` [PATCH v7 04/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
@ 2025-06-24 13:50 ` Robin Murphy
2025-07-11 16:00 ` Tomeu Vizoso
2025-06-30 20:56 ` Rob Herring
1 sibling, 1 reply; 30+ messages in thread
From: Robin Murphy @ 2025-06-24 13:50 UTC (permalink / raw)
To: Tomeu Vizoso, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner, Oded Gabbay, Jonathan Corbet, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König, Sebastian Reichel,
Nicolas Frattaroli, Kever Yang, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
[...]
> diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
> index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644
> --- a/drivers/accel/rocket/rocket_device.h
> +++ b/drivers/accel/rocket/rocket_device.h
> @@ -13,6 +13,8 @@
> struct rocket_device {
> struct drm_device ddev;
>
> + struct mutex sched_lock;
> +
> struct mutex iommu_lock;
Just realised I missed this in the last patch, but iommu_lock appears to
be completely unnecessary now.
> struct rocket_core *cores;
[...]
> +static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
> +{
> + struct rocket_task *task;
> + bool task_pp_en = 1;
> + bool task_count = 1;
> +
> + /* GO ! */
> +
> + /* Don't queue the job if a reset is in progress */
> + if (atomic_read(&core->reset.pending))
> + return;
> +
> + task = &job->tasks[job->next_task_idx];
> + job->next_task_idx++;
> +
> + rocket_pc_writel(core, BASE_ADDRESS, 0x1);
> +
> + rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
> + rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
Those really look like bitfield operations rather than actual arithmetic
to me.
> +
> + rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
I don't see how regcmd is created (I guess that's in userspace?), but
given that it's explicitly u64 all the way through - and especially
since you claim to support 40-bit DMA addresses - it definitely seems
suspicious that the upper 32 bits never seem to be consumed anywhere :/
> + rocket_pc_writel(core, REGISTER_AMOUNTS, (task->regcmd_count + 1) / 2 - 1);
> +
> + rocket_pc_writel(core, INTERRUPT_MASK, PC_INTERRUPT_MASK_DPU_0 | PC_INTERRUPT_MASK_DPU_1);
> + rocket_pc_writel(core, INTERRUPT_CLEAR, PC_INTERRUPT_CLEAR_DPU_0 | PC_INTERRUPT_CLEAR_DPU_1);
> +
> + rocket_pc_writel(core, TASK_CON, ((0x6 | task_pp_en) << 12) | task_count);
> +
> + rocket_pc_writel(core, TASK_DMA_BASE_ADDR, 0x0);
> +
> + rocket_pc_writel(core, OPERATION_ENABLE, 0x1);
> +
> + dev_dbg(core->dev, "Submitted regcmd at 0x%llx to core %d", task->regcmd, core->index);
> +}
[...]
> +static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
> +{
> + struct rocket_job *job = to_rocket_job(sched_job);
> + struct rocket_device *rdev = job->rdev;
> + struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
> + struct dma_fence *fence = NULL;
> + int ret;
> +
> + if (unlikely(job->base.s_fence->finished.error))
> + return NULL;
> +
> + /*
> + * Nothing to execute: can happen if the job has finished while
> + * we were resetting the GPU.
GPU? (Similarly in various other comments/prints)
> + */
> + if (job->next_task_idx == job->task_count)
> + return NULL;
> +
> + fence = rocket_fence_create(core);
> + if (IS_ERR(fence))
> + return fence;
> +
> + if (job->done_fence)
> + dma_fence_put(job->done_fence);
> + job->done_fence = dma_fence_get(fence);
> +
> + ret = pm_runtime_get_sync(core->dev);
> + if (ret < 0)
> + return fence;
> +
> + ret = iommu_attach_group(job->domain, iommu_group_get(core->dev));
I don't see iommu_group_put() anywhere, so you're leaking refcounts all
over.
> + if (ret < 0)
> + return fence;
> +
> + scoped_guard(spinlock, &core->job_lock) {
> + core->in_flight_job = job;
> + rocket_job_hw_submit(core, job);
> + }
> +
> + return fence;
> +}
[...]
> +static void rocket_job_handle_irq(struct rocket_core *core)
> +{
> + u32 status, raw_status;
> +
> + pm_runtime_mark_last_busy(core->dev);
> +
> + status = rocket_pc_readl(core, INTERRUPT_STATUS);
> + raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
> +
> + rocket_pc_writel(core, OPERATION_ENABLE, 0x0);
> + rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff);
What was the point of reading the status registers if we're just going
to blindly clear every possible condition anyway?
> + scoped_guard(spinlock, &core->job_lock)
> + if (core->in_flight_job)
> + rocket_job_handle_done(core, core->in_flight_job);
But then is it really OK to just start the next task regardless of
whether the current task was reporting successful completion or an error?
> +}
> +
> +static void
> +rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
> +{
> + bool cookie;
> +
> + if (!atomic_read(&core->reset.pending))
> + return;
> +
> + /*
> + * Stop the scheduler.
> + *
> + * FIXME: We temporarily get out of the dma_fence_signalling section
> + * because the cleanup path generate lockdep splats when taking locks
> + * to release job resources. We should rework the code to follow this
> + * pattern:
> + *
> + * try_lock
> + * if (locked)
> + * release
> + * else
> + * schedule_work_to_release_later
> + */
> + drm_sched_stop(&core->sched, bad);
> +
> + cookie = dma_fence_begin_signalling();
> +
> + if (bad)
> + drm_sched_increase_karma(bad);
> +
> + /*
> + * Mask job interrupts and synchronize to make sure we won't be
> + * interrupted during our reset.
> + */
> + rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
> + synchronize_irq(core->irq);
...except it's a shared IRQ, so it can still merrily fire at any time.
> +
> + /* Handle the remaining interrupts before we reset. */
> + rocket_job_handle_irq(core);
> +
> + /*
> + * Remaining interrupts have been handled, but we might still have
> + * stuck jobs. Let's make sure the PM counters stay balanced by
> + * manually calling pm_runtime_put_noidle() and
> + * rocket_devfreq_record_idle() for each stuck job.
> + * Let's also make sure the cycle counting register's refcnt is
> + * kept balanced to prevent it from running forever
Comments that don't match the code are more confusing than helpful :/
> + */
> + scoped_guard(spinlock, &core->job_lock) {
> + if (core->in_flight_job)
> + pm_runtime_put_noidle(core->dev);
> +
> + core->in_flight_job = NULL;
> + }
> +
> + /* Proceed with reset now. */
> + pm_runtime_force_suspend(core->dev);
> + pm_runtime_force_resume(core->dev);
Can you guarantee that actually resets the hardware if something else is
holding the power domain open or RPM is disabled? I'm not familiar with
the details of drm_sched, but if there are other jobs queued behind the
stuck one would it even pass the rocket_job_is_idle() check for suspend
to succeed anyway?
Not to mention that you have an actual reset control in the DT binding,
which isn't even optional... :/
> + /* GPU has been reset, we can clear the reset pending bit. */
> + atomic_set(&core->reset.pending, 0);
> +
> + /*
> + * Now resubmit jobs that were previously queued but didn't have a
> + * chance to finish.
> + * FIXME: We temporarily get out of the DMA fence signalling section
> + * while resubmitting jobs because the job submission logic will
> + * allocate memory with the GFP_KERNEL flag which can trigger memory
> + * reclaim and exposes a lock ordering issue.
> + */
> + dma_fence_end_signalling(cookie);
> + drm_sched_resubmit_jobs(&core->sched);
Since I happened to look, this says it's deprecated?
> + cookie = dma_fence_begin_signalling();
> +
> + /* Restart the scheduler */
> + drm_sched_start(&core->sched, 0);
> +
> + dma_fence_end_signalling(cookie);
> +}
> +
> +static enum drm_gpu_sched_stat rocket_job_timedout(struct drm_sched_job *sched_job)
> +{
> + struct rocket_job *job = to_rocket_job(sched_job);
> + struct rocket_device *rdev = job->rdev;
> + struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
> +
> + /*
> + * If the GPU managed to complete this jobs fence, the timeout is
> + * spurious. Bail out.
> + */
> + if (dma_fence_is_signaled(job->done_fence))
> + return DRM_GPU_SCHED_STAT_NOMINAL;
Do we really need the same return condition twice? What if the IRQ fires
immediately after we've made this check, and is handled without delay
such that sychronize_irq() effectively still does nothing? Either way
we've taken longer than the timeout value to observe the job completing
successfully, and either that's significant and worth warning about or
it's not - I don't see any point in trying to (inaccurately) nitpick
*why* it might have happened.
> + /*
> + * Rocket IRQ handler may take a long time to process an interrupt
> + * if there is another IRQ handler hogging the processing.
> + * For example, the HDMI encoder driver might be stuck in the IRQ
> + * handler for a significant time in a case of bad cable connection.
What have HDMI cables got to do with anything here? Yes, in general IRQ
latency can be high, since CPUs can have IRQs masked and/or be taking
higher-priority interrupts for any number of reasons. I don't see how an
oddly-specific example (of apparently poor driver design, to boot) is
useful.
> + * In order to catch such cases and not report spurious rocket
> + * job timeouts, synchronize the IRQ handler and re-check the fence
> + * status.
> + */
> + synchronize_irq(core->irq);
> +
> + if (dma_fence_is_signaled(job->done_fence)) {
> + dev_warn(core->dev, "unexpectedly high interrupt latency\n");
> + return DRM_GPU_SCHED_STAT_NOMINAL;
> + }
> +
> + dev_err(core->dev, "gpu sched timeout");
> +
> + atomic_set(&core->reset.pending, 1);
> + rocket_reset(core, sched_job);
> + iommu_detach_group(NULL, iommu_group_get(core->dev));
> +
> + return DRM_GPU_SCHED_STAT_NOMINAL;
> +}
> +
> +static void rocket_reset_work(struct work_struct *work)
> +{
> + struct rocket_core *core;
> +
> + core = container_of(work, struct rocket_core, reset.work);
> + rocket_reset(core, NULL);
> +}
> +
> +static const struct drm_sched_backend_ops rocket_sched_ops = {
> + .run_job = rocket_job_run,
> + .timedout_job = rocket_job_timedout,
> + .free_job = rocket_job_free
> +};
> +
> +static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
> +{
> + struct rocket_core *core = data;
> +
> + rocket_job_handle_irq(core);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t rocket_job_irq_handler(int irq, void *data)
> +{
> + struct rocket_core *core = data;
> + u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
Given that this can be a shared IRQ as above, it would be a good idea to
take care to avoid register accesses while suspended. Especially if
you're trying to utilise suspend to reset a failing job that may well be
throwing IOMMU faults.
> +
> + WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
> + WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
> +
> + if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
> + raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
> + return IRQ_NONE;
> +
> + rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
> +
> + return IRQ_WAKE_THREAD;
> +}
> +
> +int rocket_job_init(struct rocket_core *core)
> +{
> + struct drm_sched_init_args args = {
> + .ops = &rocket_sched_ops,
> + .num_rqs = DRM_SCHED_PRIORITY_COUNT,
> + .credit_limit = 1,
Ah, does this mean that all the stuff about queued jobs was in fact all
nonsense anyway?
> + .timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
> + .name = dev_name(core->dev),
> + .dev = core->dev,
> + };
> + int ret;
> +
> + INIT_WORK(&core->reset.work, rocket_reset_work);
> + spin_lock_init(&core->job_lock);
> +
> + core->irq = platform_get_irq(to_platform_device(core->dev), 0);
> + if (core->irq < 0)
> + return core->irq;
> +
> + ret = devm_request_threaded_irq(core->dev, core->irq,
> + rocket_job_irq_handler,
> + rocket_job_irq_handler_thread,
> + IRQF_SHARED, KBUILD_MODNAME "-job",
Is it really a "job" interrupt though? The binding and the register
definitions suggest it's just a general status interrupt for the core.
Furthermore since we expect to have multiple cores, being able to more
easily identify and attribute per-core IRQ activity seems more useful
for debugging than copy-pasting from something really rather different
which also expects to be the only one of its kind on the system.
Thanks,
Robin.
> + core);
> + if (ret) {
> + dev_err(core->dev, "failed to request job irq");
> + return ret;
> + }
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 05/10] accel/rocket: Add IOCTLs for synchronizing memory accesses
2025-06-06 6:28 ` [PATCH v7 05/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
@ 2025-06-24 16:52 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2025-06-24 16:52 UTC (permalink / raw)
To: Tomeu Vizoso, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner, Oded Gabbay, Jonathan Corbet, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König, Sebastian Reichel,
Nicolas Frattaroli, Kever Yang, Daniel Stone, Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
> The NPU cores have their own access to the memory bus, and this isn't
> cache coherent with the CPUs.
>
> Add IOCTLs so userspace can mark when the caches need to be flushed, and
> also when a writer job needs to be waited for before the buffer can be
> accessed from the CPU.
>
> Initially based on the same IOCTLs from the Etnaviv driver.
>
> v2:
> - Don't break UABI by reordering the IOCTL IDs (Jeff Hugo)
>
> v3:
> - Check that padding fields in IOCTLs are zero (Jeff Hugo)
>
> v6:
> - Fix conversion logic to make sure we use DMA_BIDIRECTIONAL when needed
> (Lucas Stach)
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
> ---
> drivers/accel/rocket/rocket_drv.c | 2 +
> drivers/accel/rocket/rocket_gem.c | 82 +++++++++++++++++++++++++++++++++++++++
> drivers/accel/rocket/rocket_gem.h | 5 +++
> include/uapi/drm/rocket_accel.h | 37 ++++++++++++++++++
> 4 files changed, 126 insertions(+)
>
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 4ab78193c186dfcfc3e323f16c588e85e6a8a334..eb9284ee2511f730afe6a532225c2706ce0e2822 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -62,6 +62,8 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
>
> ROCKET_IOCTL(CREATE_BO, create_bo),
> ROCKET_IOCTL(SUBMIT, submit),
> + ROCKET_IOCTL(PREP_BO, prep_bo),
> + ROCKET_IOCTL(FINI_BO, fini_bo),
> };
>
> DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
> diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
> index 61b7f970a6885aa13784daa1222611a02aa10dee..07024b6e71bf544dc7f00b008b9afb74b0c4e802 100644
> --- a/drivers/accel/rocket/rocket_gem.c
> +++ b/drivers/accel/rocket/rocket_gem.c
> @@ -113,3 +113,85 @@ int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *
>
> return ret;
> }
> +
> +static inline enum dma_data_direction rocket_op_to_dma_dir(u32 op)
> +{
> + op &= ROCKET_PREP_READ | ROCKET_PREP_WRITE;
> +
> + if (op == ROCKET_PREP_READ)
> + return DMA_FROM_DEVICE;
> + else if (op == ROCKET_PREP_WRITE)
> + return DMA_TO_DEVICE;
> + else
> + return DMA_BIDIRECTIONAL;
> +}
> +
> +int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file)
> +{
> + struct drm_rocket_prep_bo *args = data;
> + unsigned long timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
> + struct rocket_device *rdev = to_rocket_device(dev);
> + struct drm_gem_object *gem_obj;
> + struct drm_gem_shmem_object *shmem_obj;
> + bool write = !!(args->op & ROCKET_PREP_WRITE);
> + long ret = 0;
> +
> + if (args->op & ~(ROCKET_PREP_READ | ROCKET_PREP_WRITE))
> + return -EINVAL;
> +
> + gem_obj = drm_gem_object_lookup(file, args->handle);
> + if (!gem_obj)
> + return -ENOENT;
> +
> + ret = dma_resv_wait_timeout(gem_obj->resv, dma_resv_usage_rw(write),
> + true, timeout);
> + if (!ret)
> + ret = timeout ? -ETIMEDOUT : -EBUSY;
> +
> + shmem_obj = &to_rocket_bo(gem_obj)->base;
> +
> + for (unsigned int core = 1; core < rdev->num_cores; core++) {
Huh? If you need to sync the BO memory ever, then you need to sync it
for the same device it was mapped, and certainly not 0 or 2+ times
depending on how may cores happen to be enabled. Please throw
CONFIG_DMA_API_DEBUG at this.
> + dma_sync_sgtable_for_cpu(rdev->cores[core].dev, shmem_obj->sgt,
> + rocket_op_to_dma_dir(args->op));
Hmm, the intent of the API is really that the direction for sync should
match the direction for map and unmap too; if it was mapped
DMA_BIDIRECTIONAL then it should be synced DMA_BIDIRECTIONAL. If you
have BOs which are really only used for one-directional purposes then
they should be mapped as such at creation.
Does anything actually prevent one thread form trying to read from a
buffer while another thread is writing it, and thus the read
inintuitively destroying newly-written data (and/or the write
unwittingly destroying its own data in FINI_BO because last_cpu_prep_op
got overwritten)? Unless there's a significant measurable benefit to
trying to be clever here (of which I'm somewhjat doubtful), I would be
strongly inclined to just keep things simple and straightforward.
Thanks,
Robin.
> + }
> +
> + to_rocket_bo(gem_obj)->last_cpu_prep_op = args->op;
> +
> + drm_gem_object_put(gem_obj);
> +
> + return ret;
> +}
> +
> +int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file)
> +{
> + struct rocket_device *rdev = to_rocket_device(dev);
> + struct drm_rocket_fini_bo *args = data;
> + struct drm_gem_shmem_object *shmem_obj;
> + struct rocket_gem_object *rkt_obj;
> + struct drm_gem_object *gem_obj;
> +
> + if (args->reserved != 0) {
> + drm_dbg(dev, "Reserved field in drm_rocket_fini_bo struct should be 0.\n");
> + return -EINVAL;
> + }
> +
> + gem_obj = drm_gem_object_lookup(file, args->handle);
> + if (!gem_obj)
> + return -ENOENT;
> +
> + rkt_obj = to_rocket_bo(gem_obj);
> + shmem_obj = &rkt_obj->base;
> +
> + WARN_ON(rkt_obj->last_cpu_prep_op == 0);
> +
> + for (unsigned int core = 1; core < rdev->num_cores; core++) {
> + dma_sync_sgtable_for_device(rdev->cores[core].dev, shmem_obj->sgt,
> + rocket_op_to_dma_dir(rkt_obj->last_cpu_prep_op));
> + }
> +
> + rkt_obj->last_cpu_prep_op = 0;
> +
> + drm_gem_object_put(gem_obj);
> +
> + return 0;
> +}
> diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
> index e8a4d6213fd80419be2ec8af04583a67fb1a4b75..a52a63cd78339a6150b99592ab5f94feeeb51fde 100644
> --- a/drivers/accel/rocket/rocket_gem.h
> +++ b/drivers/accel/rocket/rocket_gem.h
> @@ -12,12 +12,17 @@ struct rocket_gem_object {
> struct iommu_domain *domain;
> size_t size;
> u32 offset;
> + u32 last_cpu_prep_op;
> };
>
> struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
>
> int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
>
> +int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file);
> +
> +int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file);
> +
> static inline
> struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
> {
> diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
> index cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b..b5c80dd767be56e9720b51e4a82617a425a881a1 100644
> --- a/include/uapi/drm/rocket_accel.h
> +++ b/include/uapi/drm/rocket_accel.h
> @@ -13,9 +13,13 @@ extern "C" {
>
> #define DRM_ROCKET_CREATE_BO 0x00
> #define DRM_ROCKET_SUBMIT 0x01
> +#define DRM_ROCKET_PREP_BO 0x02
> +#define DRM_ROCKET_FINI_BO 0x03
>
> #define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
> #define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
> +#define DRM_IOCTL_ROCKET_PREP_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_PREP_BO, struct drm_rocket_prep_bo)
> +#define DRM_IOCTL_ROCKET_FINI_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_FINI_BO, struct drm_rocket_fini_bo)
>
> /**
> * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
> @@ -39,6 +43,39 @@ struct drm_rocket_create_bo {
> __u64 offset;
> };
>
> +#define ROCKET_PREP_READ 0x01
> +#define ROCKET_PREP_WRITE 0x02
> +
> +/**
> + * struct drm_rocket_prep_bo - ioctl argument for starting CPU ownership of the BO.
> + *
> + * Takes care of waiting for any NPU jobs that might still use the NPU and performs cache
> + * synchronization.
> + */
> +struct drm_rocket_prep_bo {
> + /** Input: GEM handle of the buffer object. */
> + __u32 handle;
> +
> + /** Input: mask of ROCKET_PREP_x, direction of the access. */
> + __u32 op;
> +
> + /** Input: Amount of time to wait for NPU jobs. */
> + __s64 timeout_ns;
> +};
> +
> +/**
> + * struct drm_rocket_fini_bo - ioctl argument for finishing CPU ownership of the BO.
> + *
> + * Synchronize caches for NPU access.
> + */
> +struct drm_rocket_fini_bo {
> + /** Input: GEM handle of the buffer object. */
> + __u32 handle;
> +
> + /** Reserved, must be zero. */
> + __u32 reserved;
> +};
> +
> /**
> * struct drm_rocket_task - A task to be run on the NPU
> *
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 04/10] accel/rocket: Add job submission IOCTL
2025-06-06 6:28 ` [PATCH v7 04/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
2025-06-24 13:50 ` Robin Murphy
@ 2025-06-30 20:56 ` Rob Herring
1 sibling, 0 replies; 30+ messages in thread
From: Rob Herring @ 2025-06-30 20:56 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On Fri, Jun 6, 2025 at 1:29 AM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
>
> Using the DRM GPU scheduler infrastructure, with a scheduler for each
> core.
>
> Userspace can decide for a series of tasks to be executed sequentially
> in the same core, so SRAM locality can be taken advantage of.
>
> The job submission code was initially based on Panfrost.
>
> v2:
> - Remove hardcoded number of cores
> - Misc. style fixes (Jeffrey Hugo)
> - Repack IOCTL struct (Jeffrey Hugo)
>
> v3:
> - Adapt to a split of the register block in the DT bindings (Nicolas
> Frattaroli)
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
> - Use drm_* logging functions (Thomas Zimmermann)
> - Rename reg i/o macros (Thomas Zimmermann)
> - Add padding to ioctls and check for zero (Jeff Hugo)
> - Improve error handling (Nicolas Frattaroli)
>
> v6:
> - Use mutexes guard (Markus Elfring)
> - Use u64_to_user_ptr (Jeff Hugo)
> - Drop rocket_fence (Rob Herring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
[...]
> --- a/include/uapi/drm/rocket_accel.h
> +++ b/include/uapi/drm/rocket_accel.h
> @@ -12,8 +12,10 @@ extern "C" {
> #endif
>
> #define DRM_ROCKET_CREATE_BO 0x00
> +#define DRM_ROCKET_SUBMIT 0x01
>
> #define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
> +#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
>
> /**
> * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
> @@ -37,6 +39,68 @@ struct drm_rocket_create_bo {
> __u64 offset;
> };
>
> +/**
> + * struct drm_rocket_task - A task to be run on the NPU
> + *
> + * A task is the smallest unit of work that can be run on the NPU.
> + */
> +struct drm_rocket_task {
> + /** Input: DMA address to NPU mapping of register command buffer */
> + __u64 regcmd;
> +
> + /** Input: Number of commands in the register command buffer */
> + __u32 regcmd_count;
> +
> + /** Reserved, must be zero. */
> + __u32 reserved;
> +};
> +
> +/**
> + * struct drm_rocket_job - A job to be run on the NPU
> + *
> + * The kernel will schedule the execution of this job taking into account its
> + * dependencies with other jobs. All tasks in the same job will be executed
> + * sequentially on the same core, to benefit from memory residency in SRAM.
> + */
> +struct drm_rocket_job {
> + /** Input: Pointer to an array of struct drm_rocket_task. */
> + __u64 tasks;
> +
> + /** Input: Pointer to a u32 array of the BOs that are read by the job. */
> + __u64 in_bo_handles;
> +
> + /** Input: Pointer to a u32 array of the BOs that are written to by the job. */
> + __u64 out_bo_handles;
> +
> + /** Input: Number of tasks passed in. */
> + __u32 task_count;
> +
> + /** Input: Number of input BO handles passed in (size is that times 4). */
> + __u32 in_bo_handle_count;
> +
> + /** Input: Number of output BO handles passed in (size is that times 4). */
> + __u32 out_bo_handle_count;
> +
> + /** Reserved, must be zero. */
> + __u32 reserved;
> +};
> +
> +/**
> + * struct drm_rocket_submit - ioctl argument for submitting commands to the NPU.
> + *
> + * The kernel will schedule the execution of these jobs in dependency order.
> + */
> +struct drm_rocket_submit {
> + /** Input: Pointer to an array of struct drm_rocket_job. */
> + __u64 jobs;
> +
> + /** Input: Number of jobs passed in. */
> + __u32 job_count;
Isn't there a problem if you need to expand drm_rocket_job beyond
using the 1 reserved field? You can't add to the struct because then
you don't know the size here. So you have to modify drm_rocket_submit
to modify drm_rocket_job. Maybe better if you plan for that now rather
than later by making the size explicit.
Though etnaviv at least has similar issues.
Rob
> +
> + /** Reserved, must be zero. */
> + __u32 reserved;
> +};
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation
2025-06-06 6:28 ` [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
2025-06-24 10:17 ` Robin Murphy
@ 2025-07-01 19:31 ` Heiko Stübner
2025-07-11 17:40 ` Andrew Davis
2 siblings, 0 replies; 30+ messages in thread
From: Heiko Stübner @ 2025-07-01 19:31 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
Tomeu Vizoso
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Am Freitag, 6. Juni 2025, 08:28:23 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> This uses the SHMEM DRM helpers and we map right away to the CPU and NPU
> sides, as all buffers are expected to be accessed from both.
>
> v2:
> - Sync the IOMMUs for the other cores when mapping and unmapping.
>
> v3:
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
>
> v6:
> - Use mutexes guard (Markus Elfring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..61b7f970a6885aa13784daa1222611a02aa10dee
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_gem.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#include <drm/drm_device.h>
> +#include <drm/drm_utils.h>
> +#include <drm/rocket_accel.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +
> +#include "rocket_device.h"
> +#include "rocket_drv.h"
> +#include "rocket_gem.h"
> +
> +static void rocket_gem_bo_free(struct drm_gem_object *obj)
> +{
> + struct rocket_device *rdev = to_rocket_device(obj->dev);
> + struct rocket_gem_object *bo = to_rocket_bo(obj);
> + size_t unmapped;
> +
> + drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
This should probably be
drm_WARN_ON(obj->dev, refcount_read(&bo->base.pages_use_count) > 1);
as pages_use_count is of type refcount_t since
commit 051b6646d36d ("drm/shmem-helper: Use refcount_t for pages_use_count")
Heiko
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 04/10] accel/rocket: Add job submission IOCTL
2025-06-24 13:50 ` Robin Murphy
@ 2025-07-11 16:00 ` Tomeu Vizoso
2025-07-11 16:40 ` Robin Murphy
0 siblings, 1 reply; 30+ messages in thread
From: Tomeu Vizoso @ 2025-07-11 16:00 UTC (permalink / raw)
To: Robin Murphy
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Daniel Stone, Da Xue, Jeff Hugo, devicetree,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel,
linux-doc, linux-media, linaro-mm-sig
On Tue, Jun 24, 2025 at 3:50 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
> [...]
> > diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
> > index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644
> > --- a/drivers/accel/rocket/rocket_device.h
> > +++ b/drivers/accel/rocket/rocket_device.h
> > @@ -13,6 +13,8 @@
> > struct rocket_device {
> > struct drm_device ddev;
> >
> > + struct mutex sched_lock;
> > +
> > struct mutex iommu_lock;
>
> Just realised I missed this in the last patch, but iommu_lock appears to
> be completely unnecessary now.
>
> > struct rocket_core *cores;
> [...]
> > +static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
> > +{
> > + struct rocket_task *task;
> > + bool task_pp_en = 1;
> > + bool task_count = 1;
> > +
> > + /* GO ! */
> > +
> > + /* Don't queue the job if a reset is in progress */
> > + if (atomic_read(&core->reset.pending))
> > + return;
> > +
> > + task = &job->tasks[job->next_task_idx];
> > + job->next_task_idx++;
> > +
> > + rocket_pc_writel(core, BASE_ADDRESS, 0x1);
> > +
> > + rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
> > + rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
>
> Those really look like bitfield operations rather than actual arithmetic
> to me.
>
> > +
> > + rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
>
> I don't see how regcmd is created (I guess that's in userspace?), but
> given that it's explicitly u64 all the way through - and especially
> since you claim to support 40-bit DMA addresses - it definitely seems
> suspicious that the upper 32 bits never seem to be consumed anywhere :/
Yeah, but there's no other register for BASE_ADDRESS address in the TRM.
> > + rocket_pc_writel(core, REGISTER_AMOUNTS, (task->regcmd_count + 1) / 2 - 1);
> > +
> > + rocket_pc_writel(core, INTERRUPT_MASK, PC_INTERRUPT_MASK_DPU_0 | PC_INTERRUPT_MASK_DPU_1);
> > + rocket_pc_writel(core, INTERRUPT_CLEAR, PC_INTERRUPT_CLEAR_DPU_0 | PC_INTERRUPT_CLEAR_DPU_1);
> > +
> > + rocket_pc_writel(core, TASK_CON, ((0x6 | task_pp_en) << 12) | task_count);
> > +
> > + rocket_pc_writel(core, TASK_DMA_BASE_ADDR, 0x0);
> > +
> > + rocket_pc_writel(core, OPERATION_ENABLE, 0x1);
> > +
> > + dev_dbg(core->dev, "Submitted regcmd at 0x%llx to core %d", task->regcmd, core->index);
> > +}
> [...]
> > +static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
> > +{
> > + struct rocket_job *job = to_rocket_job(sched_job);
> > + struct rocket_device *rdev = job->rdev;
> > + struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
> > + struct dma_fence *fence = NULL;
> > + int ret;
> > +
> > + if (unlikely(job->base.s_fence->finished.error))
> > + return NULL;
> > +
> > + /*
> > + * Nothing to execute: can happen if the job has finished while
> > + * we were resetting the GPU.
>
> GPU? (Similarly in various other comments/prints)
>
> > + */
> > + if (job->next_task_idx == job->task_count)
> > + return NULL;
> > +
> > + fence = rocket_fence_create(core);
> > + if (IS_ERR(fence))
> > + return fence;
> > +
> > + if (job->done_fence)
> > + dma_fence_put(job->done_fence);
> > + job->done_fence = dma_fence_get(fence);
> > +
> > + ret = pm_runtime_get_sync(core->dev);
> > + if (ret < 0)
> > + return fence;
> > +
> > + ret = iommu_attach_group(job->domain, iommu_group_get(core->dev));
>
> I don't see iommu_group_put() anywhere, so you're leaking refcounts all
> over.
>
> > + if (ret < 0)
> > + return fence;
> > +
> > + scoped_guard(spinlock, &core->job_lock) {
> > + core->in_flight_job = job;
> > + rocket_job_hw_submit(core, job);
> > + }
> > +
> > + return fence;
> > +}
> [...]
> > +static void rocket_job_handle_irq(struct rocket_core *core)
> > +{
> > + u32 status, raw_status;
> > +
> > + pm_runtime_mark_last_busy(core->dev);
> > +
> > + status = rocket_pc_readl(core, INTERRUPT_STATUS);
> > + raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
> > +
> > + rocket_pc_writel(core, OPERATION_ENABLE, 0x0);
> > + rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff);
>
> What was the point of reading the status registers if we're just going
> to blindly clear every possible condition anyway?
>
> > + scoped_guard(spinlock, &core->job_lock)
> > + if (core->in_flight_job)
> > + rocket_job_handle_done(core, core->in_flight_job);
>
> But then is it really OK to just start the next task regardless of
> whether the current task was reporting successful completion or an error?
>
> > +}
> > +
> > +static void
> > +rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
> > +{
> > + bool cookie;
> > +
> > + if (!atomic_read(&core->reset.pending))
> > + return;
> > +
> > + /*
> > + * Stop the scheduler.
> > + *
> > + * FIXME: We temporarily get out of the dma_fence_signalling section
> > + * because the cleanup path generate lockdep splats when taking locks
> > + * to release job resources. We should rework the code to follow this
> > + * pattern:
> > + *
> > + * try_lock
> > + * if (locked)
> > + * release
> > + * else
> > + * schedule_work_to_release_later
> > + */
> > + drm_sched_stop(&core->sched, bad);
> > +
> > + cookie = dma_fence_begin_signalling();
> > +
> > + if (bad)
> > + drm_sched_increase_karma(bad);
> > +
> > + /*
> > + * Mask job interrupts and synchronize to make sure we won't be
> > + * interrupted during our reset.
> > + */
> > + rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
> > + synchronize_irq(core->irq);
>
> ...except it's a shared IRQ, so it can still merrily fire at any time.
>
> > +
> > + /* Handle the remaining interrupts before we reset. */
> > + rocket_job_handle_irq(core);
> > +
> > + /*
> > + * Remaining interrupts have been handled, but we might still have
> > + * stuck jobs. Let's make sure the PM counters stay balanced by
> > + * manually calling pm_runtime_put_noidle() and
> > + * rocket_devfreq_record_idle() for each stuck job.
> > + * Let's also make sure the cycle counting register's refcnt is
> > + * kept balanced to prevent it from running forever
>
> Comments that don't match the code are more confusing than helpful :/
>
> > + */
> > + scoped_guard(spinlock, &core->job_lock) {
> > + if (core->in_flight_job)
> > + pm_runtime_put_noidle(core->dev);
> > +
> > + core->in_flight_job = NULL;
> > + }
> > +
> > + /* Proceed with reset now. */
> > + pm_runtime_force_suspend(core->dev);
> > + pm_runtime_force_resume(core->dev);
>
> Can you guarantee that actually resets the hardware if something else is
> holding the power domain open or RPM is disabled? I'm not familiar with
> the details of drm_sched, but if there are other jobs queued behind the
> stuck one would it even pass the rocket_job_is_idle() check for suspend
> to succeed anyway?
>
> Not to mention that you have an actual reset control in the DT binding,
> which isn't even optional... :/
Yeah, I think in my testing the resets didn't work because we didn't
have the right resources in the DT nodes. Have retested and the
documented reset process is enough to reset each core.
Will be sending soon a revised version.
Thanks,
Tomeu
> > + /* GPU has been reset, we can clear the reset pending bit. */
> > + atomic_set(&core->reset.pending, 0);
> > +
> > + /*
> > + * Now resubmit jobs that were previously queued but didn't have a
> > + * chance to finish.
> > + * FIXME: We temporarily get out of the DMA fence signalling section
> > + * while resubmitting jobs because the job submission logic will
> > + * allocate memory with the GFP_KERNEL flag which can trigger memory
> > + * reclaim and exposes a lock ordering issue.
> > + */
> > + dma_fence_end_signalling(cookie);
> > + drm_sched_resubmit_jobs(&core->sched);
>
> Since I happened to look, this says it's deprecated?
>
> > + cookie = dma_fence_begin_signalling();
> > +
> > + /* Restart the scheduler */
> > + drm_sched_start(&core->sched, 0);
> > +
> > + dma_fence_end_signalling(cookie);
> > +}
> > +
> > +static enum drm_gpu_sched_stat rocket_job_timedout(struct drm_sched_job *sched_job)
> > +{
> > + struct rocket_job *job = to_rocket_job(sched_job);
> > + struct rocket_device *rdev = job->rdev;
> > + struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
> > +
> > + /*
> > + * If the GPU managed to complete this jobs fence, the timeout is
> > + * spurious. Bail out.
> > + */
> > + if (dma_fence_is_signaled(job->done_fence))
> > + return DRM_GPU_SCHED_STAT_NOMINAL;
>
> Do we really need the same return condition twice? What if the IRQ fires
> immediately after we've made this check, and is handled without delay
> such that sychronize_irq() effectively still does nothing? Either way
> we've taken longer than the timeout value to observe the job completing
> successfully, and either that's significant and worth warning about or
> it's not - I don't see any point in trying to (inaccurately) nitpick
> *why* it might have happened.
>
> > + /*
> > + * Rocket IRQ handler may take a long time to process an interrupt
> > + * if there is another IRQ handler hogging the processing.
> > + * For example, the HDMI encoder driver might be stuck in the IRQ
> > + * handler for a significant time in a case of bad cable connection.
>
> What have HDMI cables got to do with anything here? Yes, in general IRQ
> latency can be high, since CPUs can have IRQs masked and/or be taking
> higher-priority interrupts for any number of reasons. I don't see how an
> oddly-specific example (of apparently poor driver design, to boot) is
> useful.
>
> > + * In order to catch such cases and not report spurious rocket
> > + * job timeouts, synchronize the IRQ handler and re-check the fence
> > + * status.
> > + */
> > + synchronize_irq(core->irq);
> > +
> > + if (dma_fence_is_signaled(job->done_fence)) {
> > + dev_warn(core->dev, "unexpectedly high interrupt latency\n");
> > + return DRM_GPU_SCHED_STAT_NOMINAL;
> > + }
> > +
> > + dev_err(core->dev, "gpu sched timeout");
> > +
> > + atomic_set(&core->reset.pending, 1);
> > + rocket_reset(core, sched_job);
> > + iommu_detach_group(NULL, iommu_group_get(core->dev));
> > +
> > + return DRM_GPU_SCHED_STAT_NOMINAL;
> > +}
> > +
> > +static void rocket_reset_work(struct work_struct *work)
> > +{
> > + struct rocket_core *core;
> > +
> > + core = container_of(work, struct rocket_core, reset.work);
> > + rocket_reset(core, NULL);
> > +}
> > +
> > +static const struct drm_sched_backend_ops rocket_sched_ops = {
> > + .run_job = rocket_job_run,
> > + .timedout_job = rocket_job_timedout,
> > + .free_job = rocket_job_free
> > +};
> > +
> > +static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
> > +{
> > + struct rocket_core *core = data;
> > +
> > + rocket_job_handle_irq(core);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static irqreturn_t rocket_job_irq_handler(int irq, void *data)
> > +{
> > + struct rocket_core *core = data;
> > + u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
>
> Given that this can be a shared IRQ as above, it would be a good idea to
> take care to avoid register accesses while suspended. Especially if
> you're trying to utilise suspend to reset a failing job that may well be
> throwing IOMMU faults.
>
> > +
> > + WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
> > + WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
> > +
> > + if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
> > + raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
> > + return IRQ_NONE;
> > +
> > + rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
> > +
> > + return IRQ_WAKE_THREAD;
> > +}
> > +
> > +int rocket_job_init(struct rocket_core *core)
> > +{
> > + struct drm_sched_init_args args = {
> > + .ops = &rocket_sched_ops,
> > + .num_rqs = DRM_SCHED_PRIORITY_COUNT,
> > + .credit_limit = 1,
>
> Ah, does this mean that all the stuff about queued jobs was in fact all
> nonsense anyway?
>
> > + .timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
> > + .name = dev_name(core->dev),
> > + .dev = core->dev,
> > + };
> > + int ret;
> > +
> > + INIT_WORK(&core->reset.work, rocket_reset_work);
> > + spin_lock_init(&core->job_lock);
> > +
> > + core->irq = platform_get_irq(to_platform_device(core->dev), 0);
> > + if (core->irq < 0)
> > + return core->irq;
> > +
> > + ret = devm_request_threaded_irq(core->dev, core->irq,
> > + rocket_job_irq_handler,
> > + rocket_job_irq_handler_thread,
> > + IRQF_SHARED, KBUILD_MODNAME "-job",
>
> Is it really a "job" interrupt though? The binding and the register
> definitions suggest it's just a general status interrupt for the core.
> Furthermore since we expect to have multiple cores, being able to more
> easily identify and attribute per-core IRQ activity seems more useful
> for debugging than copy-pasting from something really rather different
> which also expects to be the only one of its kind on the system.
>
> Thanks,
> Robin.
>
> > + core);
> > + if (ret) {
> > + dev_err(core->dev, "failed to request job irq");
> > + return ret;
> > + }
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-06-24 13:27 ` Robin Murphy
@ 2025-07-11 16:02 ` Tomeu Vizoso
0 siblings, 0 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-07-11 16:02 UTC (permalink / raw)
To: Robin Murphy
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Daniel Stone, Da Xue, Jeff Hugo, devicetree,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel,
linux-doc, linux-media, linaro-mm-sig, Krzysztof Kozlowski
On Tue, Jun 24, 2025 at 3:27 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
> > Add the bindings for the Neural Processing Unit IP from Rockchip.
> >
> > v2:
> > - Adapt to new node structure (one node per core, each with its own
> > IOMMU)
> > - Several misc. fixes from Sebastian Reichel
> >
> > v3:
> > - Split register block in its constituent subblocks, and only require
> > the ones that the kernel would ever use (Nicolas Frattaroli)
> > - Group supplies (Rob Herring)
> > - Explain the way in which the top core is special (Rob Herring)
> >
> > v4:
> > - Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
> > - Remove unneeded items: (Krzysztof Kozlowski)
> > - Fix use of minItems/maxItems (Krzysztof Kozlowski)
> > - Add reg-names to list of required properties (Krzysztof Kozlowski)
> > - Fix example (Krzysztof Kozlowski)
> >
> > v5:
> > - Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski)
> > - Streamline compatible property (Krzysztof Kozlowski)
> >
> > v6:
> > - Remove mention to NVDLA, as the hardware is only incidentally related
> > (Kever Yang)
> > - Mark pclk and npu clocks as required by all clocks (Rob Herring)
> >
> > v7:
> > - Remove allOf section, not needed now that all nodes require 4 clocks
> > (Heiko Stübner)
> >
> > Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
> > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> > ---
> > .../bindings/npu/rockchip,rk3588-rknn-core.yaml | 118 +++++++++++++++++++++
> > 1 file changed, 118 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..0588c085a723a34f4fa30a9680ea948d960b092f
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > @@ -0,0 +1,118 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/npu/rockchip,rk3588-rknn-core.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Neural Processing Unit IP from Rockchip
> > +
> > +maintainers:
> > + - Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > +
> > +description:
> > + Rockchip IP for accelerating inference of neural networks.
> > +
> > + There is to be a node per each core in the NPU. In Rockchip's design there
> > + will be one core that is special because it is able to redistribute work to
> > + the other cores by forwarding register writes and sharing data. This special
> > + core is called the top core and should have the compatible string that
> > + corresponds to top cores.
>
> Say a future SoC, for scaling reasons, puts down two or more whole NPUs
> rather than just increasing the number of sub-cores in one? How is a DT
> consumer then going to know which "cores" are associated with which "top
> cores"? I think at the very least they want phandles in one direction or
> the other, but if there is a real functional hierarchy then I'd be
> strongly tempted to have the "core" nodes as children of their "top
> core", particularly since "forwarding register writes" sounds absolutely
> like something which could justify being represented as a "bus" in the
> DT sense.
Actually, I experimented with having the three cores as completely
independent units and things just work with how resources are
referenced in the DT nodes.
So I'm just having a top-level node per core with the same compatible.
Thanks,
Tomeu
> Thanks,
> Robin.
>
> > +
> > +properties:
> > + $nodename:
> > + pattern: '^npu@[a-f0-9]+$'
> > +
> > + compatible:
> > + enum:
> > + - rockchip,rk3588-rknn-core-top
> > + - rockchip,rk3588-rknn-core
> > +
> > + reg:
> > + maxItems: 3
> > +
> > + reg-names:
> > + items:
> > + - const: pc
> > + - const: cna
> > + - const: core
> > +
> > + clocks:
> > + maxItems: 4
> > +
> > + clock-names:
> > + items:
> > + - const: aclk
> > + - const: hclk
> > + - const: npu
> > + - const: pclk
> > +
> > + interrupts:
> > + maxItems: 1
> > +
> > + iommus:
> > + maxItems: 1
> > +
> > + npu-supply: true
> > +
> > + power-domains:
> > + maxItems: 1
> > +
> > + resets:
> > + maxItems: 2
> > +
> > + reset-names:
> > + items:
> > + - const: srst_a
> > + - const: srst_h
> > +
> > + sram-supply: true
> > +
> > +required:
> > + - compatible
> > + - reg
> > + - reg-names
> > + - clocks
> > + - clock-names
> > + - interrupts
> > + - iommus
> > + - power-domains
> > + - resets
> > + - reset-names
> > + - npu-supply
> > + - sram-supply
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > + - |
> > + #include <dt-bindings/clock/rockchip,rk3588-cru.h>
> > + #include <dt-bindings/interrupt-controller/irq.h>
> > + #include <dt-bindings/interrupt-controller/arm-gic.h>
> > + #include <dt-bindings/power/rk3588-power.h>
> > + #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> > +
> > + bus {
> > + #address-cells = <2>;
> > + #size-cells = <2>;
> > +
> > + npu@fdab0000 {
> > + compatible = "rockchip,rk3588-rknn-core-top";
> > + reg = <0x0 0xfdab0000 0x0 0x1000>,
> > + <0x0 0xfdab1000 0x0 0x1000>,
> > + <0x0 0xfdab3000 0x0 0x1000>;
> > + reg-names = "pc", "cna", "core";
> > + assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
> > + assigned-clock-rates = <200000000>;
> > + clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>,
> > + <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
> > + clock-names = "aclk", "hclk", "npu", "pclk";
> > + interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
> > + iommus = <&rknn_mmu_top>;
> > + npu-supply = <&vdd_npu_s0>;
> > + power-domains = <&power RK3588_PD_NPUTOP>;
> > + resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
> > + reset-names = "srst_a", "srst_h";
> > + sram-supply = <&vdd_npu_mem_s0>;
> > + };
> > + };
> > +...
> >
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 04/10] accel/rocket: Add job submission IOCTL
2025-07-11 16:00 ` Tomeu Vizoso
@ 2025-07-11 16:40 ` Robin Murphy
2025-07-11 16:45 ` Tomeu Vizoso
0 siblings, 1 reply; 30+ messages in thread
From: Robin Murphy @ 2025-07-11 16:40 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Daniel Stone, Da Xue, Jeff Hugo, devicetree,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel,
linux-doc, linux-media, linaro-mm-sig
On 11/07/2025 5:00 pm, Tomeu Vizoso wrote:
> On Tue, Jun 24, 2025 at 3:50 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
>> [...]
>>> diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
>>> index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644
>>> --- a/drivers/accel/rocket/rocket_device.h
>>> +++ b/drivers/accel/rocket/rocket_device.h
>>> @@ -13,6 +13,8 @@
>>> struct rocket_device {
>>> struct drm_device ddev;
>>>
>>> + struct mutex sched_lock;
>>> +
>>> struct mutex iommu_lock;
>>
>> Just realised I missed this in the last patch, but iommu_lock appears to
>> be completely unnecessary now.
>>
>>> struct rocket_core *cores;
>> [...]
>>> +static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
>>> +{
>>> + struct rocket_task *task;
>>> + bool task_pp_en = 1;
>>> + bool task_count = 1;
>>> +
>>> + /* GO ! */
>>> +
>>> + /* Don't queue the job if a reset is in progress */
>>> + if (atomic_read(&core->reset.pending))
>>> + return;
>>> +
>>> + task = &job->tasks[job->next_task_idx];
>>> + job->next_task_idx++;
>>> +
>>> + rocket_pc_writel(core, BASE_ADDRESS, 0x1);
>>> +
>>> + rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
>>> + rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
>>
>> Those really look like bitfield operations rather than actual arithmetic
>> to me.
>>
>>> +
>>> + rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
>>
>> I don't see how regcmd is created (I guess that's in userspace?), but
>> given that it's explicitly u64 all the way through - and especially
>> since you claim to support 40-bit DMA addresses - it definitely seems
>> suspicious that the upper 32 bits never seem to be consumed anywhere :/
>
> Yeah, but there's no other register for BASE_ADDRESS address in the TRM.
That only reaffirms the question then - if this value is only ever
written verbatim to a 32-bit register, why is it 64-bit?
Thanks,
Robin.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 04/10] accel/rocket: Add job submission IOCTL
2025-07-11 16:40 ` Robin Murphy
@ 2025-07-11 16:45 ` Tomeu Vizoso
0 siblings, 0 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-07-11 16:45 UTC (permalink / raw)
To: Robin Murphy
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Daniel Stone, Da Xue, Jeff Hugo, devicetree,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel,
linux-doc, linux-media, linaro-mm-sig
On Fri, Jul 11, 2025 at 6:40 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 11/07/2025 5:00 pm, Tomeu Vizoso wrote:
> > On Tue, Jun 24, 2025 at 3:50 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2025-06-06 7:28 am, Tomeu Vizoso wrote:
> >> [...]
> >>> diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
> >>> index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644
> >>> --- a/drivers/accel/rocket/rocket_device.h
> >>> +++ b/drivers/accel/rocket/rocket_device.h
> >>> @@ -13,6 +13,8 @@
> >>> struct rocket_device {
> >>> struct drm_device ddev;
> >>>
> >>> + struct mutex sched_lock;
> >>> +
> >>> struct mutex iommu_lock;
> >>
> >> Just realised I missed this in the last patch, but iommu_lock appears to
> >> be completely unnecessary now.
> >>
> >>> struct rocket_core *cores;
> >> [...]
> >>> +static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
> >>> +{
> >>> + struct rocket_task *task;
> >>> + bool task_pp_en = 1;
> >>> + bool task_count = 1;
> >>> +
> >>> + /* GO ! */
> >>> +
> >>> + /* Don't queue the job if a reset is in progress */
> >>> + if (atomic_read(&core->reset.pending))
> >>> + return;
> >>> +
> >>> + task = &job->tasks[job->next_task_idx];
> >>> + job->next_task_idx++;
> >>> +
> >>> + rocket_pc_writel(core, BASE_ADDRESS, 0x1);
> >>> +
> >>> + rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
> >>> + rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
> >>
> >> Those really look like bitfield operations rather than actual arithmetic
> >> to me.
> >>
> >>> +
> >>> + rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
> >>
> >> I don't see how regcmd is created (I guess that's in userspace?), but
> >> given that it's explicitly u64 all the way through - and especially
> >> since you claim to support 40-bit DMA addresses - it definitely seems
> >> suspicious that the upper 32 bits never seem to be consumed anywhere :/
> >
> > Yeah, but there's no other register for BASE_ADDRESS address in the TRM.
>
> That only reaffirms the question then - if this value is only ever
> written verbatim to a 32-bit register, why is it 64-bit?
Ah, sure, it will be 32-bit in v8.
Thanks,
Tomeu
> Thanks,
> Robin.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-06-06 6:28 ` [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
2025-06-19 8:59 ` Robert Foss
@ 2025-07-11 17:38 ` Andrew Davis
2025-07-12 8:10 ` Tomeu Vizoso
1 sibling, 1 reply; 30+ messages in thread
From: Andrew Davis @ 2025-07-11 17:38 UTC (permalink / raw)
To: Tomeu Vizoso, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner, Oded Gabbay, Jonathan Corbet, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König, Sebastian Reichel,
Nicolas Frattaroli, Kever Yang, Robin Murphy, Daniel Stone,
Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On 6/6/25 1:28 AM, Tomeu Vizoso wrote:
> This initial version supports the NPU as shipped in the RK3588 SoC and
> described in the first part of its TRM, in Chapter 36.
>
> This NPU contains 3 independent cores that the driver can submit jobs
> to.
>
> This commit adds just hardware initialization and power management.
>
> v2:
> - Split cores and IOMMUs as independent devices (Sebastian Reichel)
> - Add some documentation (Jeffrey Hugo)
> - Be more explicit in the Kconfig documentation (Jeffrey Hugo)
> - Remove resets, as these haven't been found useful so far (Zenghui Yu)
> - Repack structs (Jeffrey Hugo)
> - Use DEFINE_DRM_ACCEL_FOPS (Jeffrey Hugo)
> - Use devm_drm_dev_alloc (Jeffrey Hugo)
> - Use probe log helper (Jeffrey Hugo)
> - Introduce UABI header in a later patch (Jeffrey Hugo)
>
> v3:
> - Adapt to a split of the register block in the DT bindings (Nicolas
> Frattaroli)
> - Move registers header to its own commit (Thomas Zimmermann)
> - Misc. cleanups (Thomas Zimmermann and Jeff Hugo)
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
> - PM improvements (Nicolas Frattaroli)
>
> v4:
> - Use bulk clk API (Krzysztof Kozlowski)
>
> v6:
> - Remove mention to NVDLA, as the hardware is only incidentally related
> (Kever Yang)
> - Use calloc instead of GFP_ZERO (Jeff Hugo)
> - Explicitly include linux/container_of.h (Jeff Hugo)
> - pclk and npu clocks are now needed by all cores (Rob Herring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> Documentation/accel/index.rst | 1 +
> Documentation/accel/rocket/index.rst | 19 +++
> MAINTAINERS | 10 ++
> drivers/accel/Kconfig | 1 +
> drivers/accel/Makefile | 1 +
> drivers/accel/rocket/Kconfig | 25 ++++
> drivers/accel/rocket/Makefile | 8 +
> drivers/accel/rocket/rocket_core.c | 70 +++++++++
> drivers/accel/rocket/rocket_core.h | 45 ++++++
> drivers/accel/rocket/rocket_device.c | 25 ++++
> drivers/accel/rocket/rocket_device.h | 26 ++++
> drivers/accel/rocket/rocket_drv.c | 279 +++++++++++++++++++++++++++++++++++
> drivers/accel/rocket/rocket_drv.h | 15 ++
> 13 files changed, 525 insertions(+)
>
> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> index bc85f26533d88891dde482f91e26c99991b22869..d8fa332d60a890dbb617454d2a26d9b6f9b196aa 100644
> --- a/Documentation/accel/index.rst
> +++ b/Documentation/accel/index.rst
> @@ -10,6 +10,7 @@ Compute Accelerators
> introduction
> amdxdna/index
> qaic/index
> + rocket/index
>
> .. only:: subproject and html
>
> diff --git a/Documentation/accel/rocket/index.rst b/Documentation/accel/rocket/index.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..300eb3aeab1d8c6514c65af4d216b2d5a1669131
> --- /dev/null
> +++ b/Documentation/accel/rocket/index.rst
> @@ -0,0 +1,19 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=====================================
> + accel/rocket Rockchip NPU driver
> +=====================================
> +
> +The accel/rocket driver supports the Neural Processing Units (NPUs) inside some
> +Rockchip SoCs such as the RK3588. Rockchip calls it RKNN and sometimes RKNPU.
> +
> +The hardware is described in chapter 36 in the RK3588 TRM.
> +
> +This driver just powers the hardware on and off, allocates and maps buffers to
> +the device and submits jobs to the frontend unit. Everything else is done in
> +userspace, as a Gallium driver (also called rocket) that is part of the Mesa3D
> +project.
> +
> +Hardware currently supported:
> +
> +* RK3588
> \ No newline at end of file
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 96b82704950184bd71623ff41fc4df31e4c7fe87..2d8833bf1f2db06ca624d703f19066adab2f9fde 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7263,6 +7263,16 @@ T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
> F: drivers/accel/ivpu/
> F: include/uapi/drm/ivpu_accel.h
>
> +DRM ACCEL DRIVER FOR ROCKCHIP NPU
> +M: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> +L: dri-devel@lists.freedesktop.org
> +S: Supported
> +T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
> +F: Documentation/accel/rocket/
> +F: Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
> +F: drivers/accel/rocket/
> +F: include/uapi/drm/rocket_accel.h
> +
> DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
> M: Oded Gabbay <ogabbay@kernel.org>
> L: dri-devel@lists.freedesktop.org
> diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> index 5b9490367a39fd12d35a8d9021768aa186c09308..bb01cebc42bf16ebf02e938040f339ff94869e33 100644
> --- a/drivers/accel/Kconfig
> +++ b/drivers/accel/Kconfig
> @@ -28,5 +28,6 @@ source "drivers/accel/amdxdna/Kconfig"
> source "drivers/accel/habanalabs/Kconfig"
> source "drivers/accel/ivpu/Kconfig"
> source "drivers/accel/qaic/Kconfig"
> +source "drivers/accel/rocket/Kconfig"
>
> endif
> diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
> index a301fb6089d4c515430175c5e2ba9190f6dc9158..ffc3fa58866616d933184a7659573cd4d4780a8d 100644
> --- a/drivers/accel/Makefile
> +++ b/drivers/accel/Makefile
> @@ -4,3 +4,4 @@ obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/
> obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
> obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
> obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
> +obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/
> \ No newline at end of file
Couple of these no newline warnings
> diff --git a/drivers/accel/rocket/Kconfig b/drivers/accel/rocket/Kconfig
> new file mode 100644
> index 0000000000000000000000000000000000000000..9a59c6c61bf4d6460d8008b16331f001c97de67d
> --- /dev/null
> +++ b/drivers/accel/rocket/Kconfig
> @@ -0,0 +1,25 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config DRM_ACCEL_ROCKET
> + tristate "Rocket (support for Rockchip NPUs)"
> + depends on DRM
> + depends on ARM64 || COMPILE_TEST
Should this be more specific for now ARCH_ROCKCHIP?
> + depends on MMU
> + select DRM_SCHED
> + select IOMMU_SUPPORT
> + select IOMMU_IO_PGTABLE_LPAE
> + select DRM_GEM_SHMEM_HELPER
> + help
> + Choose this option if you have a Rockchip SoC that contains a
> + compatible Neural Processing Unit (NPU), such as the RK3588. Called by
> + Rockchip either RKNN or RKNPU, it accelerates inference of neural
> + networks.
> +
> + The interface exposed to userspace is described in
> + include/uapi/drm/rocket_accel.h and is used by the Rocket userspace
> + driver in Mesa3D.
> +
> + If unsure, say N.
> +
> + To compile this driver as a module, choose M here: the
> + module will be called rocket.
> diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
> new file mode 100644
> index 0000000000000000000000000000000000000000..abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96
> --- /dev/null
> +++ b/drivers/accel/rocket/Makefile
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
> +
> +rocket-y := \
> + rocket_core.o \
> + rocket_device.o \
> + rocket_drv.o
> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..3a6f25f2b4103075102739588bcdad96510e2a4e
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -0,0 +1,70 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#include <linux/clk.h>
> +#include <linux/dev_printk.h>
> +#include <linux/err.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +
> +#include "rocket_core.h"
> +
> +int rocket_core_init(struct rocket_core *core)
> +{
> + struct device *dev = core->dev;
> + struct platform_device *pdev = to_platform_device(dev);
> + u32 version;
> + int err = 0;
> +
> + err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
> + if (err)
> + return dev_err_probe(dev, err, "failed to get clocks for core %d\n", core->index);
> +
> + core->pc_iomem = devm_platform_ioremap_resource_byname(pdev, "pc");
> + if (IS_ERR(core->pc_iomem)) {
> + dev_err(dev, "couldn't find PC registers %ld\n", PTR_ERR(core->pc_iomem));
> + return PTR_ERR(core->pc_iomem);
> + }
> +
> + core->cna_iomem = devm_platform_ioremap_resource_byname(pdev, "cna");
> + if (IS_ERR(core->cna_iomem)) {
> + dev_err(dev, "couldn't find CNA registers %ld\n", PTR_ERR(core->cna_iomem));
> + return PTR_ERR(core->cna_iomem);
> + }
> +
> + core->core_iomem = devm_platform_ioremap_resource_byname(pdev, "core");
> + if (IS_ERR(core->core_iomem)) {
> + dev_err(dev, "couldn't find CORE registers %ld\n", PTR_ERR(core->core_iomem));
> + return PTR_ERR(core->core_iomem);
> + }
> +
> + pm_runtime_use_autosuspend(dev);
> +
> + /*
> + * As this NPU will be most often used as part of a media pipeline that
> + * ends presenting in a display, choose 50 ms (~3 frames at 60Hz) as an
> + * autosuspend delay as that will keep the device powered up while the
> + * pipeline is running.
> + */
> + pm_runtime_set_autosuspend_delay(dev, 50);
> +
> + pm_runtime_enable(dev);
devm_pm_runtime_enable(dev) here would take care of both functions
in rocket_core_fini() so you wouldn't need that and can cleanup
some return paths here.
Andrew
> +
> + err = pm_runtime_get_sync(dev);
> +
> + version = rocket_pc_readl(core, VERSION);
> + version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
> +
> + pm_runtime_mark_last_busy(dev);
> + pm_runtime_put_autosuspend(dev);
> +
> + dev_info(dev, "Rockchip NPU core %d version: %d\n", core->index, version);
> +
> + return 0;
> +}
> +
> +void rocket_core_fini(struct rocket_core *core)
> +{
> + pm_runtime_dont_use_autosuspend(core->dev);
> + pm_runtime_disable(core->dev);
> +}
> diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..1b1beb9798f03ec2ca325496a4d894674d0b798d
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_core.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#ifndef __ROCKET_CORE_H__
> +#define __ROCKET_CORE_H__
> +
> +#include <drm/gpu_scheduler.h>
> +#include <linux/clk.h>
> +#include <linux/io.h>
> +#include <linux/mutex_types.h>
> +
> +#include "rocket_registers.h"
> +
> +#define rocket_pc_readl(core, reg) \
> + readl((core)->pc_iomem + (REG_PC_##reg))
> +#define rocket_pc_writel(core, reg, value) \
> + writel(value, (core)->pc_iomem + (REG_PC_##reg))
> +
> +#define rocket_cna_readl(core, reg) \
> + readl((core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
> +#define rocket_cna_writel(core, reg, value) \
> + writel(value, (core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
> +
> +#define rocket_core_readl(core, reg) \
> + readl((core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
> +#define rocket_core_writel(core, reg, value) \
> + writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
> +
> +struct rocket_core {
> + struct device *dev;
> + struct rocket_device *rdev;
> + struct device_link *link;
> + unsigned int index;
> +
> + int irq;
> + void __iomem *pc_iomem;
> + void __iomem *cna_iomem;
> + void __iomem *core_iomem;
> + struct clk_bulk_data clks[4];
> +};
> +
> +int rocket_core_init(struct rocket_core *core);
> +void rocket_core_fini(struct rocket_core *core);
> +
> +#endif
> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a05c103e117e3eaa6439884b7acb6e3483296edb
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_device.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#include <linux/array_size.h>
> +#include <linux/clk.h>
> +#include <linux/dev_printk.h>
> +
> +#include "rocket_device.h"
> +
> +int rocket_device_init(struct rocket_device *rdev)
> +{
> + int err;
> +
> + /* Initialize core 0 (top) */
> + err = rocket_core_init(&rdev->cores[0]);
> + if (err)
> + return err;
> +
> + return 0;
> +}
> +
> +void rocket_device_fini(struct rocket_device *rdev)
> +{
> + rocket_core_fini(&rdev->cores[0]);
> +}
> diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_device.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#ifndef __ROCKET_DEVICE_H__
> +#define __ROCKET_DEVICE_H__
> +
> +#include <drm/drm_device.h>
> +#include <linux/clk.h>
> +#include <linux/container_of.h>
> +
> +#include "rocket_core.h"
> +
> +struct rocket_device {
> + struct drm_device ddev;
> +
> + struct rocket_core *cores;
> + unsigned int num_cores;
> +};
> +
> +int rocket_device_init(struct rocket_device *rdev);
> +void rocket_device_fini(struct rocket_device *rdev);
> +
> +#define to_rocket_device(drm_dev) \
> + ((struct rocket_device *)container_of(drm_dev, struct rocket_device, ddev))
> +
> +#endif
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b38a5c6264cb4e74d5e381adaeba1426e576fa56
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -0,0 +1,279 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#include <drm/drm_accel.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem.h>
> +#include <drm/drm_ioctl.h>
> +#include <drm/drm_of.h>
> +#include <linux/array_size.h>
> +#include <linux/clk.h>
> +#include <linux/component.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +
> +#include "rocket_drv.h"
> +
> +static int
> +rocket_open(struct drm_device *dev, struct drm_file *file)
> +{
> + struct rocket_device *rdev = to_rocket_device(dev);
> + struct rocket_file_priv *rocket_priv;
> +
> + rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
> + if (!rocket_priv)
> + return -ENOMEM;
> +
> + rocket_priv->rdev = rdev;
> + rocket_priv->domain = iommu_paging_domain_alloc(dev->dev);
> + file->driver_priv = rocket_priv;
> +
> + return 0;
> +}
> +
> +static void
> +rocket_postclose(struct drm_device *dev, struct drm_file *file)
> +{
> + struct rocket_file_priv *rocket_priv = file->driver_priv;
> +
> + iommu_domain_free(rocket_priv->domain);
> + kfree(rocket_priv);
> +}
> +
> +static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
> +#define ROCKET_IOCTL(n, func) \
> + DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
> +};
> +
> +DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
> +
> +/*
> + * Rocket driver version:
> + * - 1.0 - initial interface
> + */
> +static const struct drm_driver rocket_drm_driver = {
> + .driver_features = DRIVER_COMPUTE_ACCEL,
> + .open = rocket_open,
> + .postclose = rocket_postclose,
> + .ioctls = rocket_drm_driver_ioctls,
> + .num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
> + .fops = &rocket_accel_driver_fops,
> + .name = "rocket",
> + .desc = "rocket DRM",
> +};
> +
> +static int rocket_drm_bind(struct device *dev)
> +{
> + struct device_node *core_node;
> + struct rocket_device *rdev;
> + struct drm_device *ddev;
> + unsigned int num_cores = 1;
> + int err;
> +
> + rdev = devm_drm_dev_alloc(dev, &rocket_drm_driver, struct rocket_device, ddev);
> + if (IS_ERR(rdev))
> + return PTR_ERR(rdev);
> +
> + ddev = &rdev->ddev;
> + dev_set_drvdata(dev, rdev);
> +
> + for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> + if (of_device_is_available(core_node))
> + num_cores++;
> +
> + rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> + if (IS_ERR(rdev->cores))
> + return PTR_ERR(rdev->cores);
> +
> + /* Add core 0, any other cores will be added later when they are bound */
> + rdev->cores[0].rdev = rdev;
> + rdev->cores[0].dev = dev;
> + rdev->cores[0].index = 0;
> + rdev->num_cores = 1;
> +
> + err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
> + if (err)
> + return err;
> +
> + err = rocket_device_init(rdev);
> + if (err) {
> + dev_err_probe(dev, err, "Fatal error during NPU init\n");
> + goto err_device_fini;
> + }
> +
> + err = component_bind_all(dev, rdev);
> + if (err)
> + goto err_device_fini;
> +
> + err = drm_dev_register(ddev, 0);
> + if (err < 0)
> + goto err_unbind;
> +
> + return 0;
> +
> +err_unbind:
> + component_unbind_all(dev, rdev);
> +err_device_fini:
> + rocket_device_fini(rdev);
> + return err;
> +}
> +
> +static void rocket_drm_unbind(struct device *dev)
> +{
> + struct rocket_device *rdev = dev_get_drvdata(dev);
> + struct drm_device *ddev = &rdev->ddev;
> +
> + drm_dev_unregister(ddev);
> +
> + component_unbind_all(dev, rdev);
> +
> + rocket_device_fini(rdev);
> +}
> +
> +const struct component_master_ops rocket_drm_ops = {
> + .bind = rocket_drm_bind,
> + .unbind = rocket_drm_unbind,
> +};
> +
> +static int rocket_core_bind(struct device *dev, struct device *master, void *data)
> +{
> + struct rocket_device *rdev = data;
> + unsigned int core = rdev->num_cores;
> + int err;
> +
> + dev_set_drvdata(dev, rdev);
> +
> + rdev->cores[core].rdev = rdev;
> + rdev->cores[core].dev = dev;
> + rdev->cores[core].index = core;
> + rdev->cores[core].link = device_link_add(dev, rdev->cores[0].dev,
> + DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
> +
> + rdev->num_cores++;
> +
> + err = rocket_core_init(&rdev->cores[core]);
> + if (err) {
> + rocket_device_fini(rdev);
> + return err;
> + }
> +
> + return 0;
> +}
> +
> +static void rocket_core_unbind(struct device *dev, struct device *master, void *data)
> +{
> + struct rocket_device *rdev = data;
> +
> + for (unsigned int core = 1; core < rdev->num_cores; core++) {
> + if (rdev->cores[core].dev == dev) {
> + rocket_core_fini(&rdev->cores[core]);
> + device_link_del(rdev->cores[core].link);
> + break;
> + }
> + }
> +}
> +
> +const struct component_ops rocket_core_ops = {
> + .bind = rocket_core_bind,
> + .unbind = rocket_core_unbind,
> +};
> +
> +static int rocket_probe(struct platform_device *pdev)
> +{
> + struct component_match *match = NULL;
> + struct device_node *core_node;
> +
> + if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
> + return component_add(&pdev->dev, &rocket_core_ops);
> +
> + for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core") {
> + if (!of_device_is_available(core_node))
> + continue;
> +
> + drm_of_component_match_add(&pdev->dev, &match,
> + component_compare_of, core_node);
> + }
> +
> + return component_master_add_with_match(&pdev->dev, &rocket_drm_ops, match);
> +}
> +
> +static void rocket_remove(struct platform_device *pdev)
> +{
> + if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core-top"))
> + component_master_del(&pdev->dev, &rocket_drm_ops);
> + else if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
> + component_del(&pdev->dev, &rocket_core_ops);
> +}
> +
> +static const struct of_device_id dt_match[] = {
> + { .compatible = "rockchip,rk3588-rknn-core-top" },
> + { .compatible = "rockchip,rk3588-rknn-core" },
> + {}
> +};
> +MODULE_DEVICE_TABLE(of, dt_match);
> +
> +static int find_core_for_dev(struct device *dev)
> +{
> + struct rocket_device *rdev = dev_get_drvdata(dev);
> +
> + for (unsigned int core = 0; core < rdev->num_cores; core++) {
> + if (dev == rdev->cores[core].dev)
> + return core;
> + }
> +
> + return -1;
> +}
> +
> +static int rocket_device_runtime_resume(struct device *dev)
> +{
> + struct rocket_device *rdev = dev_get_drvdata(dev);
> + int core = find_core_for_dev(dev);
> + int err = 0;
> +
> + if (core < 0)
> + return -ENODEV;
> +
> + err = clk_bulk_prepare_enable(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
> + if (err) {
> + dev_err(dev, "failed to enable (%d) clocks for core %d\n", err, core);
> + return err;
> + }
> +
> + return 0;
> +}
> +
> +static int rocket_device_runtime_suspend(struct device *dev)
> +{
> + struct rocket_device *rdev = dev_get_drvdata(dev);
> + int core = find_core_for_dev(dev);
> +
> + if (core < 0)
> + return -ENODEV;
> +
> + clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
> +
> + return 0;
> +}
> +
> +EXPORT_GPL_DEV_PM_OPS(rocket_pm_ops) = {
> + RUNTIME_PM_OPS(rocket_device_runtime_suspend, rocket_device_runtime_resume, NULL)
> + SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> +};
> +
> +static struct platform_driver rocket_driver = {
> + .probe = rocket_probe,
> + .remove = rocket_remove,
> + .driver = {
> + .name = "rocket",
> + .pm = pm_ptr(&rocket_pm_ops),
> + .of_match_table = dt_match,
> + },
> +};
> +module_platform_driver(rocket_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("DRM driver for the Rockchip NPU IP");
> +MODULE_AUTHOR("Tomeu Vizoso");
> diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..3219621afb72acdfa915c110e2ec3aacb66bd940
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_drv.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#ifndef __ROCKET_DRV_H__
> +#define __ROCKET_DRV_H__
> +
> +#include "rocket_device.h"
> +
> +struct rocket_file_priv {
> + struct rocket_device *rdev;
> +
> + struct iommu_domain *domain;
> +};
> +
> +#endif
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation
2025-06-06 6:28 ` [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
2025-06-24 10:17 ` Robin Murphy
2025-07-01 19:31 ` Heiko Stübner
@ 2025-07-11 17:40 ` Andrew Davis
2 siblings, 0 replies; 30+ messages in thread
From: Andrew Davis @ 2025-07-11 17:40 UTC (permalink / raw)
To: Tomeu Vizoso, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner, Oded Gabbay, Jonathan Corbet, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König, Sebastian Reichel,
Nicolas Frattaroli, Kever Yang, Robin Murphy, Daniel Stone,
Da Xue, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On 6/6/25 1:28 AM, Tomeu Vizoso wrote:
> This uses the SHMEM DRM helpers and we map right away to the CPU and NPU
> sides, as all buffers are expected to be accessed from both.
>
> v2:
> - Sync the IOMMUs for the other cores when mapping and unmapping.
>
> v3:
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
>
> v6:
> - Use mutexes guard (Markus Elfring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> drivers/accel/rocket/Makefile | 3 +-
> drivers/accel/rocket/rocket_device.c | 4 ++
> drivers/accel/rocket/rocket_device.h | 2 +
> drivers/accel/rocket/rocket_drv.c | 7 ++-
> drivers/accel/rocket/rocket_gem.c | 115 +++++++++++++++++++++++++++++++++++
> drivers/accel/rocket/rocket_gem.h | 27 ++++++++
> include/uapi/drm/rocket_accel.h | 44 ++++++++++++++
> 7 files changed, 200 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
> index abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96..4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f 100644
> --- a/drivers/accel/rocket/Makefile
> +++ b/drivers/accel/rocket/Makefile
> @@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
> rocket-y := \
> rocket_core.o \
> rocket_device.o \
> - rocket_drv.o
> + rocket_drv.o \
> + rocket_gem.o
> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index a05c103e117e3eaa6439884b7acb6e3483296edb..5e559104741af22c528914c96e44558323ab6c89 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
> @@ -4,6 +4,7 @@
> #include <linux/array_size.h>
> #include <linux/clk.h>
> #include <linux/dev_printk.h>
> +#include <linux/mutex.h>
>
> #include "rocket_device.h"
>
> @@ -16,10 +17,13 @@ int rocket_device_init(struct rocket_device *rdev)
> if (err)
> return err;
>
> + mutex_init(&rdev->iommu_lock);
devm_mutex_init() again keeps you from needing rocket_device_fini().
Same in the next patch even if you don't end up needing the iommu_lock.
Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-07-11 17:38 ` Andrew Davis
@ 2025-07-12 8:10 ` Tomeu Vizoso
0 siblings, 0 replies; 30+ messages in thread
From: Tomeu Vizoso @ 2025-07-12 8:10 UTC (permalink / raw)
To: Andrew Davis
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Robin Murphy, Daniel Stone, Da Xue, Jeff Hugo,
devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
On Fri, Jul 11, 2025 at 7:38 PM Andrew Davis <afd@ti.com> wrote:
>
> On 6/6/25 1:28 AM, Tomeu Vizoso wrote:
> > This initial version supports the NPU as shipped in the RK3588 SoC and
> > described in the first part of its TRM, in Chapter 36.
> >
> > This NPU contains 3 independent cores that the driver can submit jobs
> > to.
> >
> > This commit adds just hardware initialization and power management.
> >
> > v2:
> > - Split cores and IOMMUs as independent devices (Sebastian Reichel)
> > - Add some documentation (Jeffrey Hugo)
> > - Be more explicit in the Kconfig documentation (Jeffrey Hugo)
> > - Remove resets, as these haven't been found useful so far (Zenghui Yu)
> > - Repack structs (Jeffrey Hugo)
> > - Use DEFINE_DRM_ACCEL_FOPS (Jeffrey Hugo)
> > - Use devm_drm_dev_alloc (Jeffrey Hugo)
> > - Use probe log helper (Jeffrey Hugo)
> > - Introduce UABI header in a later patch (Jeffrey Hugo)
> >
> > v3:
> > - Adapt to a split of the register block in the DT bindings (Nicolas
> > Frattaroli)
> > - Move registers header to its own commit (Thomas Zimmermann)
> > - Misc. cleanups (Thomas Zimmermann and Jeff Hugo)
> > - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
> > - PM improvements (Nicolas Frattaroli)
> >
> > v4:
> > - Use bulk clk API (Krzysztof Kozlowski)
> >
> > v6:
> > - Remove mention to NVDLA, as the hardware is only incidentally related
> > (Kever Yang)
> > - Use calloc instead of GFP_ZERO (Jeff Hugo)
> > - Explicitly include linux/container_of.h (Jeff Hugo)
> > - pclk and npu clocks are now needed by all cores (Rob Herring)
> >
> > v7:
> > - Assign its own IOMMU domain to each client, for isolation (Daniel
> > Stone and Robin Murphy)
> >
> > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > ---
> > Documentation/accel/index.rst | 1 +
> > Documentation/accel/rocket/index.rst | 19 +++
> > MAINTAINERS | 10 ++
> > drivers/accel/Kconfig | 1 +
> > drivers/accel/Makefile | 1 +
> > drivers/accel/rocket/Kconfig | 25 ++++
> > drivers/accel/rocket/Makefile | 8 +
> > drivers/accel/rocket/rocket_core.c | 70 +++++++++
> > drivers/accel/rocket/rocket_core.h | 45 ++++++
> > drivers/accel/rocket/rocket_device.c | 25 ++++
> > drivers/accel/rocket/rocket_device.h | 26 ++++
> > drivers/accel/rocket/rocket_drv.c | 279 +++++++++++++++++++++++++++++++++++
> > drivers/accel/rocket/rocket_drv.h | 15 ++
> > 13 files changed, 525 insertions(+)
> >
> > diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> > index bc85f26533d88891dde482f91e26c99991b22869..d8fa332d60a890dbb617454d2a26d9b6f9b196aa 100644
> > --- a/Documentation/accel/index.rst
> > +++ b/Documentation/accel/index.rst
> > @@ -10,6 +10,7 @@ Compute Accelerators
> > introduction
> > amdxdna/index
> > qaic/index
> > + rocket/index
> >
> > .. only:: subproject and html
> >
> > diff --git a/Documentation/accel/rocket/index.rst b/Documentation/accel/rocket/index.rst
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..300eb3aeab1d8c6514c65af4d216b2d5a1669131
> > --- /dev/null
> > +++ b/Documentation/accel/rocket/index.rst
> > @@ -0,0 +1,19 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +=====================================
> > + accel/rocket Rockchip NPU driver
> > +=====================================
> > +
> > +The accel/rocket driver supports the Neural Processing Units (NPUs) inside some
> > +Rockchip SoCs such as the RK3588. Rockchip calls it RKNN and sometimes RKNPU.
> > +
> > +The hardware is described in chapter 36 in the RK3588 TRM.
> > +
> > +This driver just powers the hardware on and off, allocates and maps buffers to
> > +the device and submits jobs to the frontend unit. Everything else is done in
> > +userspace, as a Gallium driver (also called rocket) that is part of the Mesa3D
> > +project.
> > +
> > +Hardware currently supported:
> > +
> > +* RK3588
> > \ No newline at end of file
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 96b82704950184bd71623ff41fc4df31e4c7fe87..2d8833bf1f2db06ca624d703f19066adab2f9fde 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -7263,6 +7263,16 @@ T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
> > F: drivers/accel/ivpu/
> > F: include/uapi/drm/ivpu_accel.h
> >
> > +DRM ACCEL DRIVER FOR ROCKCHIP NPU
> > +M: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > +L: dri-devel@lists.freedesktop.org
> > +S: Supported
> > +T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
> > +F: Documentation/accel/rocket/
> > +F: Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
> > +F: drivers/accel/rocket/
> > +F: include/uapi/drm/rocket_accel.h
> > +
> > DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
> > M: Oded Gabbay <ogabbay@kernel.org>
> > L: dri-devel@lists.freedesktop.org
> > diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> > index 5b9490367a39fd12d35a8d9021768aa186c09308..bb01cebc42bf16ebf02e938040f339ff94869e33 100644
> > --- a/drivers/accel/Kconfig
> > +++ b/drivers/accel/Kconfig
> > @@ -28,5 +28,6 @@ source "drivers/accel/amdxdna/Kconfig"
> > source "drivers/accel/habanalabs/Kconfig"
> > source "drivers/accel/ivpu/Kconfig"
> > source "drivers/accel/qaic/Kconfig"
> > +source "drivers/accel/rocket/Kconfig"
> >
> > endif
> > diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
> > index a301fb6089d4c515430175c5e2ba9190f6dc9158..ffc3fa58866616d933184a7659573cd4d4780a8d 100644
> > --- a/drivers/accel/Makefile
> > +++ b/drivers/accel/Makefile
> > @@ -4,3 +4,4 @@ obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/
> > obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
> > obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
> > obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
> > +obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/
> > \ No newline at end of file
>
> Couple of these no newline warnings
>
> > diff --git a/drivers/accel/rocket/Kconfig b/drivers/accel/rocket/Kconfig
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9a59c6c61bf4d6460d8008b16331f001c97de67d
> > --- /dev/null
> > +++ b/drivers/accel/rocket/Kconfig
> > @@ -0,0 +1,25 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config DRM_ACCEL_ROCKET
> > + tristate "Rocket (support for Rockchip NPUs)"
> > + depends on DRM
> > + depends on ARM64 || COMPILE_TEST
>
> Should this be more specific for now ARCH_ROCKCHIP?
Yep, already queued for v8.
> > + depends on MMU
> > + select DRM_SCHED
> > + select IOMMU_SUPPORT
> > + select IOMMU_IO_PGTABLE_LPAE
> > + select DRM_GEM_SHMEM_HELPER
> > + help
> > + Choose this option if you have a Rockchip SoC that contains a
> > + compatible Neural Processing Unit (NPU), such as the RK3588. Called by
> > + Rockchip either RKNN or RKNPU, it accelerates inference of neural
> > + networks.
> > +
> > + The interface exposed to userspace is described in
> > + include/uapi/drm/rocket_accel.h and is used by the Rocket userspace
> > + driver in Mesa3D.
> > +
> > + If unsure, say N.
> > +
> > + To compile this driver as a module, choose M here: the
> > + module will be called rocket.
> > diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96
> > --- /dev/null
> > +++ b/drivers/accel/rocket/Makefile
> > @@ -0,0 +1,8 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
> > +
> > +rocket-y := \
> > + rocket_core.o \
> > + rocket_device.o \
> > + rocket_drv.o
> > diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3a6f25f2b4103075102739588bcdad96510e2a4e
> > --- /dev/null
> > +++ b/drivers/accel/rocket/rocket_core.c
> > @@ -0,0 +1,70 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> > +
> > +#include <linux/clk.h>
> > +#include <linux/dev_printk.h>
> > +#include <linux/err.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/pm_runtime.h>
> > +
> > +#include "rocket_core.h"
> > +
> > +int rocket_core_init(struct rocket_core *core)
> > +{
> > + struct device *dev = core->dev;
> > + struct platform_device *pdev = to_platform_device(dev);
> > + u32 version;
> > + int err = 0;
> > +
> > + err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
> > + if (err)
> > + return dev_err_probe(dev, err, "failed to get clocks for core %d\n", core->index);
> > +
> > + core->pc_iomem = devm_platform_ioremap_resource_byname(pdev, "pc");
> > + if (IS_ERR(core->pc_iomem)) {
> > + dev_err(dev, "couldn't find PC registers %ld\n", PTR_ERR(core->pc_iomem));
> > + return PTR_ERR(core->pc_iomem);
> > + }
> > +
> > + core->cna_iomem = devm_platform_ioremap_resource_byname(pdev, "cna");
> > + if (IS_ERR(core->cna_iomem)) {
> > + dev_err(dev, "couldn't find CNA registers %ld\n", PTR_ERR(core->cna_iomem));
> > + return PTR_ERR(core->cna_iomem);
> > + }
> > +
> > + core->core_iomem = devm_platform_ioremap_resource_byname(pdev, "core");
> > + if (IS_ERR(core->core_iomem)) {
> > + dev_err(dev, "couldn't find CORE registers %ld\n", PTR_ERR(core->core_iomem));
> > + return PTR_ERR(core->core_iomem);
> > + }
> > +
> > + pm_runtime_use_autosuspend(dev);
> > +
> > + /*
> > + * As this NPU will be most often used as part of a media pipeline that
> > + * ends presenting in a display, choose 50 ms (~3 frames at 60Hz) as an
> > + * autosuspend delay as that will keep the device powered up while the
> > + * pipeline is running.
> > + */
> > + pm_runtime_set_autosuspend_delay(dev, 50);
> > +
> > + pm_runtime_enable(dev);
>
>
> devm_pm_runtime_enable(dev) here would take care of both functions
> in rocket_core_fini() so you wouldn't need that and can cleanup
> some return paths here.
Unfortunately, that would cause the suspend callback being called
after devm has freed the clocks, so I need to force for the suspend to
happen before any other cleanup.
Thanks,
Tomeu
> Andrew
>
> > +
> > + err = pm_runtime_get_sync(dev);
> > +
> > + version = rocket_pc_readl(core, VERSION);
> > + version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
> > +
> > + pm_runtime_mark_last_busy(dev);
> > + pm_runtime_put_autosuspend(dev);
> > +
> > + dev_info(dev, "Rockchip NPU core %d version: %d\n", core->index, version);
> > +
> > + return 0;
> > +}
> > +
> > +void rocket_core_fini(struct rocket_core *core)
> > +{
> > + pm_runtime_dont_use_autosuspend(core->dev);
> > + pm_runtime_disable(core->dev);
> > +}
> > diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..1b1beb9798f03ec2ca325496a4d894674d0b798d
> > --- /dev/null
> > +++ b/drivers/accel/rocket/rocket_core.h
> > @@ -0,0 +1,45 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> > +
> > +#ifndef __ROCKET_CORE_H__
> > +#define __ROCKET_CORE_H__
> > +
> > +#include <drm/gpu_scheduler.h>
> > +#include <linux/clk.h>
> > +#include <linux/io.h>
> > +#include <linux/mutex_types.h>
> > +
> > +#include "rocket_registers.h"
> > +
> > +#define rocket_pc_readl(core, reg) \
> > + readl((core)->pc_iomem + (REG_PC_##reg))
> > +#define rocket_pc_writel(core, reg, value) \
> > + writel(value, (core)->pc_iomem + (REG_PC_##reg))
> > +
> > +#define rocket_cna_readl(core, reg) \
> > + readl((core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
> > +#define rocket_cna_writel(core, reg, value) \
> > + writel(value, (core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
> > +
> > +#define rocket_core_readl(core, reg) \
> > + readl((core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
> > +#define rocket_core_writel(core, reg, value) \
> > + writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
> > +
> > +struct rocket_core {
> > + struct device *dev;
> > + struct rocket_device *rdev;
> > + struct device_link *link;
> > + unsigned int index;
> > +
> > + int irq;
> > + void __iomem *pc_iomem;
> > + void __iomem *cna_iomem;
> > + void __iomem *core_iomem;
> > + struct clk_bulk_data clks[4];
> > +};
> > +
> > +int rocket_core_init(struct rocket_core *core);
> > +void rocket_core_fini(struct rocket_core *core);
> > +
> > +#endif
> > diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a05c103e117e3eaa6439884b7acb6e3483296edb
> > --- /dev/null
> > +++ b/drivers/accel/rocket/rocket_device.c
> > @@ -0,0 +1,25 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> > +
> > +#include <linux/array_size.h>
> > +#include <linux/clk.h>
> > +#include <linux/dev_printk.h>
> > +
> > +#include "rocket_device.h"
> > +
> > +int rocket_device_init(struct rocket_device *rdev)
> > +{
> > + int err;
> > +
> > + /* Initialize core 0 (top) */
> > + err = rocket_core_init(&rdev->cores[0]);
> > + if (err)
> > + return err;
> > +
> > + return 0;
> > +}
> > +
> > +void rocket_device_fini(struct rocket_device *rdev)
> > +{
> > + rocket_core_fini(&rdev->cores[0]);
> > +}
> > diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d
> > --- /dev/null
> > +++ b/drivers/accel/rocket/rocket_device.h
> > @@ -0,0 +1,26 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> > +
> > +#ifndef __ROCKET_DEVICE_H__
> > +#define __ROCKET_DEVICE_H__
> > +
> > +#include <drm/drm_device.h>
> > +#include <linux/clk.h>
> > +#include <linux/container_of.h>
> > +
> > +#include "rocket_core.h"
> > +
> > +struct rocket_device {
> > + struct drm_device ddev;
> > +
> > + struct rocket_core *cores;
> > + unsigned int num_cores;
> > +};
> > +
> > +int rocket_device_init(struct rocket_device *rdev);
> > +void rocket_device_fini(struct rocket_device *rdev);
> > +
> > +#define to_rocket_device(drm_dev) \
> > + ((struct rocket_device *)container_of(drm_dev, struct rocket_device, ddev))
> > +
> > +#endif
> > diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b38a5c6264cb4e74d5e381adaeba1426e576fa56
> > --- /dev/null
> > +++ b/drivers/accel/rocket/rocket_drv.c
> > @@ -0,0 +1,279 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> > +
> > +#include <drm/drm_accel.h>
> > +#include <drm/drm_drv.h>
> > +#include <drm/drm_gem.h>
> > +#include <drm/drm_ioctl.h>
> > +#include <drm/drm_of.h>
> > +#include <linux/array_size.h>
> > +#include <linux/clk.h>
> > +#include <linux/component.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/iommu.h>
> > +#include <linux/of.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/pm_runtime.h>
> > +
> > +#include "rocket_drv.h"
> > +
> > +static int
> > +rocket_open(struct drm_device *dev, struct drm_file *file)
> > +{
> > + struct rocket_device *rdev = to_rocket_device(dev);
> > + struct rocket_file_priv *rocket_priv;
> > +
> > + rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
> > + if (!rocket_priv)
> > + return -ENOMEM;
> > +
> > + rocket_priv->rdev = rdev;
> > + rocket_priv->domain = iommu_paging_domain_alloc(dev->dev);
> > + file->driver_priv = rocket_priv;
> > +
> > + return 0;
> > +}
> > +
> > +static void
> > +rocket_postclose(struct drm_device *dev, struct drm_file *file)
> > +{
> > + struct rocket_file_priv *rocket_priv = file->driver_priv;
> > +
> > + iommu_domain_free(rocket_priv->domain);
> > + kfree(rocket_priv);
> > +}
> > +
> > +static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
> > +#define ROCKET_IOCTL(n, func) \
> > + DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
> > +};
> > +
> > +DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
> > +
> > +/*
> > + * Rocket driver version:
> > + * - 1.0 - initial interface
> > + */
> > +static const struct drm_driver rocket_drm_driver = {
> > + .driver_features = DRIVER_COMPUTE_ACCEL,
> > + .open = rocket_open,
> > + .postclose = rocket_postclose,
> > + .ioctls = rocket_drm_driver_ioctls,
> > + .num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
> > + .fops = &rocket_accel_driver_fops,
> > + .name = "rocket",
> > + .desc = "rocket DRM",
> > +};
> > +
> > +static int rocket_drm_bind(struct device *dev)
> > +{
> > + struct device_node *core_node;
> > + struct rocket_device *rdev;
> > + struct drm_device *ddev;
> > + unsigned int num_cores = 1;
> > + int err;
> > +
> > + rdev = devm_drm_dev_alloc(dev, &rocket_drm_driver, struct rocket_device, ddev);
> > + if (IS_ERR(rdev))
> > + return PTR_ERR(rdev);
> > +
> > + ddev = &rdev->ddev;
> > + dev_set_drvdata(dev, rdev);
> > +
> > + for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> > + if (of_device_is_available(core_node))
> > + num_cores++;
> > +
> > + rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> > + if (IS_ERR(rdev->cores))
> > + return PTR_ERR(rdev->cores);
> > +
> > + /* Add core 0, any other cores will be added later when they are bound */
> > + rdev->cores[0].rdev = rdev;
> > + rdev->cores[0].dev = dev;
> > + rdev->cores[0].index = 0;
> > + rdev->num_cores = 1;
> > +
> > + err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
> > + if (err)
> > + return err;
> > +
> > + err = rocket_device_init(rdev);
> > + if (err) {
> > + dev_err_probe(dev, err, "Fatal error during NPU init\n");
> > + goto err_device_fini;
> > + }
> > +
> > + err = component_bind_all(dev, rdev);
> > + if (err)
> > + goto err_device_fini;
> > +
> > + err = drm_dev_register(ddev, 0);
> > + if (err < 0)
> > + goto err_unbind;
> > +
> > + return 0;
> > +
> > +err_unbind:
> > + component_unbind_all(dev, rdev);
> > +err_device_fini:
> > + rocket_device_fini(rdev);
> > + return err;
> > +}
> > +
> > +static void rocket_drm_unbind(struct device *dev)
> > +{
> > + struct rocket_device *rdev = dev_get_drvdata(dev);
> > + struct drm_device *ddev = &rdev->ddev;
> > +
> > + drm_dev_unregister(ddev);
> > +
> > + component_unbind_all(dev, rdev);
> > +
> > + rocket_device_fini(rdev);
> > +}
> > +
> > +const struct component_master_ops rocket_drm_ops = {
> > + .bind = rocket_drm_bind,
> > + .unbind = rocket_drm_unbind,
> > +};
> > +
> > +static int rocket_core_bind(struct device *dev, struct device *master, void *data)
> > +{
> > + struct rocket_device *rdev = data;
> > + unsigned int core = rdev->num_cores;
> > + int err;
> > +
> > + dev_set_drvdata(dev, rdev);
> > +
> > + rdev->cores[core].rdev = rdev;
> > + rdev->cores[core].dev = dev;
> > + rdev->cores[core].index = core;
> > + rdev->cores[core].link = device_link_add(dev, rdev->cores[0].dev,
> > + DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
> > +
> > + rdev->num_cores++;
> > +
> > + err = rocket_core_init(&rdev->cores[core]);
> > + if (err) {
> > + rocket_device_fini(rdev);
> > + return err;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void rocket_core_unbind(struct device *dev, struct device *master, void *data)
> > +{
> > + struct rocket_device *rdev = data;
> > +
> > + for (unsigned int core = 1; core < rdev->num_cores; core++) {
> > + if (rdev->cores[core].dev == dev) {
> > + rocket_core_fini(&rdev->cores[core]);
> > + device_link_del(rdev->cores[core].link);
> > + break;
> > + }
> > + }
> > +}
> > +
> > +const struct component_ops rocket_core_ops = {
> > + .bind = rocket_core_bind,
> > + .unbind = rocket_core_unbind,
> > +};
> > +
> > +static int rocket_probe(struct platform_device *pdev)
> > +{
> > + struct component_match *match = NULL;
> > + struct device_node *core_node;
> > +
> > + if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
> > + return component_add(&pdev->dev, &rocket_core_ops);
> > +
> > + for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core") {
> > + if (!of_device_is_available(core_node))
> > + continue;
> > +
> > + drm_of_component_match_add(&pdev->dev, &match,
> > + component_compare_of, core_node);
> > + }
> > +
> > + return component_master_add_with_match(&pdev->dev, &rocket_drm_ops, match);
> > +}
> > +
> > +static void rocket_remove(struct platform_device *pdev)
> > +{
> > + if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core-top"))
> > + component_master_del(&pdev->dev, &rocket_drm_ops);
> > + else if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
> > + component_del(&pdev->dev, &rocket_core_ops);
> > +}
> > +
> > +static const struct of_device_id dt_match[] = {
> > + { .compatible = "rockchip,rk3588-rknn-core-top" },
> > + { .compatible = "rockchip,rk3588-rknn-core" },
> > + {}
> > +};
> > +MODULE_DEVICE_TABLE(of, dt_match);
> > +
> > +static int find_core_for_dev(struct device *dev)
> > +{
> > + struct rocket_device *rdev = dev_get_drvdata(dev);
> > +
> > + for (unsigned int core = 0; core < rdev->num_cores; core++) {
> > + if (dev == rdev->cores[core].dev)
> > + return core;
> > + }
> > +
> > + return -1;
> > +}
> > +
> > +static int rocket_device_runtime_resume(struct device *dev)
> > +{
> > + struct rocket_device *rdev = dev_get_drvdata(dev);
> > + int core = find_core_for_dev(dev);
> > + int err = 0;
> > +
> > + if (core < 0)
> > + return -ENODEV;
> > +
> > + err = clk_bulk_prepare_enable(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
> > + if (err) {
> > + dev_err(dev, "failed to enable (%d) clocks for core %d\n", err, core);
> > + return err;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int rocket_device_runtime_suspend(struct device *dev)
> > +{
> > + struct rocket_device *rdev = dev_get_drvdata(dev);
> > + int core = find_core_for_dev(dev);
> > +
> > + if (core < 0)
> > + return -ENODEV;
> > +
> > + clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
> > +
> > + return 0;
> > +}
> > +
> > +EXPORT_GPL_DEV_PM_OPS(rocket_pm_ops) = {
> > + RUNTIME_PM_OPS(rocket_device_runtime_suspend, rocket_device_runtime_resume, NULL)
> > + SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> > +};
> > +
> > +static struct platform_driver rocket_driver = {
> > + .probe = rocket_probe,
> > + .remove = rocket_remove,
> > + .driver = {
> > + .name = "rocket",
> > + .pm = pm_ptr(&rocket_pm_ops),
> > + .of_match_table = dt_match,
> > + },
> > +};
> > +module_platform_driver(rocket_driver);
> > +
> > +MODULE_LICENSE("GPL");
> > +MODULE_DESCRIPTION("DRM driver for the Rockchip NPU IP");
> > +MODULE_AUTHOR("Tomeu Vizoso");
> > diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3219621afb72acdfa915c110e2ec3aacb66bd940
> > --- /dev/null
> > +++ b/drivers/accel/rocket/rocket_drv.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> > +
> > +#ifndef __ROCKET_DRV_H__
> > +#define __ROCKET_DRV_H__
> > +
> > +#include "rocket_device.h"
> > +
> > +struct rocket_file_priv {
> > + struct rocket_device *rdev;
> > +
> > + struct iommu_domain *domain;
> > +};
> > +
> > +#endif
> >
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2025-07-12 8:13 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-06 6:28 [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 02/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
2025-06-19 8:59 ` Robert Foss
2025-07-11 17:38 ` Andrew Davis
2025-07-12 8:10 ` Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
2025-06-24 10:17 ` Robin Murphy
2025-07-01 19:31 ` Heiko Stübner
2025-07-11 17:40 ` Andrew Davis
2025-06-06 6:28 ` [PATCH v7 04/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
2025-06-24 13:50 ` Robin Murphy
2025-07-11 16:00 ` Tomeu Vizoso
2025-07-11 16:40 ` Robin Murphy
2025-07-11 16:45 ` Tomeu Vizoso
2025-06-30 20:56 ` Rob Herring
2025-06-06 6:28 ` [PATCH v7 05/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
2025-06-24 16:52 ` Robin Murphy
2025-06-06 6:28 ` [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
2025-06-24 13:27 ` Robin Murphy
2025-07-11 16:02 ` Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 07/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 08/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 09/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
2025-06-06 6:28 ` [PATCH v7 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
2025-06-06 9:20 ` Nicolas Frattaroli
2025-06-07 12:24 ` Heiko Stübner
[not found] ` <20250606-6-10-rocket-v7-1-dc16cfe6fe4e@tomeuvizoso.net>
2025-06-19 8:39 ` [PATCH v7 01/10] accel/rocket: Add registers header Robert Foss
2025-06-19 9:01 ` [PATCH v7 00/10] New DRM accel driver for Rockchip's RKNN NPU Heiko Stuebner
2025-06-20 9:28 ` Heiko Stuebner
2025-06-24 5:03 ` Tomeu Vizoso
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).