From: Heiko Stuebner <heiko@sntech.de>
To: Tomeu Vizoso <tomeu@tomeuvizoso.net>,
Oded Gabbay <ogabbay@kernel.org>, MidG971 <midgy971@gmail.com>
Cc: Rob Herring <robh@kernel.org>,
Krzysztof Kozlowski <krzk+dt@kernel.org>,
Conor Dooley <conor+dt@kernel.org>,
dri-devel@lists.freedesktop.org, devicetree@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-rockchip@lists.infradead.org, linux-kernel@vger.kernel.org,
Midgy BALON <midgy971@gmail.com>
Subject: Re: [PATCH v2 0/4] accel: rocket: Add RK3568 NPU support
Date: Fri, 29 May 2026 20:04:17 +0200
Message-ID: <9739310.nlapOpYt14@phil>
In-Reply-To: <20260529155824.3099831-1-midgy971@gmail.com>
Hi,
On Friday, 29 May 2026, 17:58:20 Central European Summer Time, MidG971 wrote:
> From: Midgy BALON <midgy971@gmail.com>
>
> This series adds Rockchip RK3568 support to the upstream Rocket accel
> driver (drivers/accel/rocket/), tested on a Radxa ROCK 3B board running
> Linux 6.19-rc5.
>
> The RK3568 carries a single NVDLA-derived NPU core (0.8 TOPS), the same
> IP family as the three-core RK3588 NPU already supported by the driver.
> The hardware register layout (pc/cna/core regions, interrupt, IOMMU) is
> identical; the differences are:
>
> - 32-bit DMA address limit (NPU AXI bus and IOMMU page walker are 32-bit)
> - Requires explicit PVTPLL initialisation via two TF-A SCMI calls before
> the NPU NOC bus can be de-idled
> - Requires explicit PMU writes to power on the NPU domain (because the
> RK3568 power domain RK3568_PD_NPU is always_on so the generic
> pm-domains callback is a no-op) and de-idle the NPU NOC bus
>
> Patch 1 introduces a per-SoC rocket_soc_data abstraction (dma_bits and
> optional noc_init callback) plumbed via of_device_get_match_data(), and
> adds RK3568 SoC support on top of it. The DMA mask for the parent
> DRM facade device is chosen based on the narrowest core present
> (32-bit if any RK3568 core is in the system).
>
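Editorial sketch, not part of the posted series: the per-SoC data and the "narrowest core wins" DMA mask rule described for patch 1 could be modelled in plain userspace C roughly as follows. The field names dma_bits and noc_init come from the cover letter; the struct layout, the stub callback, the helper narrowest_dma_bits() and the 40-bit value for RK3588 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical model of the per-SoC data; the real struct may differ. */
struct rocket_soc_data {
	unsigned int dma_bits;        /* DMA address width of the core */
	int (*noc_init)(void *core);  /* optional hook, NULL if unused */
};

/* Placeholder for the RK3568 PVTPLL/PMU sequence; does nothing here. */
static int rk3568_noc_init_stub(void *core)
{
	(void)core;
	return 0;
}

static const struct rocket_soc_data rk3588_soc_data = {
	.dma_bits = 40,               /* assumed value, illustration only */
	.noc_init = NULL,
};

static const struct rocket_soc_data rk3568_soc_data = {
	.dma_bits = 32,               /* 32-bit limit stated in the cover letter */
	.noc_init = rk3568_noc_init_stub,
};

/* Call the optional hook only when the SoC provides one. */
static int soc_noc_init(const struct rocket_soc_data *soc, void *core)
{
	return soc->noc_init ? soc->noc_init(core) : 0;
}

/* Pick the DMA width for the parent device: the narrowest core present. */
static unsigned int narrowest_dma_bits(const struct rocket_soc_data *const *cores,
				       size_t n)
{
	unsigned int bits = 64;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (cores[i]->dma_bits < bits)
			bits = cores[i]->dma_bits;
	return bits;
}
```

In the driver itself the per-core data would come from of_device_get_match_data(), as the cover letter states; the model above only shows the selection logic.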
> Patch 2 documents the new rk3568-rknn-core compatible and the
> rockchip,pmu phandle that RK3568 requires; the sram-supply property
> becomes conditional (RK3588-only).
>
> Patches 3-4 add the RK3568 NPU and IOMMU nodes to rk356x-base.dtsi and
> enable them on the Radxa ROCK 3B.
>
> Verified on Radxa ROCK 3B (RK3568, 8 GB RAM):
> - /dev/accel/accel0 created at boot
> - dmesg: "Rockchip NPU core 0 version: 0"
> - IOMMU domain attached per open()
> - Job submission path complete: regcmd reaches the NPU's program
> controller, PC processes all 135 regcmd entries, broadcasts to
> sub-units, and advances to BSP-equivalent completion state
> (PC_TASKST=0x11000)
>
> Status of end-to-end inference: NOT YET WORKING. After 12 days of
What about the IOMMU side, i.e. the parts mentioned in
https://lore.kernel.org/linux-rockchip/5663593b-2c53-4632-ad2c-db9efa8e9ab2@rock-chips.com/
Is that in some way responsible for the not-yet-working state?
Also in general, we don't want to merge partially working code.
Either things work, or they don't, especially as right now you wouldn't
even know if it's your code that is wrong, or some other part that needs
changes.
Also please reduce those novel-sized (generated) texts.
For the cover-letter alone I'd need a fireplace, an armchair and a
hot cocoa to fully parse it.
Heiko
> investigation comparing rocket's behaviour against the vendor BSP RKNPU
> driver, the NPU's MMIO state at submission time matches BSP byte-for-byte
> (CNA configs, sub-unit OP_ENABLE registers, CBUF_CON0, etc.) but no
> sub-unit transitions to its EXECUTER state and the completion IRQ never
> fires. The kernel driver and DT infrastructure in this series stand on
> their own — the driver loads, IOMMU domain is attached, regcmd reaches
> the NPU, PC state machine matches BSP — but a mesa-side regcmd issue
> (or another piece we have not yet found) blocks the final conv firing.
>
> I am sending this series now because the kernel and DT pieces are
> self-contained, verifiable, and ready for review. A separate RFC on
> mesa-dev will follow with the userspace findings. Detailed investigation
> notes are available on request; relevant highlights for the maintainer:
>
> 1. Mesa rocket userspace (src/gallium/drivers/rocket/) targets RK3588.
> For RK3568, several encoded values need adjustment. Most notably,
> sub-unit OP_ENABLE register offset on RK3568 is 0x_00c, not 0x_008.
> Mesa emits writes at 0x1008/0x2008/0x3008/0x4008/0x5008 — BSP regcmd
> captures show no writes at these offsets across two distinct conv
> shapes (YOLOv5s 6x6/s2 and MobileNet 3x3/s2). BSP writes OP_ENABLE
> at offset 0x_00c with multi-bit values (CMAC=0x1, ACCU=0x0, DPU=0x108,
> DPU_RDMA=0x13f), not bit-0 booleans. This and a handful of other
> shape-independent value differences will be filed as a mesa RFC.
>
> 2. The vendor BSP RKNPU driver writes the userspace task_base_addr to
> PC_DMA_BASE_ADDR (PC offset 0x34); the rocket driver did not. PC's
> TASK_DMA engine reads struct rknpu_task descriptors from there. With
> task_pp_en=1 in TASK_CON and a kernel-allocated descriptor BO,
> PC's task counter state machine advances from "stuck at 0xf000" to
> the BSP completion state. This is the most invasive piece of the
> investigation and is held back for a follow-on patch (not in this
> series); the current series gets the driver to a working /dev/accel/
> node and an attached IOMMU domain, which is the right shape for v2.
>
> 3. The NPU's master AXI port is 32-bit, but dma_alloc_coherent() through
> the dma-iommu framework silently ignores GFP_DMA32 even with a 32-bit
> dma_mask set on the device. When BOs for the NPU are allocated kernel-
> side, __get_free_pages(GFP_DMA32 | __GFP_ZERO, order) + dma_map_single()
> is the working pattern. Not in this series, but might be a useful
> documentation note for other 32-bit AXI accelerators using dma-iommu.
>
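Editorial sketch, not part of the posted series: the constraint behind item 3 is that every byte of a buffer handed to the NPU must be addressable within 32 bits. The userspace C model below only checks that address constraint; it does not reproduce kernel allocation or mapping, and the helper name dma_range_fits() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Return true when the whole range [addr, addr + size - 1] is reachable
 * by a bus master limited to 'bits' address bits. Written so that the
 * comparison cannot overflow for addresses near the top of the range.
 */
static bool dma_range_fits(uint64_t addr, uint64_t size, unsigned int bits)
{
	uint64_t mask = DMA_BIT_MASK(bits);

	assert(size > 0);
	if (addr > mask)
		return false;
	return size - 1 <= mask - addr;
}
```

A 4 KiB buffer at 0xfffff000 ends exactly at the 32-bit limit and fits; the same buffer at 0x100000000 does not, which is the situation an allocation above 4 GiB would create for a 32-bit AXI master.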
> This series builds against current v6.19-rc5 with no checkpatch warnings,
> the dtb builds, and dtbs_check passes. The April v1 series included a
> fifth patch ("Use of_find_matching_node() instead of for_each_of_allnodes")
> which is no longer required — upstream rocket already uses
> for_each_compatible_node() since v6.19-rc5.
>
> Changes since v1 (April 2026, never sent on-list):
> - Rebased to v6.19-rc5
> - Patch 1 absorbed v1 patch 1 (obsolete) and now includes the
> rocket_soc_data abstraction needed to support both RK3568 and
> RK3588 cores in the same driver
> - Cover letter expanded with current investigation status
>
> Assisted by Claude Sonnet/Opus 4.x throughout the investigation. All
> findings empirically verified via BSP register captures and side-by-side
> rocket execution traces on the same board.
>
> Midgy BALON (4):
> accel: rocket: Add support for Rockchip RK3568
> dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 support
> arm64: dts: rockchip: rk356x: Add NPU and its IOMMU
> arm64: dts: rockchip: rk3568-rock-3b: Enable NPU
>
> Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml | 18 ++++++++++++++--
> arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 31 +++++++++++++++++++++++++++
> arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts | 9 ++++++++
> drivers/accel/rocket/rocket_core.c | 21 +++++++++++++++++-
> drivers/accel/rocket/rocket_core.h | 18 ++++++++++++++--
> drivers/accel/rocket/rocket_device.c | 23 +++++++++++++++++--
> drivers/accel/rocket/rocket_drv.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 7 files changed, 192 insertions(+), 7 deletions(-)
>
>
> Midgy BALON (4):
> accel: rocket: Add support for Rockchip RK3568
> dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 support
> arm64: dts: rockchip: rk356x: Add NPU and its IOMMU
> arm64: dts: rockchip: rk3568-rock-3b: Enable NPU
>
> .../npu/rockchip,rk3588-rknn-core.yaml | 18 ++++-
> .../boot/dts/rockchip/rk3568-rock-3b.dts | 9 +++
> arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 31 ++++++++
> drivers/accel/rocket/rocket_core.c | 18 ++++-
> drivers/accel/rocket/rocket_core.h | 16 +++++
> drivers/accel/rocket/rocket_device.c | 25 ++++++-
> drivers/accel/rocket/rocket_drv.c | 71 ++++++++++++++++++-
> 7 files changed, 182 insertions(+), 6 deletions(-)
>
>
Thread overview: 9+ messages
2026-05-29 15:58 [PATCH v2 0/4] accel: rocket: Add RK3568 NPU support MidG971
2026-05-29 15:58 ` [PATCH v2 1/4] accel: rocket: Add support for Rockchip RK3568 MidG971
2026-05-29 18:19 ` Heiko Stuebner
2026-05-29 15:58 ` [PATCH v2 2/4] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 support MidG971
2026-05-29 16:18 ` Krzysztof Kozlowski
2026-05-29 15:58 ` [PATCH v2 3/4] arm64: dts: rockchip: rk356x: Add NPU and its IOMMU MidG971
2026-05-29 15:58 ` [PATCH v2 4/4] arm64: dts: rockchip: rk3568-rock-3b: Enable NPU MidG971
2026-05-29 16:17 ` [PATCH v2 0/4] accel: rocket: Add RK3568 NPU support Krzysztof Kozlowski
2026-05-29 18:04 ` Heiko Stuebner [this message]