* [PATCH V1 0/9] AMD AI Engine device driver for Versal
@ 2025-07-02 15:56 Gregory Williams
2025-07-02 15:56 ` [PATCH V1 1/9] firmware: xilinx: Add IOCTL support for the AIE run time operations Gregory Williams
` (8 more replies)
0 siblings, 9 replies; 21+ messages in thread
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
Hi,
The AI Engine is a tile-array-based acceleration engine provided by AMD.
It offers high compute density for vector-based algorithms, along with
flexible custom compute and data movement. The array has core tiles for
compute, memory tiles for local storage, and shim tiles that interface
with the FPGA fabric and DDR. More details about the architecture can be
found here: https://www.amd.com/en/products/adaptive-socs-and-fpgas/technologies/ai-engine.html
This patchset introduces a driver for the AMD AI Engine in AMD Versal
devices. The driver manages the AI Engine array and allows users to
request an AI Engine partition (group of AI Engine tiles) for their
application.
Note, two Versal firmware patches are included as they contain
functionality for the AI Engines.
Thanks,
Gregory Williams
Gregory Williams (7):
dt-bindings: power: Add AMD Versal power domain bindings
dt-bindings: soc: xilinx: Add AI engine DT binding
accel: amd-ai-engine: Add AMD AI Engine device driver
accel: amd-ai-engine: Add support to enable/disable clocks and change
clock frequency
accel: amd-ai-engine: Add support for AIEML devices
accel: amd-ai-engine: Create tile memory information
accel: amd-ai-engine: Adds AI Engine reset operations
Ronak Jain (2):
firmware: xilinx: Add IOCTL support for the AIE run time operations
firmware: xilinx: Add IOCTL support to query QoS
.../bindings/soc/xilinx/xlnx,ai-engine.yaml | 151 +++++++
MAINTAINERS | 9 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/amd-ai-engine/Kconfig | 15 +
drivers/accel/amd-ai-engine/Makefile | 15 +
drivers/accel/amd-ai-engine/ai-engine-aie.c | 423 ++++++++++++++++++
drivers/accel/amd-ai-engine/ai-engine-aieml.c | 362 +++++++++++++++
.../accel/amd-ai-engine/ai-engine-aperture.c | 195 ++++++++
drivers/accel/amd-ai-engine/ai-engine-clock.c | 326 ++++++++++++++
drivers/accel/amd-ai-engine/ai-engine-dev.c | 230 ++++++++++
.../accel/amd-ai-engine/ai-engine-internal.h | 360 +++++++++++++++
drivers/accel/amd-ai-engine/ai-engine-part.c | 167 +++++++
drivers/accel/amd-ai-engine/ai-engine-res.c | 184 ++++++++
drivers/accel/amd-ai-engine/ai-engine-reset.c | 300 +++++++++++++
drivers/firmware/xilinx/zynqmp.c | 46 ++
include/dt-bindings/power/xlnx-versal-power.h | 55 +++
include/linux/amd-ai-engine.h | 73 +++
include/linux/firmware/xlnx-zynqmp.h | 36 ++
19 files changed, 2949 insertions(+)
create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
create mode 100644 drivers/accel/amd-ai-engine/Kconfig
create mode 100644 drivers/accel/amd-ai-engine/Makefile
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-aie.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-aieml.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-aperture.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-clock.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-dev.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-internal.h
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-part.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-res.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-reset.c
create mode 100644 include/dt-bindings/power/xlnx-versal-power.h
create mode 100644 include/linux/amd-ai-engine.h
--
2.34.1
* [PATCH V1 1/9] firmware: xilinx: Add IOCTL support for the AIE run time operations
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Ronak Jain, dri-devel, devicetree, linux-kernel, gregory.williams
From: Ronak Jain <ronak.jain@amd.com>
Add IOCTL support for the AIE run time operations listed below:
- Column Reset
- Shim Reset
- Enabling of column clock buffer
- Zeroisation of program and data memories
- Disabling of column clock buffer
- Enabling AXI-MM error event
- Set L2 controller NPI INTR
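As a rough illustration of how a caller might combine these operations, the
user-space sketch below packs a column range and ORs operation bits together.
The packing mirrors zynqmp_pm_aie_operation() from this patch. The AIE_OPS_*
names, aie_pack_partition() and aie_reset_and_clear_ops() are hypothetical and
exist only for this example, and the chosen combination of operations is an
assumption, not a sequence the firmware prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Operation bits, copied from the XILINX_AIE_OPS_* defines in this patch. */
#define AIE_OPS_COL_RST          (1u << 0)
#define AIE_OPS_SHIM_RST         (1u << 1)
#define AIE_OPS_ENB_COL_CLK_BUFF (1u << 2)
#define AIE_OPS_ZEROISATION      (1u << 3)
#define AIE_OPS_DIS_COL_CLK_BUFF (1u << 4)

/*
 * Pack a column range the way zynqmp_pm_aie_operation() does:
 * number of columns in the upper 16 bits, start column in the lower 16.
 */
static uint32_t aie_pack_partition(uint16_t start_col, uint16_t num_col)
{
	assert(num_col > 0); /* sanity check for this sketch, not in the patch */
	return ((uint32_t)num_col << 16) | start_col;
}

/* Hypothetical helper: operation mask for a "reset and clear" request. */
static uint32_t aie_reset_and_clear_ops(void)
{
	return AIE_OPS_COL_RST | AIE_OPS_SHIM_RST | AIE_OPS_ZEROISATION;
}
```

For a partition that starts at column 2 and spans 4 columns, the packed word
is 0x00040002, and it would be passed together with the ORed operation mask.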
Signed-off-by: Ronak Jain <ronak.jain@amd.com>
---
drivers/firmware/xilinx/zynqmp.c | 20 +++++++++++++++++++
include/linux/firmware/xlnx-zynqmp.h | 30 ++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/drivers/firmware/xilinx/zynqmp.c b/drivers/firmware/xilinx/zynqmp.c
index 7356e860e65c..d9fdfd232d11 100644
--- a/drivers/firmware/xilinx/zynqmp.c
+++ b/drivers/firmware/xilinx/zynqmp.c
@@ -1039,6 +1039,26 @@ int zynqmp_pm_set_boot_health_status(u32 value)
return zynqmp_pm_invoke_fn(PM_IOCTL, NULL, 3, 0, IOCTL_SET_BOOT_HEALTH_STATUS, value);
}
+/**
+ * zynqmp_pm_aie_operation - AI engine run time operations
+ * @node: AI engine node id
+ * @start_col: Starting column of AI partition
+ * @num_col: Number of columns in AI partition
+ * @operation: ORed value of operations
+ *
+ * Return: Returns status, either success or error+reason
+ */
+int zynqmp_pm_aie_operation(u32 node, u16 start_col, u16 num_col, u32 operation)
+{
+ u32 partition;
+
+ partition = num_col;
+ partition = ((partition << 16U) | start_col);
+ return zynqmp_pm_invoke_fn(PM_IOCTL, NULL, 4, node, IOCTL_AIE_OPS,
+ partition, operation);
+}
+EXPORT_SYMBOL_GPL(zynqmp_pm_aie_operation);
+
/**
* zynqmp_pm_reset_assert - Request setting of reset (1 - assert, 0 - release)
* @reset: Reset to be configured
diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h
index 6d4dbc196b93..1d30366f741b 100644
--- a/include/linux/firmware/xlnx-zynqmp.h
+++ b/include/linux/firmware/xlnx-zynqmp.h
@@ -136,6 +136,16 @@
#define SD_ITAPDLY 0xFF180314
#define SD_OTAPDLYSEL 0xFF180318
+/**
+ * XPM_VERSAL_EVENT_ERROR_MASK_AIE_CR: Error event mask for ME Correctable Error.
+ */
+#define XPM_VERSAL_EVENT_ERROR_MASK_AIE_CR BIT(16)
+
+/**
+ * XPM_VERSAL_EVENT_ERROR_MASK_AIE_NCR: Error event mask for ME Non-Correctable Error.
+ */
+#define XPM_VERSAL_EVENT_ERROR_MASK_AIE_NCR BIT(17)
+
/**
* XPM_EVENT_ERROR_MASK_DDRMC_CR: Error event mask for DDRMC MC Correctable ECC Error.
*/
@@ -155,6 +165,17 @@ enum pm_module_id {
TF_A_MODULE_ID = 0xa,
};
+/* AIE Operation */
+#define XILINX_AIE_OPS_COL_RST BIT(0)
+#define XILINX_AIE_OPS_SHIM_RST BIT(1)
+#define XILINX_AIE_OPS_ENB_COL_CLK_BUFF BIT(2)
+#define XILINX_AIE_OPS_ZEROISATION BIT(3)
+#define XILINX_AIE_OPS_DIS_COL_CLK_BUFF BIT(4)
+#define XILINX_AIE_OPS_ENB_AXI_MM_ERR_EVENT BIT(5)
+#define XILINX_AIE_OPS_SET_L2_CTRL_NPI_INTR BIT(6)
+#define XILINX_AIE_OPS_DATA_MEM_ZEROIZATION BIT(8)
+#define XILINX_AIE_OPS_MEM_TILE_ZEROIZATION BIT(9)
+
enum pm_api_cb_id {
PM_INIT_SUSPEND_CB = 30,
PM_ACKNOWLEDGE_CB = 31,
@@ -244,6 +265,8 @@ enum pm_ioctl_id {
/* Dynamic SD/GEM configuration */
IOCTL_SET_SD_CONFIG = 30,
IOCTL_SET_GEM_CONFIG = 31,
+ /* AIE/AIEML Operations */
+ IOCTL_AIE_OPS = 33,
/* IOCTL to get default/current QoS */
IOCTL_GET_QOS = 34,
};
@@ -633,6 +656,7 @@ int zynqmp_pm_set_tcm_config(u32 node_id, enum rpu_tcm_comb tcm_mode);
int zynqmp_pm_set_sd_config(u32 node, enum pm_sd_config_type config, u32 value);
int zynqmp_pm_set_gem_config(u32 node, enum pm_gem_config_type config,
u32 value);
+int zynqmp_pm_aie_operation(u32 node, u16 start_col, u16 num_col, u32 operation);
#else
static inline int zynqmp_pm_get_api_version(u32 *version)
{
@@ -951,6 +975,12 @@ static inline int zynqmp_pm_set_gem_config(u32 node,
return -ENODEV;
}
+static inline int zynqmp_pm_aie_operation(u32 node, u16 start_col,
+ u16 num_col, u32 operation)
+{
+ return -ENODEV;
+}
+
#endif
#endif /* __FIRMWARE_ZYNQMP_H__ */
--
2.34.1
* [PATCH V1 2/9] firmware: xilinx: Add IOCTL support to query QoS
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Ronak Jain, dri-devel, devicetree, linux-kernel, gregory.williams,
Amanda Baze
From: Ronak Jain <ronak.jain@amd.com>
Add support to query the QoS value on a device by using the PM IOCTL
EEMI API.
The caller passes only the node ID of the device, and the IOCTL API
returns both the default QoS value and the current QoS value.
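The calling convention described above can be modeled in user space as
follows. This is only a sketch: fake_fw_get_qos() is a hypothetical stand-in
for the firmware call, PAYLOAD_WORDS and the QoS numbers are made up, and
get_qos() is not the kernel function. The only detail taken from the patch is
that the default QoS is read from payload word 1 and the current QoS from
word 2. The sketch writes the outputs only when the call succeeds, which is a
design choice of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_WORDS 4 /* assumed payload size, for this sketch only */

/*
 * Hypothetical stand-in for the firmware call. On success it fills word 1
 * with the default QoS and word 2 with the current QoS, matching the
 * layout this patch reads back. The values are made up.
 */
static int fake_fw_get_qos(uint32_t node, uint32_t *payload)
{
	assert(payload != NULL);
	if (node == 0)
		return -ENODEV; /* pretend node 0 does not exist */
	payload[1] = 100;
	payload[2] = 75;
	return 0;
}

/* Query both QoS values; the outputs are written only on success. */
static int get_qos(uint32_t node, uint32_t *def_qos, uint32_t *qos)
{
	uint32_t payload[PAYLOAD_WORDS] = { 0 };
	int ret;

	if (!def_qos || !qos)
		return -EINVAL;

	ret = fake_fw_get_qos(node, payload);
	if (ret)
		return ret;

	*def_qos = payload[1];
	*qos = payload[2];
	return 0;
}
```

A caller checks the return value before using either output, since both are
left untouched when the query fails.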
Signed-off-by: Ronak Jain <ronak.jain@amd.com>
Signed-off-by: Amanda Baze <amanda.baze@amd.com>
---
drivers/firmware/xilinx/zynqmp.c | 26 ++++++++++++++++++++++++++
include/linux/firmware/xlnx-zynqmp.h | 6 ++++++
2 files changed, 32 insertions(+)
diff --git a/drivers/firmware/xilinx/zynqmp.c b/drivers/firmware/xilinx/zynqmp.c
index d9fdfd232d11..52dae076d2cb 100644
--- a/drivers/firmware/xilinx/zynqmp.c
+++ b/drivers/firmware/xilinx/zynqmp.c
@@ -1636,6 +1636,32 @@ int zynqmp_pm_get_feature_config(enum pm_feature_config_id id,
return zynqmp_pm_invoke_fn(PM_IOCTL, payload, 3, 0, IOCTL_GET_FEATURE_CONFIG, id);
}
+/**
+ * zynqmp_pm_get_qos - PM call to query default and current QoS of the node
+ * @node: Node Id of the device
+ * @def_qos: Default QoS value
+ * @qos: Current QoS value
+ *
+ * Return: 0 on success, with the default and current QoS values stored in
+ * @def_qos and @qos; an error value on failure
+ */
+int zynqmp_pm_get_qos(u32 node, u32 *const def_qos, u32 *const qos)
+{
+ u32 ret_payload[PAYLOAD_ARG_CNT];
+ int ret;
+
+ if (!def_qos || !qos)
+ return -EINVAL;
+
+ ret = zynqmp_pm_invoke_fn(PM_IOCTL, ret_payload, 2, node, IOCTL_GET_QOS);
+
+ *def_qos = ret_payload[1];
+ *qos = ret_payload[2];
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(zynqmp_pm_get_qos);
+
/**
* zynqmp_pm_set_sd_config - PM call to set value of SD config registers
* @node: SD node ID
diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h
index 1d30366f741b..b2ca960d3bbe 100644
--- a/include/linux/firmware/xlnx-zynqmp.h
+++ b/include/linux/firmware/xlnx-zynqmp.h
@@ -657,6 +657,7 @@ int zynqmp_pm_set_sd_config(u32 node, enum pm_sd_config_type config, u32 value);
int zynqmp_pm_set_gem_config(u32 node, enum pm_gem_config_type config,
u32 value);
int zynqmp_pm_aie_operation(u32 node, u16 start_col, u16 num_col, u32 operation);
+int zynqmp_pm_get_qos(u32 node, u32 *const def_qos, u32 *const qos);
#else
static inline int zynqmp_pm_get_api_version(u32 *version)
{
@@ -981,6 +982,11 @@ static inline int zynqmp_pm_aie_operation(u32 node, u16 start_col,
return -ENODEV;
}
+static inline int zynqmp_pm_get_qos(u32 node, u32 *const def_qos, u32 *const qos)
+{
+ return -ENODEV;
+}
+
#endif
#endif /* __FIRMWARE_ZYNQMP_H__ */
--
2.34.1
* [PATCH V1 3/9] dt-bindings: power: Add AMD Versal power domain bindings
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
Define Versal power domain value macros.
Signed-off-by: Gregory Williams <gregory.williams@amd.com>
---
include/dt-bindings/power/xlnx-versal-power.h | 55 +++++++++++++++++++
1 file changed, 55 insertions(+)
create mode 100644 include/dt-bindings/power/xlnx-versal-power.h
diff --git a/include/dt-bindings/power/xlnx-versal-power.h b/include/dt-bindings/power/xlnx-versal-power.h
new file mode 100644
index 000000000000..effbc70e5a12
--- /dev/null
+++ b/include/dt-bindings/power/xlnx-versal-power.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) */
+/*
+ * Copyright (C) 2019 - 2021 Xilinx, Inc.
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ */
+
+#ifndef _DT_BINDINGS_VERSAL_POWER_H
+#define _DT_BINDINGS_VERSAL_POWER_H
+
+#define PM_DEV_RPU0_0 (0x18110005U)
+#define PM_DEV_RPU0_1 (0x18110006U)
+#define PM_DEV_OCM_0 (0x18314007U)
+#define PM_DEV_OCM_1 (0x18314008U)
+#define PM_DEV_OCM_2 (0x18314009U)
+#define PM_DEV_OCM_3 (0x1831400aU)
+#define PM_DEV_TCM_0_A (0x1831800bU)
+#define PM_DEV_TCM_0_B (0x1831800cU)
+#define PM_DEV_TCM_1_A (0x1831800dU)
+#define PM_DEV_TCM_1_B (0x1831800eU)
+#define PM_DEV_USB_0 (0x18224018U)
+#define PM_DEV_GEM_0 (0x18224019U)
+#define PM_DEV_GEM_1 (0x1822401aU)
+#define PM_DEV_SPI_0 (0x1822401bU)
+#define PM_DEV_SPI_1 (0x1822401cU)
+#define PM_DEV_I2C_0 (0x1822401dU)
+#define PM_DEV_I2C_1 (0x1822401eU)
+#define PM_DEV_CAN_FD_0 (0x1822401fU)
+#define PM_DEV_CAN_FD_1 (0x18224020U)
+#define PM_DEV_UART_0 (0x18224021U)
+#define PM_DEV_UART_1 (0x18224022U)
+#define PM_DEV_GPIO (0x18224023U)
+#define PM_DEV_TTC_0 (0x18224024U)
+#define PM_DEV_TTC_1 (0x18224025U)
+#define PM_DEV_TTC_2 (0x18224026U)
+#define PM_DEV_TTC_3 (0x18224027U)
+#define PM_DEV_SWDT_LPD (0x18224028U)
+#define PM_DEV_SWDT_FPD (0x18224029U)
+#define PM_DEV_OSPI (0x1822402aU)
+#define PM_DEV_QSPI (0x1822402bU)
+#define PM_DEV_GPIO_PMC (0x1822402cU)
+#define PM_DEV_I2C_PMC (0x1822402dU)
+#define PM_DEV_SDIO_0 (0x1822402eU)
+#define PM_DEV_SDIO_1 (0x1822402fU)
+#define PM_DEV_RTC (0x18224034U)
+#define PM_DEV_ADMA_0 (0x18224035U)
+#define PM_DEV_ADMA_1 (0x18224036U)
+#define PM_DEV_ADMA_2 (0x18224037U)
+#define PM_DEV_ADMA_3 (0x18224038U)
+#define PM_DEV_ADMA_4 (0x18224039U)
+#define PM_DEV_ADMA_5 (0x1822403aU)
+#define PM_DEV_ADMA_6 (0x1822403bU)
+#define PM_DEV_ADMA_7 (0x1822403cU)
+#define PM_DEV_AI (0x18224072U)
+
+#endif
--
2.34.1
* [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
In the device tree, there will be a device node for the AI engine device,
and device nodes for the statically configured AI engine apertures.
An aperture is an isolated set of columns within the AI engine device,
with its own address space and interrupts.
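Because apertures are isolated sets of columns, two apertures of the same
device must not share a column. A minimal user-space sketch of that check is
shown here; struct col_range and col_ranges_overlap() are hypothetical names
introduced for illustration, with the range expressed as a start column and
a column count in the same way as the "xlnx,columns" property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: a column window as described by "xlnx,columns". */
struct col_range {
	uint32_t start; /* first column */
	uint32_t count; /* number of columns, must be non-zero */
};

/* Return true if the two column windows share at least one column. */
static bool col_ranges_overlap(struct col_range a, struct col_range b)
{
	uint32_t a_end, b_end;

	assert(a.count > 0 && b.count > 0);
	a_end = a.start + a.count - 1; /* inclusive last column */
	b_end = b.start + b.count - 1;

	/* Overlap iff each range starts no later than the other one ends. */
	return a.start <= b_end && b.start <= a_end;
}
```

The single comparison also covers the case where one window fully contains
the other, which a test on the endpoints of only one window would miss.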
Signed-off-by: Gregory Williams <gregory.williams@amd.com>
---
.../bindings/soc/xilinx/xlnx,ai-engine.yaml | 151 ++++++++++++++++++
1 file changed, 151 insertions(+)
create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
diff --git a/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
new file mode 100644
index 000000000000..7d9a36c56366
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
@@ -0,0 +1,151 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/soc/xilinx/xlnx,ai-engine.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: AMD AI Engine
+
+maintainers:
+ - Gregory Williams <gregory.williams@amd.com>
+
+description:
+ The AMD AI Engine is a tile processor with many cores (up to 400) that
+ can run in parallel. The data routing between cores is configured through
+ internal switches, and shim tiles interface with external interconnect, such
+ as memory or PL. One AI engine device can have multiple apertures, each
+ with its own address space and interrupt. At runtime, an application can
+ create multiple partitions within an aperture; each partition is a group of
+ columns of AI engine tiles. An AI engine partition is the minimum resettable
+ unit for an AI engine application.
+
+properties:
+ compatible:
+ const: xlnx,ai-engine-v2.0
+
+ reg:
+ maxItems: 1
+
+ '#address-cells':
+ const: 2
+
+ '#size-cells':
+ const: 2
+
+ power-domains:
+ description:
+ Platform management node id used to request power management services
+ from the firmware driver.
+
+ xlnx,aie-gen:
+ $ref: /schemas/types.yaml#/definitions/uint8
+ description:
+ Hardware generation of the AI engine device. The currently supported
+ values are 1 (AIE) and 2 (AIEML).
+
+ xlnx,shim-rows:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ description:
+ start row and the number of rows of SHIM tiles of the AI engine device
+
+ xlnx,core-rows:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ description:
+ start row and the number of rows of core tiles of the AI engine device
+
+ xlnx,mem-rows:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ description:
+ start row and the number of rows of memory tiles of the AI engine device
+
+required:
+ - compatible
+ - reg
+ - power-domains
+ - xlnx,aie-gen
+ - xlnx,shim-rows
+ - xlnx,core-rows
+ - xlnx,mem-rows
+
+patternProperties:
+ "^aperture@[0-9]+$":
+ type: object
+ description:
+ AI engine aperture, which is a column-based group of tiles of the
+ AI engine device. Each AI engine aperture is isolated from the
+ other AI engine apertures. An AI engine aperture is defined by
+ AMD/Xilinx platform design tools.
+
+ properties:
+ compatible:
+ const: xlnx,ai-engine-aperture
+
+ reg:
+ description:
+ Physical base address and length of the aperture registers.
+ The AI engine address space assigned to Linux is defined by
+ Xilinx/AMD platform design tool.
+
+ interrupts:
+ maxItems: 3
+
+ interrupt-names:
+ items:
+ - const: interrupt1
+ - const: interrupt2
+ - const: interrupt3
+
+ xlnx,columns:
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ description:
+ Location of the aperture, given as the start column and the number
+ of columns. E.g. an aperture that starts at column 0 and spans 50
+ columns is represented as <0 50>.
+
+ xlnx,node-id:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ AI engine aperture node ID, which is defined by AMD/Xilinx platform
+ design tool to identify the AI engine aperture in the firmware.
+
+ required:
+ - compatible
+ - reg
+ - xlnx,columns
+ - xlnx,node-id
+
+ additionalProperties: false
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/power/xlnx-versal-power.h>
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ ai_engine: ai-engine@20000000000 {
+ compatible = "xlnx,ai-engine-v2.0";
+ reg = <0x200 0x00 0x01 0x00>;
+ #address-cells = <2>;
+ #size-cells = <2>;
+ power-domains = <&versal_firmware PM_DEV_AI>;
+ xlnx,aie-gen = /bits/ 8 <0x1>;
+ xlnx,core-rows = /bits/ 8 <1 8>;
+ xlnx,mem-rows = /bits/ 8 <0 0>;
+ xlnx,shim-rows = /bits/ 8 <0 1>;
+
+ aperture0: aperture@20000000000 {
+ /* 50 columns and 8 core tile rows + 1 SHIM row */
+ compatible = "xlnx,ai-engine-aperture";
+ reg = <0x200 0x0 0x1 0x0>;
+ interrupts = <0x0 0x94 0x4>,
+ <0x0 0x95 0x4>,
+ <0x0 0x96 0x4>;
+ interrupt-names = "interrupt1", "interrupt2", "interrupt3";
+ interrupt-parent = <&gic>;
+ xlnx,columns = <0 50>;
+ xlnx,node-id = <1>;
+ };
+ };
+ };
--
2.34.1
* [PATCH V1 5/9] accel: amd-ai-engine: Add AMD AI Engine device driver
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
Add support for the AMD AI Engine on AMD Versal devices. The AMD AI Engine
is an array-based accelerator for applications like beamforming and machine
learning inference [1].
An AI Engine device can have multiple apertures (groups of AI engine
columns) that are isolated from one another. At runtime, applications can
request multiple partitions within an aperture.
The driver architecture:
+----------------------------------------+
| AIE Device |
| +----------------+ +----------------+ |
| | aperture_0 | | aperture_X | |
| |----------------| |----------------| |
| | partition_0 | | partition_0 | |
| | ... | | ... | |
| | partition_X | | partition_X | |
| +----------------+ +----------------+ |
+----------------------------------------+
The driver provides the following functionality:
- AI Engine device probe and aperture probe
- Partition request and release
- Setting partition frequency
- Partition setup and teardown
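The partition request step listed above can be modeled in plain user-space C
as a range check followed by a bitmap reservation. This is a sketch under
stated assumptions, not the driver's implementation: struct aperture_model
and aperture_request() are hypothetical, the aperture is limited to 64
columns so that one 64-bit word can serve as the column map, and the error
codes follow the -ERANGE and -EBUSY values returned by
aie_aperture_request_part() in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of an aperture: a window of at most 64 columns. */
struct aperture_model {
	uint32_t start_col; /* first column of the aperture */
	uint32_t num_col;   /* columns in the aperture (1..64) */
	uint64_t used;      /* bit i set: column start_col + i is taken */
};

/*
 * Reserve num_col columns beginning at start_col.
 * num_col == 0 requests the whole aperture, as in the driver.
 */
static int aperture_request(struct aperture_model *ap,
			    uint32_t start_col, uint32_t num_col)
{
	uint64_t mask;
	uint32_t off;

	assert(ap->num_col >= 1 && ap->num_col <= 64);

	if (num_col == 0) {
		start_col = ap->start_col;
		num_col = ap->num_col;
	}

	/* The requested window must lie entirely inside the aperture. */
	if (start_col < ap->start_col || num_col > ap->num_col)
		return -ERANGE;
	off = start_col - ap->start_col;
	if (off > ap->num_col - num_col)
		return -ERANGE;

	/* Build a mask of num_col bits, shifted to the requested offset. */
	mask = (num_col == 64) ? ~0ULL : ((1ULL << num_col) - 1);
	mask <<= off;

	if (ap->used & mask)
		return -EBUSY; /* at least one column is already taken */

	ap->used |= mask;
	return 0;
}
```

Releasing a partition in this model would clear the same mask from the used
word, which corresponds to returning the column region to the aperture.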
[1] https://www.amd.com/en/products/adaptive-socs-and-fpgas/technologies/ai-engine.html
Signed-off-by: Gregory Williams <gregory.williams@amd.com>
---
MAINTAINERS | 9 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/amd-ai-engine/Kconfig | 15 ++
drivers/accel/amd-ai-engine/Makefile | 12 +
drivers/accel/amd-ai-engine/ai-engine-aie.c | 46 ++++
.../accel/amd-ai-engine/ai-engine-aperture.c | 195 +++++++++++++++
drivers/accel/amd-ai-engine/ai-engine-dev.c | 228 +++++++++++++++++
.../accel/amd-ai-engine/ai-engine-internal.h | 230 ++++++++++++++++++
drivers/accel/amd-ai-engine/ai-engine-part.c | 65 +++++
drivers/accel/amd-ai-engine/ai-engine-res.c | 114 +++++++++
include/linux/amd-ai-engine.h | 46 ++++
12 files changed, 962 insertions(+)
create mode 100644 drivers/accel/amd-ai-engine/Kconfig
create mode 100644 drivers/accel/amd-ai-engine/Makefile
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-aie.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-aperture.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-dev.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-internal.h
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-part.c
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-res.c
create mode 100644 include/linux/amd-ai-engine.h
diff --git a/MAINTAINERS b/MAINTAINERS
index f69a86b9610a..cf03943d2d7e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1007,6 +1007,15 @@ L: dmaengine@vger.kernel.org
S: Supported
F: drivers/dma/amd/ae4dma/
+AMD AI ENGINE DRIVER
+M: Gregory Williams <gregory.williams@amd.com>
+L: dri-devel@lists.freedesktop.org
+S: Maintained
+T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
+F: Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
+F: drivers/accel/amd-ai-engine/
+F: include/linux/amd-ai-engine.h
+
AMD AXI W1 DRIVER
M: Kris Chaplin <kris.chaplin@amd.com>
R: Thomas Delev <thomas.delev@amd.com>
diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
index 5b9490367a39..22a7a7f19d9a 100644
--- a/drivers/accel/Kconfig
+++ b/drivers/accel/Kconfig
@@ -24,6 +24,7 @@ menuconfig DRM_ACCEL
different device files, called accel/accel* (in /dev, sysfs
and debugfs).
+source "drivers/accel/amd-ai-engine/Kconfig"
source "drivers/accel/amdxdna/Kconfig"
source "drivers/accel/habanalabs/Kconfig"
source "drivers/accel/ivpu/Kconfig"
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
index a301fb6089d4..a79c97a39ca1 100644
--- a/drivers/accel/Makefile
+++ b/drivers/accel/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_DRM_ACCEL_AMDAIE) += amd-ai-engine/
obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/
obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
diff --git a/drivers/accel/amd-ai-engine/Kconfig b/drivers/accel/amd-ai-engine/Kconfig
new file mode 100644
index 000000000000..c82b2ab58f71
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config DRM_ACCEL_AMDAIE
+ tristate "AMD Versal AI Engine"
+ depends on ARM64 || COMPILE_TEST
+ depends on ZYNQMP_FIRMWARE
+ depends on DRM_ACCEL
+ help
+ This option enables support for the AMD AI engine driver.
+ One AMD AI engine device can have multiple partitions (groups of
+ AI engine tiles). The AMD AI engine device driver instance manages
+ the AI engine partitions. Applications access a partition through
+ its AI engine partition device instance.
+
+ If unsure, say N
diff --git a/drivers/accel/amd-ai-engine/Makefile b/drivers/accel/amd-ai-engine/Makefile
new file mode 100644
index 000000000000..ed635a2f2602
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for AMD AI engine device driver
+
+obj-$(CONFIG_DRM_ACCEL_AMDAIE) += amd-aie.o
+
+amd-aie-$(CONFIG_DRM_ACCEL_AMDAIE) := \
+ ai-engine-aie.o \
+ ai-engine-aperture.o \
+ ai-engine-dev.o \
+ ai-engine-part.o \
+ ai-engine-res.o
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aie.c b/drivers/accel/amd-ai-engine/ai-engine-aie.c
new file mode 100644
index 000000000000..7b4be8d2c5eb
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-aie.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine driver AIE device specific implementation
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <linux/amd-ai-engine.h>
+
+#include "ai-engine-internal.h"
+
+#define AIE_ARRAY_SHIFT 30U
+#define AIE_COL_SHIFT 23U
+#define AIE_ROW_SHIFT 18U
+
+static u32 aie_get_tile_type(struct aie_device *adev, struct aie_location *loc)
+{
+ if (loc->row)
+ return AIE_TILE_TYPE_TILE;
+ /* SHIM row */
+ if ((loc->col % 4) < 2)
+ return AIE_TILE_TYPE_SHIMPL;
+
+ return AIE_TILE_TYPE_SHIMNOC;
+}
+
+static const struct aie_tile_operations aie_ops = {
+ .get_tile_type = aie_get_tile_type,
+};
+
+/**
+ * aie_device_init() - Initialize AIE-specific AI engine device data
+ * @adev: AI engine device
+ *
+ * This function initializes the elements of the AI engine device structure
+ * that are specific to the device version: the array, column, and row
+ * shifts used for register addressing, and the AIE-specific tile
+ * operations.
+ */
+void aie_device_init(struct aie_device *adev)
+{
+ adev->array_shift = AIE_ARRAY_SHIFT;
+ adev->col_shift = AIE_COL_SHIFT;
+ adev->row_shift = AIE_ROW_SHIFT;
+ adev->ops = &aie_ops;
+}
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aperture.c b/drivers/accel/amd-ai-engine/ai-engine-aperture.c
new file mode 100644
index 000000000000..80346a0a18dc
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-aperture.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine aperture driver
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <linux/amd-ai-engine.h>
+#include <linux/device.h>
+#include <linux/firmware/xlnx-zynqmp.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+
+#include "ai-engine-internal.h"
+
+/**
+ * aie_aperture_request_part() - request AI engine partition
+ * @aperture: AI engine aperture
+ * @req: AI engine partition request arguments
+ *
+ * Return: partition pointer for success, and error pointer for failure
+ */
+struct aie_partition *
+aie_aperture_request_part(struct aie_aperture *aperture,
+ struct aie_partition_req *req)
+{
+ u8 start_col, num_col, end_col;
+ struct aie_partition *apart;
+ int ret;
+
+ start_col = req->start_col;
+ num_col = req->num_col;
+ if (num_col == 0) {
+ start_col = aperture->range.start.col;
+ num_col = aperture->range.size.col;
+ }
+
+ end_col = start_col + num_col - 1;
+ if (start_col < aperture->range.start.col ||
+ end_col >= (aperture->range.start.col + aperture->range.size.col))
+ return ERR_PTR(-ERANGE);
+
+ mutex_lock(&aperture->mlock);
+ ret = aie_resource_get_region(&aperture->cols_res,
+ start_col - aperture->range.start.col,
+ num_col);
+ if (ret != (u32)start_col - aperture->range.start.col) {
+ /* Column range returned is not what user requested */
+ if (ret >= 0)
+ aie_resource_put_region(&aperture->cols_res, ret, num_col);
+ mutex_unlock(&aperture->mlock);
+ return ERR_PTR(-EBUSY);
+ }
+
+ apart = aie_part_create(aperture, start_col, num_col);
+ if (IS_ERR(apart)) {
+ aie_resource_put_region(&aperture->cols_res,
+ start_col - aperture->range.start.col,
+ num_col);
+ mutex_unlock(&aperture->mlock);
+ return apart;
+ }
+
+ list_add_tail(&apart->node, &aperture->partitions);
+ mutex_unlock(&aperture->mlock);
+ return apart;
+}
+
+int aie_aperture_probe(struct platform_device *pdev)
+{
+ struct aie_device *adev = dev_get_drvdata(pdev->dev.parent);
+ struct aie_aperture *laperture, *aperture;
+ struct aie_range *range;
+ u32 regs[2];
+ int ret;
+
+ aperture = devm_kzalloc(&pdev->dev, sizeof(*aperture), GFP_KERNEL);
+ if (!aperture)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, aperture);
+ INIT_LIST_HEAD(&aperture->partitions);
+ mutex_init(&aperture->mlock);
+
+ aperture->dev = &pdev->dev;
+ range = &aperture->range;
+ ret = of_property_read_u32_array(pdev->dev.of_node, "xlnx,columns",
+ regs, ARRAY_SIZE(regs));
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "probe %pOF failed, no tiles range information.",
+ pdev->dev.of_node);
+ return ret;
+ }
+ range->start.col = regs[0] & aligned_byte_mask(1);
+ range->size.col = regs[1] & aligned_byte_mask(1);
+ range->start.row = 0;
+ range->size.row = adev->ttype_attr[AIE_TILE_TYPE_SHIMPL].num_rows +
+ adev->ttype_attr[AIE_TILE_TYPE_MEMORY].num_rows +
+ adev->ttype_attr[AIE_TILE_TYPE_TILE].num_rows;
+
+ ret = of_property_read_u32_index(pdev->dev.of_node, "xlnx,node-id", 0,
+ &aperture->node_id);
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "probe %pOF failed, no aperture node id.",
+ pdev->dev.of_node);
+ return ret;
+ }
+
+ /* Validate the aperture */
+ list_for_each_entry(laperture, &adev->apertures, node) {
+ u32 start_col, end_col, check_start_col, check_end_col;
+
+ if (laperture->node_id == aperture->node_id) {
+ dev_err(&pdev->dev,
+ "probe failed, aperture %u exists.",
+ aperture->node_id);
+ return -EINVAL;
+ }
+
+ range = &aperture->range;
+ start_col = range->start.col;
+ end_col = start_col + range->size.col - 1;
+ check_start_col = laperture->range.start.col;
+ check_end_col = check_start_col + laperture->range.size.col - 1;
+ /*
+ * Inclusive ranges overlap iff each starts no later than the other ends.
+ */
+ if (start_col <= check_end_col && end_col >= check_start_col) {
+ dev_err(&pdev->dev,
+ "probe failed, aperture %x overlaps other aperture.",
+ aperture->node_id);
+ return -EINVAL;
+ }
+ }
+
+ /*
+ * Initialize columns resource map to remember which columns have been
+ * assigned. Used for partition management.
+ */
+ ret = aie_resource_initialize(&aperture->cols_res,
+ aperture->range.size.col);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to initialize columns resource.");
+ return ret;
+ }
+
+ aperture->base = devm_ioremap_resource(&pdev->dev, pdev->resource);
+ if (IS_ERR(aperture->base)) {
+ ret = PTR_ERR(aperture->base);
+ goto aie_res_uninit;
+ }
+
+ ret = zynqmp_pm_request_node(aperture->node_id,
+ ZYNQMP_PM_CAPABILITY_ACCESS, 0,
+ ZYNQMP_PM_REQUEST_ACK_BLOCKING);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Unable to request node %d", aperture->node_id);
+ goto aie_res_uninit;
+ }
+
+ dev_set_name(&pdev->dev, "aieaperture_%u_%u", aperture->range.start.col,
+ aperture->range.size.col);
+ dev_info(&pdev->dev,
+ "AI engine aperture %s, id 0x%x, cols(%u, %u) aie_tile_rows(%u, %u) memory_tile_rows(%u, %u) is probed successfully.",
+ dev_name(&pdev->dev), aperture->node_id,
+ aperture->range.start.col, aperture->range.size.col,
+ adev->ttype_attr[AIE_TILE_TYPE_TILE].start_row,
+ adev->ttype_attr[AIE_TILE_TYPE_TILE].num_rows,
+ adev->ttype_attr[AIE_TILE_TYPE_MEMORY].start_row,
+ adev->ttype_attr[AIE_TILE_TYPE_MEMORY].num_rows);
+
+ aperture->adev = adev;
+ mutex_lock(&adev->mlock);
+ list_add_tail(&aperture->node, &adev->apertures);
+ mutex_unlock(&adev->mlock);
+
+ return ret;
+
+aie_res_uninit:
+ aie_resource_uninitialize(&aperture->cols_res);
+ return ret;
+}
+
+void aie_aperture_remove(struct platform_device *pdev)
+{
+ struct aie_aperture *aperture = platform_get_drvdata(pdev);
+
+ aie_resource_uninitialize(&aperture->cols_res);
+ zynqmp_pm_release_node(aperture->node_id);
+}
diff --git a/drivers/accel/amd-ai-engine/ai-engine-dev.c b/drivers/accel/amd-ai-engine/ai-engine-dev.c
new file mode 100644
index 000000000000..ba28257cbd04
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-dev.c
@@ -0,0 +1,228 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine device driver
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <linux/amd-ai-engine.h>
+#include <linux/clk.h>
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+
+#include "ai-engine-internal.h"
+
+/**
+ * aie_partition_request() - Request an AI engine partition
+ * @dev: AI engine device pointer
+ * @req: AI engine partition request arguments
+ *
+ * Return: pointer to the AI engine partition, error pointer value for failure.
+ *
+ * This function searches through the aie device aperture list to request a
+ * partition given start column and number of columns in @req. If the partition
+ * can be found, it will try to request it. User can only use the AI engine
+ * partition after it is successfully requested.
+ */
+void *aie_partition_request(struct device *dev, struct aie_partition_req *req)
+{
+ struct aie_device *adev = dev_get_drvdata(dev);
+ struct aie_partition *apart = NULL;
+ struct aie_aperture *laperture;
+
+ if (!req)
+ return ERR_PTR(-EINVAL);
+
+ list_for_each_entry(laperture, &adev->apertures, node) {
+ apart = aie_aperture_request_part(laperture, req);
+ if (PTR_ERR(apart) == -ERANGE) {
+ continue;
+ } else if (PTR_ERR(apart) == -EBUSY) {
+ /* if requesting full aperture, try next aperture in list */
+ if (req->num_col == 0)
+ continue;
+ dev_err(laperture->dev,
+ "failed to request partition (%u,%u), already in use.",
+ req->start_col, req->num_col);
+ return ERR_CAST(apart);
+ } else if (IS_ERR(apart)) {
+ dev_err(laperture->dev,
+ "failed to create partition (%u, %u).",
+ req->start_col, req->num_col);
+ return ERR_CAST(apart);
+ }
+ break;
+ }
+
+ if (IS_ERR_OR_NULL(apart)) {
+ dev_err(adev->dev,
+ "failed to request partition (%u, %u): invalid partition.",
+ req->start_col, req->num_col);
+ return ERR_PTR(-EINVAL);
+ }
+
+ dev_info(adev->dev, "Partition (%u, %u) created successfully.",
+ apart->range.start.col, apart->range.size.col);
+ return apart;
+}
+EXPORT_SYMBOL_GPL(aie_partition_request);
+
+/**
+ * aie_partition_release() - Release the AI engine partition
+ * @apart: AI engine partition device pointer
+ */
+void aie_partition_release(void *apart)
+{
+ aie_part_release((struct aie_partition *)apart);
+}
+EXPORT_SYMBOL_GPL(aie_partition_release);
+
+static const struct of_device_id amd_aie_aperture_of_match[] = {
+ { .compatible = "xlnx,ai-engine-aperture", },
+ { /* end of table */ },
+};
+MODULE_DEVICE_TABLE(of, amd_aie_aperture_of_match);
+
+static struct platform_driver amd_aie_aperture_driver = {
+ .probe = aie_aperture_probe,
+ .remove = aie_aperture_remove,
+ .driver = {
+ .name = "amd-aie-aperture",
+ .of_match_table = amd_aie_aperture_of_match,
+ },
+};
+
+static int amd_ai_engine_probe(struct platform_device *pdev)
+{
+ struct aie_device *adev;
+ u32 pm_reg[2];
+ u8 regs_u8[2];
+ u8 aie_gen;
+ int ret;
+
+ adev = devm_kzalloc(&pdev->dev, sizeof(*adev), GFP_KERNEL);
+ if (!adev)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, adev);
+ INIT_LIST_HEAD(&adev->apertures);
+ mutex_init(&adev->mlock);
+
+ ret = of_property_read_u8_array(pdev->dev.of_node, "xlnx,shim-rows",
+ regs_u8, ARRAY_SIZE(regs_u8));
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "no SHIM rows information in device tree");
+ return ret;
+ }
+ adev->ttype_attr[AIE_TILE_TYPE_SHIMPL].start_row = regs_u8[0];
+ adev->ttype_attr[AIE_TILE_TYPE_SHIMPL].num_rows = regs_u8[1];
+ adev->ttype_attr[AIE_TILE_TYPE_SHIMNOC].start_row = regs_u8[0];
+ adev->ttype_attr[AIE_TILE_TYPE_SHIMNOC].num_rows = regs_u8[1];
+
+ ret = of_property_read_u8_array(pdev->dev.of_node, "xlnx,core-rows",
+ regs_u8, ARRAY_SIZE(regs_u8));
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "Failed to read core rows information");
+ return ret;
+ }
+ adev->ttype_attr[AIE_TILE_TYPE_TILE].start_row = regs_u8[0];
+ adev->ttype_attr[AIE_TILE_TYPE_TILE].num_rows = regs_u8[1];
+
+ ret = of_property_read_u8_array(pdev->dev.of_node, "xlnx,mem-rows",
+ regs_u8, ARRAY_SIZE(regs_u8));
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "Failed to read mem rows information");
+ return ret;
+ }
+ adev->ttype_attr[AIE_TILE_TYPE_MEMORY].start_row = regs_u8[0];
+ adev->ttype_attr[AIE_TILE_TYPE_MEMORY].num_rows = regs_u8[1];
+
+ ret = of_property_read_u8(pdev->dev.of_node, "xlnx,aie-gen", &aie_gen);
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "no aie dev generation information in device tree");
+ return ret;
+ }
+ adev->dev_gen = aie_gen;
+ if (aie_gen == AIE_DEVICE_GEN_AIE) {
+ aie_device_init(adev);
+ } else {
+ dev_err(&pdev->dev, "Invalid device generation");
+ return -EINVAL;
+ }
+
+ /*
+ * AI Engine platform management node ID is required for requesting
+ * services from firmware driver.
+ */
+ ret = of_property_read_u32_array(pdev->dev.of_node, "power-domains",
+ pm_reg, ARRAY_SIZE(pm_reg));
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "Failed to read power management information");
+ return ret;
+ }
+ adev->pm_node_id = pm_reg[1];
+
+ adev->clk = devm_clk_get(&pdev->dev, "aclk0");
+ if (IS_ERR(adev->clk)) {
+ dev_err(&pdev->dev, "Failed to get device clock.");
+ return PTR_ERR(adev->clk);
+ }
+
+ adev->dev = &pdev->dev;
+ dev_info(&pdev->dev,
+ "AMD AI Engine device %s probed. Device generation: %u. Clock frequency: %ldHz.",
+ dev_name(&pdev->dev), aie_gen, clk_get_rate(adev->clk));
+ return of_platform_populate(pdev->dev.of_node, NULL, NULL, &pdev->dev);
+}
+
+static const struct of_device_id amd_ai_engine_of_match[] = {
+ { .compatible = "xlnx,ai-engine-v2.0", },
+ { /* end of table */ },
+};
+MODULE_DEVICE_TABLE(of, amd_ai_engine_of_match);
+
+static struct platform_driver amd_ai_engine_driver = {
+ .probe = amd_ai_engine_probe,
+ .driver = {
+ .name = "amd-ai-engine",
+ .of_match_table = amd_ai_engine_of_match,
+ },
+};
+
+static int __init amd_ai_engine_init(void)
+{
+ int ret;
+
+ ret = platform_driver_register(&amd_ai_engine_driver);
+ if (ret)
+ return ret;
+
+ ret = platform_driver_register(&amd_aie_aperture_driver);
+ if (ret) {
+ platform_driver_unregister(&amd_ai_engine_driver);
+ return ret;
+ }
+
+ return 0;
+}
+module_init(amd_ai_engine_init);
+
+static void __exit amd_ai_engine_exit(void)
+{
+ platform_driver_unregister(&amd_aie_aperture_driver);
+ platform_driver_unregister(&amd_ai_engine_driver);
+}
+module_exit(amd_ai_engine_exit);
+
+MODULE_AUTHOR("Advanced Micro Devices, Inc.");
+MODULE_LICENSE("GPL");
diff --git a/drivers/accel/amd-ai-engine/ai-engine-internal.h b/drivers/accel/amd-ai-engine/ai-engine-internal.h
new file mode 100644
index 000000000000..4f1d8ace2977
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-internal.h
@@ -0,0 +1,230 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * AMD AI Engine driver internal header
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef AIE_INTERNAL_H
+#define AIE_INTERNAL_H
+
+#include <linux/amd-ai-engine.h>
+#include <linux/clk.h>
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+
+#define AIE_DEVICE_GEN_AIE 1U
+
+#define KBYTES(n) ((n) * SZ_1K)
+
+/*
+ * AI engine tile types; the AIE_TILE_TYPE_MASK_* macros below are the
+ * matching bitmasks
+ */
+enum aie_tile_type {
+ AIE_TILE_TYPE_TILE,
+ AIE_TILE_TYPE_SHIMPL,
+ AIE_TILE_TYPE_SHIMNOC,
+ AIE_TILE_TYPE_MEMORY,
+ AIE_TILE_TYPE_MAX
+};
+
+#define AIE_TILE_TYPE_MASK_TILE BIT(AIE_TILE_TYPE_TILE)
+#define AIE_TILE_TYPE_MASK_SHIMPL BIT(AIE_TILE_TYPE_SHIMPL)
+/* SHIM NOC tile includes SHIM PL and SHIM NOC modules */
+#define AIE_TILE_TYPE_MASK_SHIMNOC BIT(AIE_TILE_TYPE_SHIMNOC)
+#define AIE_TILE_TYPE_MASK_MEMORY BIT(AIE_TILE_TYPE_MEMORY)
+
+/**
+ * struct aie_tile_regs - contiguous range of AI engine registers
+ * within an AI engine tile
+ * @soff: start offset of the range
+ * @eoff: end offset of the range
+ * @attribute: registers attribute. It uses AIE_REGS_ATTR_* macros defined
+ * above.
+ */
+struct aie_tile_regs {
+ size_t soff;
+ size_t eoff;
+ u32 attribute;
+};
+
+/**
+ * struct aie_single_reg_field - AI engine single field register attribute
+ * @mask: field mask
+ * @regoff: register offset of the field
+ */
+struct aie_single_reg_field {
+ u32 mask;
+ u32 regoff;
+};
+
+struct aie_device;
+struct aie_partition;
+struct aie_aperture;
+
+/**
+ * struct aie_tile_operations - AI engine device operations
+ * @get_tile_type: get the type of a tile from its location.
+ * Each AI engine device generation provides its own
+ * implementation.
+ */
+struct aie_tile_operations {
+ u32 (*get_tile_type)(struct aie_device *adev, struct aie_location *loc);
+};
+
+/**
+ * struct aie_resource - AI engine resource structure
+ * @bitmap: resource bitmap
+ * @total: total number of resources
+ */
+struct aie_resource {
+ unsigned long *bitmap;
+ u32 total;
+};
+
+/**
+ * struct aie_range - AIE range information
+ * @start: start tile location
+ * @size: size of the range, number of columns and rows
+ */
+struct aie_range {
+ struct aie_location start;
+ struct aie_location size;
+};
+
+/**
+ * struct aie_tile_attr - AI engine device tile type attributes
+ * @start_row: start row
+ * @num_rows: number of rows
+ * @num_mods: number of modules of this tile type
+ * @mods: array of module types of this tile type
+ */
+struct aie_tile_attr {
+ u8 start_row;
+ u8 num_rows;
+ u8 num_mods;
+ const enum aie_module_type *mods;
+};
+
+/**
+ * struct aie_device - AI engine device structure
+ * @apertures: list of apertures
+ * @dev: device pointer for the AI engine device
+ * @mlock: protection for AI engine device operations
+ * @clk: AI engine device clock
+ * @ops: tile operations
+ * @array_shift: array address shift
+ * @col_shift: column address shift
+ * @row_shift: row address shift
+ * @dev_gen: aie hardware device generation
+ * @pm_node_id: AI Engine platform management node ID
+ * @ttype_attr: tile type attributes
+ */
+struct aie_device {
+ struct list_head apertures;
+ struct device *dev;
+ struct mutex mlock; /* protection for AI engine apertures */
+ struct clk *clk;
+ const struct aie_tile_operations *ops;
+ u32 array_shift;
+ u32 col_shift;
+ u32 row_shift;
+ u32 dev_gen;
+ u32 pm_node_id;
+ struct aie_tile_attr ttype_attr[AIE_TILE_TYPE_MAX];
+};
+
+/**
+ * struct aie_aperture - AI engine aperture structure
+ * @node: list node
+ * @partitions: list of partitions of this aperture
+ * @dev: device pointer for the AI engine aperture device
+ * @adev: pointer to AI device instance
+ * @mlock: protection for AI engine aperture operations
+ * @base: AI engine aperture base virtual address
+ * @cols_res: AI engine columns resources to indicate
+ * which columns are occupied by partitions.
+ * @node_id: AI engine aperture node id which identifies
+ * the aperture to the system firmware
+ * @range: range of aperture
+ */
+struct aie_aperture {
+ struct list_head node;
+ struct list_head partitions;
+ struct device *dev;
+ struct aie_device *adev;
+ struct mutex mlock; /* protection for AI engine aperture operations */
+ void __iomem *base;
+ struct aie_resource cols_res;
+ u32 node_id;
+ struct aie_range range;
+};
+
+/**
+ * struct aie_partition - AI engine partition structure
+ * @node: list node
+ * @aperture: pointer to AI engine aperture
+ * @adev: pointer to AI device instance
+ * @range: range of partition
+ * @mlock: protection for AI engine partition operations
+ */
+struct aie_partition {
+ struct list_head node;
+ struct aie_aperture *aperture;
+ struct aie_device *adev;
+ struct aie_range range;
+ struct mutex mlock; /* protection for AI engine partition operations */
+};
+
+#define dev_to_aiedev(_dev) container_of((_dev), struct aie_device, dev)
+#define dev_to_aieaperture(_dev) container_of((_dev), struct aie_aperture, dev)
+#define dev_to_aiepart(_dev) container_of((_dev), struct aie_partition, dev)
+
+#define aie_col_mask(adev) ({ \
+ struct aie_device *_adev = (adev); \
+ GENMASK_ULL(_adev->array_shift - 1, _adev->col_shift); \
+ })
+
+#define aie_row_mask(adev) ({ \
+ struct aie_device *_adev = (adev); \
+ GENMASK_ULL(_adev->col_shift - 1, _adev->row_shift); \
+ })
+
+#define aie_tile_reg_mask(adev) ({ \
+ struct aie_device *_adev = (adev); \
+ GENMASK_ULL(_adev->row_shift - 1, 0); \
+ })
+
+/*
+ * Need to define field get, as AI engine shift mask is not constant.
+ * Cannot use FIELD_GET()
+ */
+#define aie_tile_reg_field_get(mask, shift, regoff) ( \
+ ((regoff) & (mask)) >> (shift))
+
+#define aie_cal_tile_reg(adev, regoff) ( \
+ aie_tile_reg_field_get(aie_tile_reg_mask(adev), 0, regoff))
+
+void aie_device_init(struct aie_device *adev);
+struct aie_partition *
+aie_aperture_request_part(struct aie_aperture *aperture,
+ struct aie_partition_req *req);
+int aie_aperture_probe(struct platform_device *pdev);
+void aie_aperture_remove(struct platform_device *pdev);
+struct aie_partition *aie_part_create(struct aie_aperture *aperture,
+ u8 start_col, u8 num_col);
+void aie_part_release(struct aie_partition *apart);
+int aie_resource_initialize(struct aie_resource *res, int count);
+void aie_resource_uninitialize(struct aie_resource *res);
+int aie_resource_check_region(struct aie_resource *res, u32 start,
+ u32 count);
+int aie_resource_get_region(struct aie_resource *res, u32 start,
+ u32 count);
+void aie_resource_put_region(struct aie_resource *res, int start, u32 count);
+
+#endif /* AIE_INTERNAL_H */
diff --git a/drivers/accel/amd-ai-engine/ai-engine-part.c b/drivers/accel/amd-ai-engine/ai-engine-part.c
new file mode 100644
index 000000000000..3675a72971d5
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-part.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine partition driver
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <linux/amd-ai-engine.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#include "ai-engine-internal.h"
+
+/**
+ * aie_part_release() - release an AI engine partition instance
+ * @apart: AI engine partition device
+ */
+void aie_part_release(struct aie_partition *apart)
+{
+ struct aie_aperture *aperture = apart->aperture;
+
+ mutex_lock(&aperture->mlock);
+
+ aie_resource_put_region(&aperture->cols_res,
+ apart->range.start.col -
+ aperture->range.start.col,
+ apart->range.size.col);
+ list_del(&apart->node);
+ devm_kfree(aperture->dev, apart);
+ mutex_unlock(&aperture->mlock);
+}
+
+/**
+ * aie_part_create() - create AI engine partition instance
+ * @aperture: AI engine aperture
+ * @start_col: start column of AI engine partition
+ * @num_col: number of columns of AI engine partition
+ *
+ * Return: created AI engine partition pointer for success, and ERR_PTR
+ * for failure.
+ *
+ * This function allocates an AI engine partition instance.
+ * It records the owning aperture and device, and takes the row range
+ * from @aperture and the column range from @start_col and @num_col.
+ */
+struct aie_partition *aie_part_create(struct aie_aperture *aperture,
+ u8 start_col, u8 num_col)
+{
+ struct aie_partition *apart;
+
+ apart = devm_kzalloc(aperture->dev, sizeof(*apart), GFP_KERNEL);
+ if (!apart)
+ return ERR_PTR(-ENOMEM);
+
+ apart->aperture = aperture;
+ apart->adev = aperture->adev;
+ mutex_init(&apart->mlock);
+ apart->range.start.col = start_col;
+ apart->range.size.col = num_col;
+ apart->range.start.row = aperture->range.start.row;
+ apart->range.size.row = aperture->range.size.row;
+
+ return apart;
+}
diff --git a/drivers/accel/amd-ai-engine/ai-engine-res.c b/drivers/accel/amd-ai-engine/ai-engine-res.c
new file mode 100644
index 000000000000..6bbd7273686e
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-res.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine resource implementation
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <linux/bitmap.h>
+
+#include "ai-engine-internal.h"
+
+/**
+ * aie_resource_initialize() - initialize AI engine resource
+ * @res: pointer to AI engine resource
+ * @count: total number of elements of this resource
+ *
+ * Return: 0 for success, negative value for failure.
+ *
+ * This function will initialize the data structure for the
+ * resource.
+ */
+int aie_resource_initialize(struct aie_resource *res, int count)
+{
+ res->bitmap = bitmap_zalloc(count, GFP_KERNEL);
+ if (!res->bitmap)
+ return -ENOMEM;
+ res->total = count;
+
+ return 0;
+}
+
+/**
+ * aie_resource_uninitialize() - uninitialize AI engine resource
+ * @res: pointer to AI engine resource
+ *
+ * This function will release the AI engine resource data members.
+ */
+void aie_resource_uninitialize(struct aie_resource *res)
+{
+ res->total = 0;
+ if (res->bitmap)
+ bitmap_free(res->bitmap);
+}
+
+/**
+ * aie_resource_check_region() - check availability of requested resource
+ * @res: pointer to AI engine resource to check
+ * @start: start index of the search. The first free region found at or
+ * after @start is reported, so the returned id may be greater than
+ * @start
+ * @count: number of requested elements
+ *
+ * Return: start resource id if the requested number of resources is
+ * available, negative error value otherwise.
+ *
+ * This function only checks availability. It does not mark the region
+ * as used.
+ */
+int aie_resource_check_region(struct aie_resource *res,
+ u32 start, u32 count)
+{
+ unsigned long id;
+
+ if (!res || !res->bitmap || !count)
+ return -EINVAL;
+ id = bitmap_find_next_zero_area(res->bitmap, res->total, start,
+ count, 0);
+ if (id >= res->total)
+ return -ERANGE;
+
+ return (int)id;
+}
+
+/**
+ * aie_resource_get_region() - get requested AI engine resource
+ * @res: pointer to AI engine resource to check
+ * @start: start index of the required resource
+ * @count: number of requested elements
+ *
+ * Return: start resource id for success, and negative value for failure.
+ *
+ * This function checks if the requested AI engine resource is available.
+ * If it is available, mark it used and return the start resource id.
+ */
+int aie_resource_get_region(struct aie_resource *res, u32 start, u32 count)
+{
+ unsigned long off;
+
+ if (!res || !res->bitmap || !count)
+ return -EINVAL;
+ off = bitmap_find_next_zero_area(res->bitmap, res->total, start,
+ count, 0);
+ if (off >= res->total)
+ return -ERANGE;
+
+ bitmap_set(res->bitmap, off, count);
+
+ return (int)off;
+}
+
+/**
+ * aie_resource_put_region() - release requested AI engine resource
+ * @res: pointer to AI engine resource to check
+ * @start: start index of the resource to release
+ * @count: number of elements to release
+ *
+ * This function releases the requested AI engine resource.
+ */
+void aie_resource_put_region(struct aie_resource *res, int start, u32 count)
+{
+ if (!res || !count)
+ return;
+ bitmap_clear(res->bitmap, start, count);
+}
diff --git a/include/linux/amd-ai-engine.h b/include/linux/amd-ai-engine.h
new file mode 100644
index 000000000000..2a13362edd0c
--- /dev/null
+++ b/include/linux/amd-ai-engine.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * amd-ai-engine.h - AMD AI engine external interface
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _AMD_AI_ENGINE_H_
+#define _AMD_AI_ENGINE_H_
+
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+/**
+ * struct aie_partition_req - AIE request partition arguments
+ * @start_col: start column of the partition
+ * @num_col: number of columns in a partition
+ * @uid: image identifier loaded on the AI engine partition
+ * @meta_data: meta data to indicate which resources used by application.
+ * @flag: used for application to indicate particular driver requirements
+ * application wants to have for the partition. e.g. do not clean
+ * resource when closing the partition.
+ */
+struct aie_partition_req {
+ u8 start_col;
+ u8 num_col;
+ u32 uid;
+ u64 meta_data;
+ u32 flag;
+};
+
+/**
+ * struct aie_location - AIE location information
+ * @col: column id
+ * @row: row id
+ */
+struct aie_location {
+ u32 col;
+ u32 row;
+};
+
+void *aie_partition_request(struct device *dev, struct aie_partition_req *req);
+void aie_partition_release(void *apart);
+
+#endif
--
2.34.1
* [PATCH V1 6/9] accel: amd-ai-engine: Add support to enable/disable clocks and change clock frequency
2025-07-02 15:56 [PATCH V1 0/9] AMD AI Engine device driver for Versal Gregory Williams
` (4 preceding siblings ...)
2025-07-02 15:56 ` [PATCH V1 5/9] accel: amd-ai-engine: Add AMD AI Engine device driver Gregory Williams
@ 2025-07-02 15:56 ` Gregory Williams
2025-07-02 15:56 ` [PATCH V1 7/9] accel: amd-ai-engine: Add support for AIEML devices Gregory Williams
` (2 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
Add support for getting and setting the AI engine array clock frequency.
Frequency values are validated by the driver and then passed to the Versal
firmware driver. Also add support for requesting and releasing tiles:
requesting tiles enables their clocks, and releasing tiles disables
them.
Signed-off-by: Gregory Williams <gregory.williams@amd.com>
---
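For reviewers, a minimal user-space sketch of how a tile location maps to a
bit in the per-partition clock-state bitmap, and of the rule that every tile
below the highest in-use tile of a column must stay clocked. It mirrors the
arithmetic in aie_part_get_clk_state_bit() but is not driver code: struct
part_geom, clk_state_bit() and col_top_clocked_row() are hypothetical names,
the bounds checks are an addition, and the partition is assumed to start at
row 0 with a single shim row.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the partition geometry, not the driver struct. */
struct part_geom {
	uint32_t start_col;	/* first column of the partition */
	uint32_t num_cols;	/* number of columns in the partition */
	uint32_t num_rows;	/* rows per column, including shim row 0 */
};

/*
 * Map a tile (col, row) to its bit in the clock-state bitmap.
 * The shim row (row 0) has no bit, so each column owns num_rows - 1 bits.
 * Returns -1 for the shim row or a location outside the partition
 * (these checks are an assumption added for the sketch).
 */
static int clk_state_bit(const struct part_geom *g, uint32_t col, uint32_t row)
{
	if (col < g->start_col || col >= g->start_col + g->num_cols)
		return -1;
	if (row == 0 || row >= g->num_rows)
		return -1;
	return (int)((col - g->start_col) * (g->num_rows - 1) + row - 1);
}

/*
 * Return the highest in-use row of a column, or 0 if no tile is in use.
 * Because each tile gates the clock of the tile above it, rows 1..result
 * all need their clocks enabled.
 */
static uint32_t col_top_clocked_row(const struct part_geom *g,
				    const bool *inuse, uint32_t col)
{
	uint32_t row, top = 0;

	for (row = 1; row < g->num_rows; row++) {
		int bit = clk_state_bit(g, col, row);

		if (bit >= 0 && inuse[bit])
			top = row;
	}
	return top;
}
```

With a partition of 3 columns starting at column 2 and 5 rows per column,
each column owns 4 bits, so tile (3, 3) lands on bit 6, and marking only that
tile in use means rows 1 to 3 of column 3 need their clocks on.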
drivers/accel/amd-ai-engine/Makefile | 1 +
drivers/accel/amd-ai-engine/ai-engine-aie.c | 234 ++++++++++++
drivers/accel/amd-ai-engine/ai-engine-clock.c | 338 ++++++++++++++++++
.../accel/amd-ai-engine/ai-engine-internal.h | 46 +++
drivers/accel/amd-ai-engine/ai-engine-part.c | 28 +-
drivers/accel/amd-ai-engine/ai-engine-res.c | 54 +++
include/linux/amd-ai-engine.h | 3 +
7 files changed, 703 insertions(+), 1 deletion(-)
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-clock.c
diff --git a/drivers/accel/amd-ai-engine/Makefile b/drivers/accel/amd-ai-engine/Makefile
index ed635a2f2602..9a830f7432d2 100644
--- a/drivers/accel/amd-ai-engine/Makefile
+++ b/drivers/accel/amd-ai-engine/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_DRM_ACCEL_AMDAIE) += amd-aie.o
amd-aie-$(CONFIG_DRM_ACCEL_AMDAIE) := \
ai-engine-aie.o \
ai-engine-aperture.o \
+ ai-engine-clock.o \
ai-engine-dev.o \
ai-engine-part.o \
ai-engine-res.o
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aie.c b/drivers/accel/amd-ai-engine/ai-engine-aie.c
index 7b4be8d2c5eb..5e3cb44a16c8 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-aie.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-aie.c
@@ -6,6 +6,9 @@
*/
#include <linux/amd-ai-engine.h>
+#include <linux/bitmap.h>
+#include <linux/device.h>
+#include <linux/io.h>
#include "ai-engine-internal.h"
@@ -13,6 +16,20 @@
#define AIE_COL_SHIFT 23U
#define AIE_ROW_SHIFT 18U
+/*
+ * Register offsets
+ */
+#define AIE_SHIMPL_CLKCNTR_REGOFF 0x00036040U
+#define AIE_TILE_CORE_CLKCNTR_REGOFF 0x00036040U
+
+/*
+ * Register masks
+ */
+#define AIE_SHIMPL_CLKCNTR_COLBUF_MASK BIT(0)
+#define AIE_SHIMPL_CLKCNTR_NEXTCLK_MASK BIT(1)
+#define AIE_TILE_CLKCNTR_COLBUF_MASK BIT(0)
+#define AIE_TILE_CLKCNTR_NEXTCLK_MASK BIT(1)
+
static u32 aie_get_tile_type(struct aie_device *adev, struct aie_location *loc)
{
if (loc->row)
@@ -24,8 +41,225 @@ static u32 aie_get_tile_type(struct aie_device *adev, struct aie_location *loc)
return AIE_TILE_TYPE_SHIMNOC;
}
+/* aie_scan_part_clocks() - scan clocks of a partition
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success, negative value for errors.
+ */
+static int aie_scan_part_clocks(struct aie_partition *apart)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_range *range = &apart->range;
+ struct aie_device *adev = apart->adev;
+ struct aie_location loc;
+ int ret;
+
+ /* Clear the bitmap of cores and memories clock state */
+ aie_resource_put_region(&apart->cores_clk_state, 0,
+ apart->cores_clk_state.total);
+
+ for (loc.col = range->start.col;
+ loc.col < range->start.col + range->size.col;
+ loc.col++) {
+ for (loc.row = range->start.row;
+ loc.row < range->start.row + range->size.row - 1;
+ loc.row++) {
+ void __iomem *va;
+ u32 val, nbitpos;
+
+ /*
+ * Read the registers of the current tile to see whether
+ * the next tile is clock gated.
+ */
+ nbitpos = (loc.col - range->start.col) *
+ (range->size.row - 1) + loc.row;
+
+ if (aie_get_tile_type(adev, &loc) !=
+ AIE_TILE_TYPE_TILE) {
+ /* Checks shim tile for next core tile */
+ va = aperture->base +
+ aie_cal_regoff(adev, loc,
+ AIE_SHIMPL_CLKCNTR_REGOFF);
+ val = readl(va);
+
+ /*
+ * Check whether both the column clock buffer and the
+ * next-tile clock are enabled. If either is not set,
+ * the tiles of the column are clock gated.
+ */
+ if (!(val & AIE_SHIMPL_CLKCNTR_COLBUF_MASK) ||
+ !(val & AIE_SHIMPL_CLKCNTR_NEXTCLK_MASK))
+ break;
+
+ /* Set next tile in the row clock state on */
+ ret = aie_resource_set(&apart->cores_clk_state,
+ nbitpos, 1);
+ if (ret) {
+ dev_err(aperture->dev,
+ "failed to set clock state bitmap.");
+ return ret;
+ }
+ continue;
+ }
+
+ /* Checks core tile for next tile */
+ va = aperture->base +
+ aie_cal_regoff(adev, loc,
+ AIE_TILE_CORE_CLKCNTR_REGOFF);
+ val = readl(va);
+
+ /*
+ * If the next tile is gated, skip the rest of the
+ * column.
+ */
+ if (!(val & AIE_TILE_CLKCNTR_NEXTCLK_MASK))
+ break;
+
+ ret = aie_resource_set(&apart->cores_clk_state,
+ nbitpos, 1);
+ if (ret) {
+ dev_err(aperture->dev,
+ "failed to set clock state bitmap.");
+ return ret;
+ }
+ }
+ }
+
+ /*
+ * Set the tiles in use bitmap.
+ * In case of scanning, tiles which are powered on are considered as
+ * tiles in use.
+ */
+ bitmap_copy(apart->tiles_inuse.bitmap, apart->cores_clk_state.bitmap,
+ apart->tiles_inuse.total);
+
+ return 0;
+}
+
+/* aie_set_col_clocks() - set clocks of a range of tiles of a column
+ * @apart: AI engine partition
+ * @range: range of tiles of a column
+ * @enable: true to enable the clock, false to disable
+ *
+ * Return: 0 for success, negative value for errors.
+ */
+static int aie_set_col_clocks(struct aie_partition *apart,
+ struct aie_range *range, bool enable)
+{
+ struct aie_location ploc;
+ u32 startbit;
+ int ret;
+
+ /*
+ * Only a single-column range is allowed, and the start row must be
+ * a tile row (row 1 or above).
+ */
+ if (range->size.col != 1 || range->start.row < 1)
+ return -EINVAL;
+
+ ploc.col = range->start.col;
+ for (ploc.row = range->start.row - 1;
+ ploc.row < range->start.row + range->size.row;
+ ploc.row++) {
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_device *adev = apart->adev;
+ void __iomem *va;
+ u32 val = 0, regoff;
+
+ if (!ploc.row) {
+ if (enable)
+ val = AIE_SHIMPL_CLKCNTR_COLBUF_MASK |
+ AIE_SHIMPL_CLKCNTR_NEXTCLK_MASK;
+ regoff = AIE_SHIMPL_CLKCNTR_REGOFF;
+ } else {
+ if (enable)
+ val = AIE_TILE_CLKCNTR_COLBUF_MASK |
+ AIE_TILE_CLKCNTR_NEXTCLK_MASK;
+ regoff = AIE_TILE_CORE_CLKCNTR_REGOFF;
+ }
+
+ va = aperture->base + aie_cal_regoff(adev, ploc, regoff);
+ writel(val, va);
+
+ /* If the tile clock is not on, no need to go to next row */
+ if (!enable)
+ break;
+ }
+
+ /* Update clock state bitmap */
+ startbit = (range->start.col - apart->range.start.col) *
+ (apart->range.size.row - 1) + range->start.row - 1;
+ if (enable)
+ ret = aie_resource_set(&apart->cores_clk_state, startbit,
+ range->size.row);
+ else
+ ret = aie_resource_clear(&apart->cores_clk_state, startbit,
+ range->size.row);
+
+ return ret;
+}
+
+/* aie_set_part_clocks() - set clocks of a partition
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success, negative value for errors.
+ */
+static int aie_set_part_clocks(struct aie_partition *apart)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_range *range = &apart->range, lrange;
+ struct aie_location rloc;
+ int ret = 0;
+
+ /*
+ * The tiles below the highest tile whose clock is on, need to have the
+ * clock on. The first for loop is to scan the clock states bitmap to
+ * see which tiles are required to be clocked on, and update the bitmap
+ * to make sure the tiles below are also required to be clocked on.
+ */
+ for (rloc.col = 0; rloc.col < range->size.col; rloc.col++) {
+ u32 startbit, inuse_toprow = 0, clk_toprow = 0;
+
+ startbit = rloc.col * (range->size.row - 1);
+
+ for (rloc.row = range->start.row + 1;
+ rloc.row < range->start.row + range->size.row;
+ rloc.row++) {
+ u32 bit = startbit + rloc.row - 1;
+
+ if (aie_resource_testbit(&apart->tiles_inuse, bit))
+ inuse_toprow = rloc.row;
+ if (aie_resource_testbit(&apart->cores_clk_state, bit))
+ clk_toprow = rloc.row;
+ }
+
+ /* Update clock states of a column */
+ lrange.start.col = rloc.col + range->start.col;
+ lrange.size.col = 1;
+ if (inuse_toprow < clk_toprow) {
+ lrange.start.row = inuse_toprow + 1;
+ lrange.size.row = clk_toprow - inuse_toprow;
+ ret = aie_set_col_clocks(apart, &lrange, false);
+ } else if (inuse_toprow > clk_toprow) {
+ lrange.start.row = clk_toprow + 1;
+ lrange.size.row = inuse_toprow - clk_toprow;
+ ret = aie_set_col_clocks(apart, &lrange, true);
+ }
+
+ if (ret) {
+ dev_err(aperture->dev,
+ "failed to set clocks for column %u.",
+ rloc.col);
+ return ret;
+ }
+ }
+
+ return 0;
+}
static const struct aie_tile_operations aie_ops = {
.get_tile_type = aie_get_tile_type,
+ .scan_part_clocks = aie_scan_part_clocks,
+ .set_part_clocks = aie_set_part_clocks,
};
/**
diff --git a/drivers/accel/amd-ai-engine/ai-engine-clock.c b/drivers/accel/amd-ai-engine/ai-engine-clock.c
new file mode 100644
index 000000000000..646ec1d1658c
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-clock.c
@@ -0,0 +1,338 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine clock operations
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <linux/amd-ai-engine.h>
+#include <linux/clk.h>
+#include <linux/firmware/xlnx-zynqmp.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#include "ai-engine-internal.h"
+
+/**
+ * aie_part_get_clk_state_bit() - return bit position of the clock state of a
+ * tile
+ * @apart: AI engine partition
+ * @loc: AI engine tile location
+ *
+ * Return: bit position for success, negative value for failure
+ */
+static int aie_part_get_clk_state_bit(struct aie_partition *apart,
+ struct aie_location *loc)
+{
+ u32 ttype = apart->adev->ops->get_tile_type(apart->adev, loc);
+
+ if (ttype != AIE_TILE_TYPE_TILE && ttype != AIE_TILE_TYPE_MEMORY)
+ return -EINVAL;
+
+ return (loc->col - apart->range.start.col) *
+ (apart->range.size.row - 1) + loc->row - 1;
+}
+
+/**
+ * aie_part_scan_clk_state() - scan the clock states of tiles of the AI engine
+ * partition
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success, negative value for failure.
+ *
+ * This function will scan the clock status of both the memory and core
+ * modules.
+ */
+int aie_part_scan_clk_state(struct aie_partition *apart)
+{
+ return apart->adev->ops->scan_part_clocks(apart);
+}
+
+/**
+ * aie_part_check_clk_enable_loc() - return if clock of a tile is enabled
+ * @apart: AI engine partition
+ * @loc: AI engine tile location
+ *
+ * Return: true for enabled, false for disabled
+ */
+bool aie_part_check_clk_enable_loc(struct aie_partition *apart,
+ struct aie_location *loc)
+{
+ u32 ttype = apart->adev->ops->get_tile_type(apart->adev, loc);
+ int bit;
+
+ if (ttype != AIE_TILE_TYPE_TILE && ttype != AIE_TILE_TYPE_MEMORY)
+ return true;
+
+ bit = aie_part_get_clk_state_bit(apart, loc);
+ return aie_resource_testbit(&apart->cores_clk_state, bit);
+}
+
+/**
+ * aie_part_request_tiles() - request tiles from an AI engine partition.
+ * @apart: AI engine partition
+ * @num_tiles: number of tiles to request. If it is 0, it means all tiles
+ * @locs: the AI engine tiles locations array which will be requested
+ *
+ * Return: 0 for success, negative value for failure.
+ *
+ * This function will enable clocks of the specified tiles.
+ */
+int aie_part_request_tiles(struct aie_partition *apart, int num_tiles,
+ struct aie_location *locs)
+{
+ int ret;
+
+ mutex_lock(&apart->mlock);
+ if (num_tiles == 0) {
+ aie_resource_set(&apart->tiles_inuse, 0,
+ apart->tiles_inuse.total);
+ } else {
+ u32 n;
+
+ if (!locs) {
+ mutex_unlock(&apart->mlock);
+ return -EINVAL;
+ }
+
+ for (n = 0; n < num_tiles; n++) {
+ int bit = aie_part_get_clk_state_bit(apart, &locs[n]);
+
+ if (bit >= 0)
+ aie_resource_set(&apart->tiles_inuse, bit, 1);
+ }
+ }
+ ret = apart->adev->ops->set_part_clocks(apart);
+ mutex_unlock(&apart->mlock);
+
+ return ret;
+}
+
+/**
+ * aie_part_release_tiles() - release tiles from an AI engine partition.
+ * @apart: AI engine partition
+ * @num_tiles: number of tiles to release. If it is 0, it means all tiles
+ * @locs: the AI engine tiles locations array which will be released
+ *
+ * Return: 0 for success, negative value for failure.
+ *
+ * This function will disable clocks of the specified tiles.
+ */
+int aie_part_release_tiles(struct aie_partition *apart, int num_tiles,
+ struct aie_location *locs)
+{
+ int ret;
+
+ mutex_lock(&apart->mlock);
+ if (num_tiles == 0) {
+ aie_resource_clear(&apart->tiles_inuse, 0,
+ apart->tiles_inuse.total);
+ } else {
+ u32 n;
+
+ if (!locs) {
+ mutex_unlock(&apart->mlock);
+ return -EINVAL;
+ }
+
+ for (n = 0; n < num_tiles; n++) {
+ int bit = aie_part_get_clk_state_bit(apart, &locs[n]);
+
+ if (bit >= 0)
+ aie_resource_clear(&apart->tiles_inuse, bit, 1);
+ }
+ }
+
+ ret = apart->adev->ops->set_part_clocks(apart);
+ mutex_unlock(&apart->mlock);
+
+ return ret;
+}
+
+/**
+ * aie_aperture_get_freq_req() - get current required frequency of aperture
+ * @aperture: AI engine aperture
+ *
+ * Return: required clock frequency of the aperture which is the largest
+ * required clock frequency of all partitions of the aperture. If
+ * return value is 0, it means no partition has specific frequency
+ * requirement.
+ */
+static unsigned long aie_aperture_get_freq_req(struct aie_aperture *aperture)
+{
+ struct aie_partition *apart;
+ unsigned long freq_req = 0;
+
+ mutex_lock(&aperture->mlock);
+ list_for_each_entry(apart, &aperture->partitions, node) {
+ if (apart->freq_req > freq_req)
+ freq_req = apart->freq_req;
+ }
+ mutex_unlock(&aperture->mlock);
+
+ return freq_req;
+}
+
+/**
+ * aie_part_set_freq() - set frequency requirement of an AI engine partition
+ *
+ * @apart: AI engine partition
+ * @freq: required frequency
+ *
+ * Return: 0 for success, negative value for failure
+ *
+ * This function sets frequency requirement for the partition.
+ * It calls aie_aperture_get_freq_req() to find the highest frequency
+ * requirement among all partitions of the aperture, then sends a QoS EEMI
+ * request for that frequency.
+ */
+int aie_part_set_freq(struct aie_partition *apart, u64 freq)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_device *adev = apart->adev;
+ u32 boot_qos, current_qos, target_qos;
+ unsigned long clk_rate;
+ u64 temp_freq;
+ int ret;
+
+ clk_rate = clk_get_rate(adev->clk);
+ if (freq > (u64)clk_rate) {
+ dev_err(aperture->dev,
+ "Invalid frequency to set, larger than full frequency(%lu).\n",
+ clk_rate);
+ return -EINVAL;
+ }
+ mutex_lock(&apart->mlock);
+
+ temp_freq = apart->freq_req;
+ apart->freq_req = freq;
+
+ freq = aie_aperture_get_freq_req(aperture);
+ if (!freq) {
+ mutex_unlock(&apart->mlock);
+ return 0;
+ }
+
+ ret = zynqmp_pm_get_qos(aperture->node_id, &boot_qos, &current_qos);
+ if (ret < 0) {
+ dev_err(aperture->dev, "Failed to get clock divider value.\n");
+ goto out;
+ }
+
+ target_qos = (boot_qos * clk_rate) / freq;
+
+ /* The clock divisor value (QoS) is a 10-bit value */
+ if (target_qos > (BIT(10) - 1)) {
+ /*
+ * Reset the logged partition frequency requirement to its
+ * previous value.
+ */
+ dev_err(aperture->dev,
+ "Failed to set frequency requirement. Frequency value out of bounds.\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = zynqmp_pm_set_requirement(aperture->node_id,
+ ZYNQMP_PM_CAPABILITY_ACCESS, target_qos,
+ ZYNQMP_PM_REQUEST_ACK_BLOCKING);
+ if (ret < 0) {
+ dev_err(aperture->dev, "Failed to set frequency requirement.\n");
+ goto out;
+ }
+
+ mutex_unlock(&apart->mlock);
+ return 0;
+out:
+ apart->freq_req = temp_freq;
+ mutex_unlock(&apart->mlock);
+ return ret;
+}
+
+/**
+ * aie_partition_set_freq_req() - set partition frequency requirement
+ *
+ * @apart: AI engine partition instance
+ * @freq: required frequency
+ *
+ * Return: 0 for success, negative value for failure
+ *
+ * This function sets the minimum required frequency for the AI engine
+ * partition. If there are other partitions requiring a higher frequency in the
+ * system, AI engine device will be clocked at that value to satisfy frequency
+ * requirements of all partitions.
+ */
+int aie_partition_set_freq_req(void *apart, u64 freq)
+{
+ if (!apart)
+ return -EINVAL;
+ return aie_part_set_freq((struct aie_partition *)apart, freq);
+}
+EXPORT_SYMBOL_GPL(aie_partition_set_freq_req);
+
+/**
+ * aie_part_get_freq() - get running frequency of AI engine device.
+ *
+ * @apart: AI engine partition
+ * @freq: return running frequency
+ *
+ * Return: 0 for success, negative value for failure
+ *
+ * This function gets the clock divider value with an EEMI request and the
+ * full clock frequency from the common clock framework. It then divides the
+ * full clock frequency by the divider value and returns the result.
+ */
+static int aie_part_get_freq(struct aie_partition *apart, u64 *freq)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_device *adev = apart->adev;
+ u32 boot_qos, current_qos;
+ unsigned long clk_rate;
+ int ret;
+
+ clk_rate = clk_get_rate(adev->clk);
+ ret = zynqmp_pm_get_qos(aperture->node_id, &boot_qos,
+ &current_qos);
+ if (ret < 0) {
+ dev_err(aperture->dev, "Failed to get clock divider value.\n");
+ return ret;
+ }
+
+ *freq = (clk_rate * boot_qos) / current_qos;
+ return 0;
+}
+
+/**
+ * aie_partition_get_freq() - get partition running frequency
+ *
+ * @apart: AI engine partition instance
+ * @freq: return running frequency
+ *
+ * Return: 0 for success, negative value for failure
+ */
+int aie_partition_get_freq(void *apart, u64 *freq)
+{
+ if (!apart || !freq)
+ return -EINVAL;
+ return aie_part_get_freq((struct aie_partition *)apart, freq);
+}
+EXPORT_SYMBOL_GPL(aie_partition_get_freq);
+
+/**
+ * aie_partition_get_freq_req() - get partition required frequency
+ *
+ * @apart: AI engine partition instance
+ * @freq: return partition required frequency. 0 means partition doesn't have
+ * frequency requirement.
+ *
+ * Return: 0 for success, negative value for failure
+ */
+int aie_partition_get_freq_req(void *apart, u64 *freq)
+{
+ if (!apart || !freq)
+ return -EINVAL;
+
+ *freq = ((struct aie_partition *)apart)->freq_req;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(aie_partition_get_freq_req);
diff --git a/drivers/accel/amd-ai-engine/ai-engine-internal.h b/drivers/accel/amd-ai-engine/ai-engine-internal.h
index 4f1d8ace2977..495d56d5f993 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-internal.h
+++ b/drivers/accel/amd-ai-engine/ai-engine-internal.h
@@ -70,11 +70,25 @@ struct aie_aperture;
/**
* struct aie_tile_operations - AI engine device operations
* @get_tile_type: get type of tile based on tile operation
+ * @scan_part_clocks: scan partition modules to check whether the modules are
+ * clock gated or not, and update the soft clock states
+ * structure. It is required to be called when the partition
+ * is requested so that the driver knows which modules are
+ * clock gated when the partition is requested. This function
+ * expects the caller to apply partition lock before calling
+ * this function.
+ * @set_part_clocks: set partition modules clocks gate registers based on the
+ * partition clock states bitmap. This function expects the
+ * caller to apply partition lock before calling this
+ * function. The caller function will need to set the bitmap
+ * on which tiles are required to be clocked on.
* Different AI engine device version has its own device
* operation.
*/
struct aie_tile_operations {
u32 (*get_tile_type)(struct aie_device *adev, struct aie_location *loc);
+ int (*scan_part_clocks)(struct aie_partition *apart);
+ int (*set_part_clocks)(struct aie_partition *apart);
};
/**
@@ -171,14 +185,20 @@ struct aie_aperture {
* @aperture: pointer to AI engine aperture
* @adev: pointer to AI device instance
* @range: range of partition
+ * @cores_clk_state: bitmap to indicate the power state of core and mem tiles
+ * @tiles_inuse: bitmap to indicate if a tile is in use
* @mlock: protection for AI engine partition operations
+ * @freq_req: required frequency
*/
struct aie_partition {
struct list_head node;
struct aie_aperture *aperture;
struct aie_device *adev;
struct aie_range range;
+ struct aie_resource cores_clk_state;
+ struct aie_resource tiles_inuse;
struct mutex mlock; /* protection for AI engine partition operations */
+ u64 freq_req;
};
#define dev_to_aiedev(_dev) container_of((_dev), struct aie_device, dev)
@@ -210,6 +230,21 @@ struct aie_partition {
#define aie_cal_tile_reg(adev, regoff) ( \
aie_tile_reg_field_get(aie_tile_reg_mask(adev), 0, regoff))
+/**
+ * aie_cal_regoff() - calculate register offset to the whole AI engine
+ * device start address
+ * @adev: AI engine device
+ * @loc: AI engine tile location
+ * @regoff_intile: register offset within a tile
+ * @return: register offset to the whole AI engine device start address
+ */
+static inline u32 aie_cal_regoff(struct aie_device *adev,
+ struct aie_location loc, u32 regoff_intile)
+{
+ return regoff_intile + (loc.col << adev->col_shift) +
+ (loc.row << adev->row_shift);
+}
+
void aie_device_init(struct aie_device *adev);
struct aie_partition *
aie_aperture_request_part(struct aie_aperture *aperture,
@@ -219,6 +254,14 @@ void aie_aperture_remove(struct platform_device *pdev);
struct aie_partition *aie_part_create(struct aie_aperture *aperture,
u8 start_col, u8 num_col);
void aie_part_release(struct aie_partition *apart);
+int aie_part_set_freq(struct aie_partition *apart, u64 freq);
+int aie_part_scan_clk_state(struct aie_partition *apart);
+bool aie_part_check_clk_enable_loc(struct aie_partition *apart,
+ struct aie_location *loc);
+int aie_part_request_tiles(struct aie_partition *apart, int num_tiles,
+ struct aie_location *locs);
+int aie_part_release_tiles(struct aie_partition *apart, int num_tiles,
+ struct aie_location *locs);
int aie_resource_initialize(struct aie_resource *res, int count);
void aie_resource_uninitialize(struct aie_resource *res);
int aie_resource_check_region(struct aie_resource *res, u32 start,
@@ -226,5 +269,8 @@ int aie_resource_check_region(struct aie_resource *res, u32 start,
int aie_resource_get_region(struct aie_resource *res, u32 start,
u32 count);
void aie_resource_put_region(struct aie_resource *res, int start, u32 count);
+int aie_resource_set(struct aie_resource *res, u32 start, u32 count);
+int aie_resource_clear(struct aie_resource *res, u32 start, u32 count);
+bool aie_resource_testbit(struct aie_resource *res, u32 bit);
#endif /* AIE_INTERNAL_H */
diff --git a/drivers/accel/amd-ai-engine/ai-engine-part.c b/drivers/accel/amd-ai-engine/ai-engine-part.c
index 3675a72971d5..83099cb60161 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-part.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-part.c
@@ -20,12 +20,14 @@ void aie_part_release(struct aie_partition *apart)
{
struct aie_aperture *aperture = apart->aperture;
+ aie_part_set_freq(apart, 0);
mutex_lock(&aperture->mlock);
-
aie_resource_put_region(&aperture->cols_res,
apart->range.start.col -
aperture->range.start.col,
apart->range.size.col);
+ aie_resource_uninitialize(&apart->cores_clk_state);
+ aie_resource_uninitialize(&apart->tiles_inuse);
list_del(&apart->node);
devm_kfree(aperture->dev, apart);
mutex_unlock(&aperture->mlock);
@@ -48,6 +50,7 @@ struct aie_partition *aie_part_create(struct aie_aperture *aperture,
u8 start_col, u8 num_col)
{
struct aie_partition *apart;
+ int ret, num_tiles;
apart = devm_kzalloc(aperture->dev, sizeof(*apart), GFP_KERNEL);
if (!apart)
@@ -61,5 +64,28 @@ struct aie_partition *aie_part_create(struct aie_aperture *aperture,
apart->range.start.row = aperture->range.start.row;
apart->range.size.row = aperture->range.size.row;
+ /* SHIM row always enabled so it is not needed in the bitmap */
+ num_tiles = apart->range.size.col * (apart->range.size.row - 1);
+ ret = aie_resource_initialize(&apart->cores_clk_state, num_tiles);
+ if (ret) {
+ dev_err(aperture->dev, "failed to initialize clock state resource.");
+ return ERR_PTR(ret);
+ }
+
+ ret = aie_resource_initialize(&apart->tiles_inuse, num_tiles);
+ if (ret) {
+ dev_err(aperture->dev, "failed to initialize tiles in use resource.");
+ aie_resource_uninitialize(&apart->cores_clk_state);
+ return ERR_PTR(ret);
+ }
+
+ ret = aie_part_scan_clk_state(apart);
+ if (ret) {
+ dev_err(aperture->dev, "failed to scan clock state.");
+ aie_resource_uninitialize(&apart->cores_clk_state);
+ aie_resource_uninitialize(&apart->tiles_inuse);
+ return ERR_PTR(ret);
+ }
+
return apart;
}
diff --git a/drivers/accel/amd-ai-engine/ai-engine-res.c b/drivers/accel/amd-ai-engine/ai-engine-res.c
index 6bbd7273686e..d71a3a5f7b29 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-res.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-res.c
@@ -112,3 +112,57 @@ void aie_resource_put_region(struct aie_resource *res, int start, u32 count)
return;
bitmap_clear(res->bitmap, start, count);
}
+
+/**
+ * aie_resource_set() - set the AI engine resource bits
+ * @res: pointer to AI engine resource
+ * @start: start bit to set
+ * @count: number of bits to set
+ *
+ * Return: 0 for success and negative value for failure
+ *
+ * This function sets the specified number of bits in the resource.
+ */
+int aie_resource_set(struct aie_resource *res, u32 start, u32 count)
+{
+ if (!res || !res->bitmap || !count || start + count > res->total)
+ return -EINVAL;
+
+ bitmap_set(res->bitmap, start, count);
+ return 0;
+}
+
+/**
+ * aie_resource_clear() - clear the AI engine resource bits
+ * @res: pointer to AI engine resource
+ * @start: start bit to clear
+ * @count: number of bits to clear
+ *
+ * Return: 0 for success and negative value for failure
+ *
+ * This function clears the specified number of bits in the resource.
+ */
+int aie_resource_clear(struct aie_resource *res, u32 start, u32 count)
+{
+ if (!res || !res->bitmap || !count || start + count > res->total)
+ return -EINVAL;
+
+ bitmap_clear(res->bitmap, start, count);
+ return 0;
+}
+
+/**
+ * aie_resource_testbit() - test if a bit is set in an AI engine resource
+ * @res: pointer to AI engine resource
+ * @bit: bit to check
+ *
+ * Return: true for set, false for not set
+ */
+bool aie_resource_testbit(struct aie_resource *res, u32 bit)
+{
+ if (!res || !res->bitmap || bit >= res->total)
+ return false;
+
+ /* test_bit() locates the unsigned long that holds the bit and tests it */
+ return test_bit(bit, res->bitmap);
+}
diff --git a/include/linux/amd-ai-engine.h b/include/linux/amd-ai-engine.h
index 2a13362edd0c..f1f6543f9eae 100644
--- a/include/linux/amd-ai-engine.h
+++ b/include/linux/amd-ai-engine.h
@@ -42,5 +42,8 @@ struct aie_location {
void *aie_partition_request(struct device *dev, struct aie_partition_req *req);
void aie_partition_release(void *apart);
+int aie_partition_set_freq_req(void *apart, u64 freq);
+int aie_partition_get_freq(void *apart, u64 *freq);
+int aie_partition_get_freq_req(void *apart, u64 *freq);
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
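Note on the frequency arithmetic in patch 6/9: aie_part_set_freq() turns a
requested frequency into a clock divider (QoS) value, and aie_part_get_freq()
inverts that. The following is a minimal userspace sketch of the same
arithmetic, not the driver code. The helper names freq_to_qos() and
qos_to_freq() are hypothetical, and the checks for a zero frequency and a
zero divider are assumptions added for illustration; the patch itself does
not perform them in this form.

```c
#include <stdint.h>

/* The patch treats the divider (QoS) as a 10-bit value. */
#define QOS_MAX ((1u << 10) - 1)

/*
 * Hypothetical helper: convert a requested frequency into a divider value,
 * mirroring target_qos = (boot_qos * clk_rate) / freq from the patch.
 * Returns 0 on success, -1 if the request cannot be represented.
 */
static int freq_to_qos(uint32_t boot_qos, uint64_t clk_rate, uint64_t freq,
		       uint32_t *qos)
{
	uint64_t target;

	/* Assumption: reject zero and anything above the full clock rate. */
	if (freq == 0 || freq > clk_rate)
		return -1;

	target = ((uint64_t)boot_qos * clk_rate) / freq;
	if (target > QOS_MAX)
		return -1;

	*qos = (uint32_t)target;
	return 0;
}

/*
 * Hypothetical helper: recover the running frequency from the divider,
 * mirroring *freq = (clk_rate * boot_qos) / current_qos from the patch.
 */
static int qos_to_freq(uint32_t boot_qos, uint32_t current_qos,
		       uint64_t clk_rate, uint64_t *freq)
{
	/* Assumption: guard against a zero divider before dividing. */
	if (current_qos == 0)
		return -1;

	*freq = (clk_rate * boot_qos) / current_qos;
	return 0;
}
```

With a boot divider of 1 and a 1 GHz full clock, a 500 MHz request maps to a
divider of 2, and any request below roughly 1 GHz / 1023 falls outside the
10-bit range.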
* [PATCH V1 7/9] accel: amd-ai-engine: Add support for AIEML devices
2025-07-02 15:56 [PATCH V1 0/9] AMD AI Engine device driver for Versal Gregory Williams
` (5 preceding siblings ...)
2025-07-02 15:56 ` [PATCH V1 6/9] accel: amd-ai-engine: Add support to enable/disable clocks and change clock frequency Gregory Williams
@ 2025-07-02 15:56 ` Gregory Williams
2025-07-02 15:56 ` [PATCH V1 8/9] accel: amd-ai-engine: Create tile memory information Gregory Williams
2025-07-02 15:56 ` [PATCH V1 9/9] accel: amd-ai-engine: Adds AI Engine reset operations Gregory Williams
8 siblings, 0 replies; 21+ messages in thread
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
Adds driver support for AIEML generation devices. The following modules
are enabled:
- Get tile type from location (support for new memory tile type)
- Clock state tracking and request and release of tiles
Signed-off-by: Gregory Williams <gregory.williams@amd.com>
---
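Note on tile layout: the AIEML code in this patch classifies a tile from its
row and column, and indexes the clock state bitmap per column with the shim
row left out. The sketch below restates both rules in standalone C so the
indexing can be checked by hand. The names classify_tile() and
clk_state_bit() and the enum are hypothetical, and the sketch assumes the
partition starts at row 0 (the shim row), as the row - 1 term in the driver
implies.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's AIE_TILE_TYPE_* values. */
enum tile_type { TILE_CORE, TILE_MEMORY, TILE_SHIMPL, TILE_SHIMNOC };

/*
 * Row 0 is the shim row: within each group of four columns, the first two
 * are SHIM PL and the last two are SHIM NOC. Rows 1..num_mem_rows are
 * memory tiles, everything above is a core tile.
 */
static enum tile_type classify_tile(uint32_t col, uint32_t row,
				    uint32_t num_mem_rows)
{
	if (row == 0)
		return (col % 4) < 2 ? TILE_SHIMPL : TILE_SHIMNOC;
	if (row <= num_mem_rows)
		return TILE_MEMORY;
	return TILE_CORE;
}

/*
 * Bit position of a tile in the per-partition clock state bitmap. Each
 * column contributes (num_rows - 1) bits because the shim row is not
 * tracked. Returns -1 for the shim row or an out-of-range location.
 */
static int clk_state_bit(uint32_t start_col, uint32_t num_rows,
			 uint32_t col, uint32_t row)
{
	if (row == 0 || row >= num_rows || col < start_col)
		return -1;
	return (int)((col - start_col) * (num_rows - 1) + row - 1);
}
```

For a partition that starts at column 4 and has 9 rows, column 4 occupies
bits 0 to 7 and column 5 occupies bits 8 to 15.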
drivers/accel/amd-ai-engine/Makefile | 1 +
drivers/accel/amd-ai-engine/ai-engine-aieml.c | 210 ++++++++++++++++++
drivers/accel/amd-ai-engine/ai-engine-dev.c | 2 +
.../accel/amd-ai-engine/ai-engine-internal.h | 2 +
4 files changed, 215 insertions(+)
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-aieml.c
diff --git a/drivers/accel/amd-ai-engine/Makefile b/drivers/accel/amd-ai-engine/Makefile
index 9a830f7432d2..66cbce4705ea 100644
--- a/drivers/accel/amd-ai-engine/Makefile
+++ b/drivers/accel/amd-ai-engine/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_DRM_ACCEL_AMDAIE) += amd-aie.o
amd-aie-$(CONFIG_DRM_ACCEL_AMDAIE) := \
ai-engine-aie.o \
+ ai-engine-aieml.o \
ai-engine-aperture.o \
ai-engine-clock.o \
ai-engine-dev.o \
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aieml.c b/drivers/accel/amd-ai-engine/ai-engine-aieml.c
new file mode 100644
index 000000000000..328688942a6a
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-aieml.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine driver AIEML device specific implementation
+ *
+ * Copyright(C) 2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <linux/amd-ai-engine.h>
+#include <linux/bitmap.h>
+#include <linux/device.h>
+#include <linux/firmware/xlnx-zynqmp.h>
+#include <linux/io.h>
+
+#include "ai-engine-internal.h"
+
+#define AIEML_ARRAY_SHIFT 32U
+#define AIEML_COL_SHIFT 25U
+#define AIEML_ROW_SHIFT 20U
+
+#define NUM_TYPES_OF_MEM 3U
+
+#define NUM_MODS_CORE_TILE 2U
+#define NUM_MODS_MEM_TILE 1U
+#define NUM_MODS_SHIMPL_TILE 1U
+
+/*
+ * Register offsets
+ */
+#define AIEML_SHIMPL_COLCLOCK_CTRL_REGOFF 0x000fff20U
+
+/*
+ * Register masks
+ */
+#define AIEML_SHIMPL_COLRESET_CTRL_MASK GENMASK(1, 0)
+#define AIEML_SHIMPL_COLCLOCK_CTRL_MASK GENMASK(1, 0)
+
+static u32 aieml_get_tile_type(struct aie_device *adev,
+ struct aie_location *loc)
+{
+ u8 num_mem_rows = adev->ttype_attr[AIE_TILE_TYPE_MEMORY].num_rows;
+
+ if (loc->row > num_mem_rows)
+ return AIE_TILE_TYPE_TILE;
+ if (loc->row && loc->row <= num_mem_rows)
+ return AIE_TILE_TYPE_MEMORY;
+ if (loc->row == 0)
+ if ((loc->col % 4) < 2)
+ return AIE_TILE_TYPE_SHIMPL;
+
+ return AIE_TILE_TYPE_SHIMNOC;
+}
+
+/* aieml_scan_part_clocks() - scan clocks of a partition
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success, negative value for errors.
+ */
+static int aieml_scan_part_clocks(struct aie_partition *apart)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_range *range = &apart->range;
+ struct aie_device *adev = apart->adev;
+ struct aie_location loc;
+ int ret;
+
+ /* Clear the bitmap of cores and memories clock state */
+ aie_resource_put_region(&apart->cores_clk_state, 0,
+ apart->cores_clk_state.total);
+
+ /*
+ * In aieml if clock buffer on shim tile is enabled, the clock for all
+ * tiles in the same column is enabled.
+ */
+ for (loc.col = range->start.col;
+ loc.col < range->start.col + range->size.col;
+ loc.col++) {
+ void __iomem *va;
+ u32 val, nbitpos;
+
+ nbitpos = (loc.col - range->start.col) * (range->size.row - 1);
+
+ va = aperture->base +
+ aie_cal_regoff(adev, loc,
+ AIEML_SHIMPL_COLCLOCK_CTRL_REGOFF);
+ val = readl(va);
+
+ if (!(val & AIEML_SHIMPL_COLCLOCK_CTRL_MASK))
+ continue;
+
+ ret = aie_resource_set(&apart->cores_clk_state, nbitpos,
+ range->size.row - 1);
+ if (ret) {
+ dev_err(aperture->dev,
+ "failed to set clock state bitmaps for column %u",
+ loc.col);
+ return ret;
+ }
+ }
+ /*
+ * Set the tiles in use bitmap.
+ * In case of scanning, tiles which are powered on are considered as
+ * tiles in use.
+ */
+ bitmap_copy(apart->tiles_inuse.bitmap, apart->cores_clk_state.bitmap,
+ apart->tiles_inuse.total);
+
+ return 0;
+}
+
+/* aieml_set_part_clocks() - set clocks of a partition
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success, negative value for errors.
+ */
+static int aieml_set_part_clocks(struct aie_partition *apart)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_range *range = &apart->range;
+ u32 node_id = apart->adev->pm_node_id;
+ struct aie_location loc;
+ int ret;
+
+ for (loc.col = range->start.col;
+ loc.col < range->start.col + range->size.col;
+ loc.col++) {
+ u32 startbit, col_inuse = 0;
+
+ startbit = (loc.col - range->start.col) * (range->size.row - 1);
+
+ for (loc.row = range->start.row + 1;
+ loc.row < range->start.row + range->size.row;
+ loc.row++) {
+ u32 nbitpos = startbit + loc.row - 1;
+
+ if (aie_resource_testbit(&apart->tiles_inuse, nbitpos)) {
+ col_inuse = 1;
+ break;
+ }
+ }
+
+ if (col_inuse) {
+ ret = zynqmp_pm_aie_operation(node_id, loc.col,
+ 1,
+ XILINX_AIE_OPS_ENB_COL_CLK_BUFF);
+ if (ret < 0) {
+ dev_err(aperture->dev,
+ "failed to enable clock for column: %d",
+ loc.col);
+ return ret;
+ }
+
+ ret = aie_resource_set(&apart->tiles_inuse,
+ startbit, apart->range.size.row - 1) |
+ aie_resource_set(&apart->cores_clk_state,
+ startbit, apart->range.size.row - 1);
+ if (ret) {
+ dev_err(aperture->dev,
+ "failed to set bitmaps for column: %d",
+ loc.col);
+ return ret;
+ }
+ } else {
+ ret = zynqmp_pm_aie_operation(node_id, loc.col,
+ 1,
+ XILINX_AIE_OPS_DIS_COL_CLK_BUFF);
+ if (ret < 0) {
+ dev_err(aperture->dev,
+ "failed to disable clock for column: %d",
+ loc.col);
+ return ret;
+ }
+
+ ret = aie_resource_clear(&apart->tiles_inuse,
+ startbit, apart->range.size.row - 1) |
+ aie_resource_clear(&apart->cores_clk_state,
+ startbit, apart->range.size.row - 1);
+ if (ret) {
+ dev_err(aperture->dev,
+ "failed to clear bitmaps for column: %d",
+ loc.col);
+ return ret;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static const struct aie_tile_operations aieml_ops = {
+ .get_tile_type = aieml_get_tile_type,
+ .scan_part_clocks = aieml_scan_part_clocks,
+ .set_part_clocks = aieml_set_part_clocks,
+};
+
+/**
+ * aieml_device_init() - Initialize AI engine device struct AIEML specific
+ * @adev: AI engine device
+ *
+ * This function initializes the device version specific elements of the AI
+ * engine device structure, such as the register addressing related array
+ * shift, column shift, and row shift, the AIEML device specific device
+ * operations, and the device columns resource.
+ */
+void aieml_device_init(struct aie_device *adev)
+{
+ adev->array_shift = AIEML_ARRAY_SHIFT;
+ adev->col_shift = AIEML_COL_SHIFT;
+ adev->row_shift = AIEML_ROW_SHIFT;
+ adev->ops = &aieml_ops;
+}
diff --git a/drivers/accel/amd-ai-engine/ai-engine-dev.c b/drivers/accel/amd-ai-engine/ai-engine-dev.c
index ba28257cbd04..f713d38ff8c3 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-dev.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-dev.c
@@ -154,6 +154,8 @@ static int amd_ai_engine_probe(struct platform_device *pdev)
adev->dev_gen = aie_gen;
if (aie_gen == AIE_DEVICE_GEN_AIE) {
aie_device_init(adev);
+ } else if (aie_gen == AIE_DEVICE_GEN_AIEML) {
+ aieml_device_init(adev);
} else {
dev_err(&pdev->dev, "Invalid device generation");
return -EINVAL;
diff --git a/drivers/accel/amd-ai-engine/ai-engine-internal.h b/drivers/accel/amd-ai-engine/ai-engine-internal.h
index 495d56d5f993..31a45575cc43 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-internal.h
+++ b/drivers/accel/amd-ai-engine/ai-engine-internal.h
@@ -19,6 +19,7 @@
#include <linux/platform_device.h>
#define AIE_DEVICE_GEN_AIE 1U
+#define AIE_DEVICE_GEN_AIEML 2U
#define KBYTES(n) ((n) * SZ_1K)
@@ -246,6 +247,7 @@ static inline u32 aie_cal_regoff(struct aie_device *adev,
}
void aie_device_init(struct aie_device *adev);
+void aieml_device_init(struct aie_device *adev);
struct aie_partition *
aie_aperture_request_part(struct aie_aperture *aperture,
struct aie_partition_req *req);
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH V1 8/9] accel: amd-ai-engine: Create tile memory information
2025-07-02 15:56 [PATCH V1 0/9] AMD AI Engine device driver for Versal Gregory Williams
` (6 preceding siblings ...)
2025-07-02 15:56 ` [PATCH V1 7/9] accel: amd-ai-engine: Add support for AIEML devices Gregory Williams
@ 2025-07-02 15:56 ` Gregory Williams
2025-07-02 15:56 ` [PATCH V1 9/9] accel: amd-ai-engine: Adds AI Engine reset operations Gregory Williams
8 siblings, 0 replies; 21+ messages in thread
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
Creates a tile memory information structure to store the size and offset of
core tile data memory and program memory, and of memory tile memory for AIEML.
Signed-off-by: Gregory Williams <gregory.williams@amd.com>
---
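Note on the get_mem_info() calling convention: both implementations in this
patch return the number of memory types when the output pointer is NULL, so a
caller can ask for the count, allocate, and then call again to fill the
array. The sketch below shows that two-call pattern in standalone C. The
struct mem_info, get_mem_info() and query_mem_info() names here are
hypothetical simplifications, the sketch assumes the range starts at row 0,
and the sizes and offsets are the AIE values from this patch (32 KB data
memory at offset 0, 16 KB program memory at offset 0x20000).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct aie_mem. */
struct mem_info {
	size_t offset;	/* offset of the memory within a tile */
	size_t size;	/* size of the memory in one tile */
};

#define NUM_MEM_TYPES 2u

/*
 * Returns the number of memory types. With out == NULL only the count is
 * reported; otherwise out[] must hold at least that many entries.
 */
static unsigned int get_mem_info(unsigned int num_rows, struct mem_info *out)
{
	if (num_rows <= 1)
		return 0;	/* shim row only, no memories */
	if (!out)
		return NUM_MEM_TYPES;

	out[0].offset = 0;		/* tile data memory */
	out[0].size = 32 * 1024;
	out[1].offset = 0x20000;	/* program memory */
	out[1].size = 16 * 1024;
	return NUM_MEM_TYPES;
}

/* Count first, allocate, then fill. The caller frees the result. */
static struct mem_info *query_mem_info(unsigned int num_rows,
				       unsigned int *count)
{
	struct mem_info *info;

	*count = get_mem_info(num_rows, NULL);
	if (*count == 0)
		return NULL;

	info = calloc(*count, sizeof(*info));
	if (!info) {
		*count = 0;
		return NULL;
	}

	get_mem_info(num_rows, info);
	return info;
}
```

Keeping the count query and the fill in one callback means the caller never
has to hard-code how many memory types a device generation has.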
drivers/accel/amd-ai-engine/ai-engine-aie.c | 39 +++++++++
drivers/accel/amd-ai-engine/ai-engine-aieml.c | 47 ++++++++++
.../accel/amd-ai-engine/ai-engine-internal.h | 85 +++++++++++++------
drivers/accel/amd-ai-engine/ai-engine-part.c | 45 ++++++++++
4 files changed, 192 insertions(+), 24 deletions(-)
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aie.c b/drivers/accel/amd-ai-engine/ai-engine-aie.c
index 5e3cb44a16c8..056db0b7be0e 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-aie.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-aie.c
@@ -16,6 +16,8 @@
#define AIE_COL_SHIFT 23U
#define AIE_ROW_SHIFT 18U
+#define NUM_TYPES_OF_MEM 2U
+
/*
* Register offsets
*/
@@ -41,6 +43,42 @@ static u32 aie_get_tile_type(struct aie_device *adev, struct aie_location *loc)
return AIE_TILE_TYPE_SHIMNOC;
}
+static unsigned int aie_get_mem_info(struct aie_device *adev,
+ struct aie_range *range,
+ struct aie_part_mem *pmem)
+{
+ u8 start_row, num_rows;
+ unsigned int i;
+
+ if (range->start.row + range->size.row <= 1) {
+ /* SHIM row only, no memories in this range */
+ return 0;
+ }
+ if (!pmem)
+ return NUM_TYPES_OF_MEM;
+
+ for (i = 0; i < NUM_TYPES_OF_MEM; i++) {
+ struct aie_mem *mem = &pmem[i].mem;
+
+ memcpy(&mem->range, range, sizeof(*range));
+ }
+
+ start_row = adev->ttype_attr[AIE_TILE_TYPE_TILE].start_row;
+ num_rows = adev->ttype_attr[AIE_TILE_TYPE_TILE].num_rows;
+ /* Setup tile data memory information */
+ pmem[0].mem.offset = 0;
+ pmem[0].mem.size = KBYTES(32);
+ pmem[0].mem.range.start.row = start_row;
+ pmem[0].mem.range.size.row = num_rows;
+ /* Setup program memory information */
+ pmem[1].mem.offset = 0x20000;
+ pmem[1].mem.size = KBYTES(16);
+ pmem[1].mem.range.start.row = start_row;
+ pmem[1].mem.range.size.row = num_rows;
+
+ return NUM_TYPES_OF_MEM;
+}
+
/* aie_scan_part_clocks() - scan clocks of a partition
* @apart: AI engine partition
*
@@ -258,6 +296,7 @@ static int aie_set_part_clocks(struct aie_partition *apart)
}
static const struct aie_tile_operations aie_ops = {
.get_tile_type = aie_get_tile_type,
+ .get_mem_info = aie_get_mem_info,
.scan_part_clocks = aie_scan_part_clocks,
.set_part_clocks = aie_set_part_clocks,
};
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aieml.c b/drivers/accel/amd-ai-engine/ai-engine-aieml.c
index 328688942a6a..7730609ff7c0 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-aieml.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-aieml.c
@@ -50,6 +50,52 @@ static u32 aieml_get_tile_type(struct aie_device *adev,
return AIE_TILE_TYPE_SHIMNOC;
}
+static unsigned int aieml_get_mem_info(struct aie_device *adev,
+ struct aie_range *range,
+ struct aie_part_mem *pmem)
+{
+ u8 start_row, num_rows;
+ unsigned int i;
+
+ if (range->start.row + range->size.row <= 1) {
+ /* SHIM row only, no memories in this range */
+ return 0;
+ }
+
+ if (!pmem)
+ return NUM_TYPES_OF_MEM;
+
+ for (i = 0; i < NUM_TYPES_OF_MEM; i++) {
+ struct aie_mem *mem = &pmem[i].mem;
+
+ memcpy(&mem->range, range, sizeof(*range));
+ }
+
+ start_row = adev->ttype_attr[AIE_TILE_TYPE_TILE].start_row;
+ num_rows = adev->ttype_attr[AIE_TILE_TYPE_TILE].num_rows;
+ /* Setup tile data memory information */
+ pmem[0].mem.offset = 0;
+ pmem[0].mem.size = KBYTES(64);
+ pmem[0].mem.range.start.row = start_row;
+ pmem[0].mem.range.size.row = num_rows;
+
+ /* Setup program memory information */
+ pmem[1].mem.offset = 0x20000;
+ pmem[1].mem.size = KBYTES(16);
+ pmem[1].mem.range.start.row = start_row;
+ pmem[1].mem.range.size.row = num_rows;
+
+ start_row = adev->ttype_attr[AIE_TILE_TYPE_MEMORY].start_row;
+ num_rows = adev->ttype_attr[AIE_TILE_TYPE_MEMORY].num_rows;
+ /* Setup memory tile memory information */
+ pmem[2].mem.offset = 0;
+ pmem[2].mem.size = KBYTES(512);
+ pmem[2].mem.range.start.row = start_row;
+ pmem[2].mem.range.size.row = num_rows;
+
+ return NUM_TYPES_OF_MEM;
+}
+
/* aieml_scan_part_clocks() - scan clocks of a partition
* @apart: AI engine partition
*
@@ -188,6 +234,7 @@ static int aieml_set_part_clocks(struct aie_partition *apart)
static const struct aie_tile_operations aieml_ops = {
.get_tile_type = aieml_get_tile_type,
+ .get_mem_info = aieml_get_mem_info,
.scan_part_clocks = aieml_scan_part_clocks,
.set_part_clocks = aieml_set_part_clocks,
};
diff --git a/drivers/accel/amd-ai-engine/ai-engine-internal.h b/drivers/accel/amd-ai-engine/ai-engine-internal.h
index 31a45575cc43..13a39c4e3331 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-internal.h
+++ b/drivers/accel/amd-ai-engine/ai-engine-internal.h
@@ -68,30 +68,6 @@ struct aie_device;
struct aie_partition;
struct aie_aperture;
-/**
- * struct aie_tile_operations - AI engine device operations
- * @get_tile_type: get type of tile based on tile operation
- * @scan_part_clocks: scan partition modules to check whether the modules are
- * clock gated or not, and update the soft clock states
- * structure. It is required to be called when the partition
- * is requested so that the driver knows which modules are
- * clock gated when the partition is requested. This function
- * expects the caller to apply partition lock before calling
- * this function.
- * @set_part_clocks: set partition modules clocks gate registers based on the
- * partition clock states bitmap. This function expects the
- * caller to apply partition lock before calling this
- * function. The caller function will need to set the bitmap
- * on which tiles are required to be clocked on.
- * Different AI engine device version has its own device
- * operation.
- */
-struct aie_tile_operations {
- u32 (*get_tile_type)(struct aie_device *adev, struct aie_location *loc);
- int (*scan_part_clocks)(struct aie_partition *apart);
- int (*set_part_clocks)(struct aie_partition *apart);
-};
-
/**
* struct aie_resource - AI engine resource structure
* @bitmap: resource bitmap
@@ -112,6 +88,37 @@ struct aie_range {
struct aie_location size;
};
+/**
+ * struct aie_mem - AIE memory information
+ * @range: range of tiles of the memory
+ * @offset: register offset within a tile of the memory
+ * @size: size of the memory in one tile
+ */
+struct aie_mem {
+ struct aie_range range;
+ __kernel_size_t offset;
+ __kernel_size_t size;
+};
+
+/**
+ * struct aie_part_mem - AI engine partition memory information structure
+ * @apart: AI engine partition
+ * @mem: memory information of a type of memory
+ * @size: size of the total memories in the partition
+ *
+ * This structure is to keep the information of a type of memory in a
+ * partition. The memory information will be stored in @mem property.
+ * The following information will be kept:
+ * * memory start address offset within a tile
+ * * memory size
+ * * what tiles contain this type of memory
+ */
+struct aie_part_mem {
+ struct aie_partition *apart;
+ struct aie_mem mem;
+ size_t size;
+};
+
/**
* struct aie_tile_attr - AI engine device tile type attributes
* @start_row: start row
@@ -126,6 +133,34 @@ struct aie_tile_attr {
const enum aie_module_type *mods;
};
+/**
+ * struct aie_tile_operations - AI engine device operations
+ * @get_tile_type: get type of tile based on tile operation
+ * @get_mem_info: get different types of memories information
+ * @scan_part_clocks: scan partition modules to check whether the modules are
+ * clock gated or not, and update the soft clock states
+ * structure. It is required to be called when the partition
+ * is requested so that the driver knows which modules are
+ * clock gated when the partition is requested. This function
+ * expects the caller to apply partition lock before calling
+ * this function.
+ * @set_part_clocks: set partition modules clocks gate registers based on the
+ * partition clock states bitmap. This function expects the
+ * caller to apply partition lock before calling this
+ * function. The caller function will need to set the bitmap
+ * on which tiles are required to be clocked on.
+ * Different AI engine device version has its own device
+ * operation.
+ */
+struct aie_tile_operations {
+ u32 (*get_tile_type)(struct aie_device *adev, struct aie_location *loc);
+ unsigned int (*get_mem_info)(struct aie_device *adev,
+ struct aie_range *range,
+ struct aie_part_mem *pmem);
+ int (*scan_part_clocks)(struct aie_partition *apart);
+ int (*set_part_clocks)(struct aie_partition *apart);
+};
+
/**
* struct aie_device - AI engine device structure
* @apertures: list of apertures
@@ -188,6 +223,7 @@ struct aie_aperture {
* @range: range of partition
* @cores_clk_state: bitmap to indicate the power state of core and mem tiles
* @tiles_inuse: bitmap to indicate if a tile is in use
+ * @pmems: pointer to partition memories types
* @mlock: protection for AI engine partition operations
* @freq_req: required frequency
*/
@@ -198,6 +234,7 @@ struct aie_partition {
struct aie_range range;
struct aie_resource cores_clk_state;
struct aie_resource tiles_inuse;
+ struct aie_part_mem *pmems;
struct mutex mlock; /* protection for AI engine partition operations */
u64 freq_req;
};
diff --git a/drivers/accel/amd-ai-engine/ai-engine-part.c b/drivers/accel/amd-ai-engine/ai-engine-part.c
index 83099cb60161..878597eff202 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-part.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-part.c
@@ -12,6 +12,44 @@
#include "ai-engine-internal.h"
+/**
+ * aie_part_create_mems_info() - creates array to store the AI engine partition
+ * different memories types information
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success, negative value for failure
+ *
+ * This function creates an array to store information about the different
+ * memory types in the partition. The array is stored in @apart->pmems.
+ */
+static int aie_part_create_mems_info(struct aie_partition *apart)
+{
+ unsigned int i, num_mems;
+
+ num_mems = apart->adev->ops->get_mem_info(apart->adev, &apart->range,
+ NULL);
+ if (!num_mems)
+ return 0;
+
+ apart->pmems = devm_kcalloc(apart->aperture->dev, num_mems,
+ sizeof(struct aie_part_mem),
+ GFP_KERNEL);
+ if (!apart->pmems)
+ return -ENOMEM;
+
+ apart->adev->ops->get_mem_info(apart->adev, &apart->range,
+ apart->pmems);
+ for (i = 0; i < num_mems; i++) {
+ struct aie_mem *mem = &apart->pmems[i].mem;
+
+ apart->pmems[i].apart = apart;
+ apart->pmems[i].size = mem->size *
+ mem->range.size.col *
+ mem->range.size.row;
+ }
+ return 0;
+}
+
/**
* aie_part_release() - release an AI engine partition instance
* @apart: AI engine partition device
@@ -29,6 +67,7 @@ void aie_part_release(struct aie_partition *apart)
aie_resource_uninitialize(&apart->cores_clk_state);
aie_resource_uninitialize(&apart->tiles_inuse);
list_del(&apart->node);
+ devm_kfree(aperture->dev, apart->pmems);
devm_kfree(aperture->dev, apart);
mutex_unlock(&aperture->mlock);
}
@@ -64,6 +103,12 @@ struct aie_partition *aie_part_create(struct aie_aperture *aperture,
apart->range.start.row = aperture->range.start.row;
apart->range.size.row = aperture->range.size.row;
+ ret = aie_part_create_mems_info(apart);
+ if (ret) {
+ dev_err(aperture->dev, "failed to create tile memory information\n");
+ return ERR_PTR(ret);
+ }
+
/* SHIM row always enabled so it is not needed in the bitmap */
num_tiles = apart->range.size.col * (apart->range.size.row - 1);
ret = aie_resource_initialize(&apart->cores_clk_state, num_tiles);
--
2.34.1
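A note on the size bookkeeping in aie_part_create_mems_info(): the total
stored in pmems[i].size is the per-tile size multiplied by the number of
columns and rows that hold that memory type. The following is a minimal
userspace sketch of that arithmetic, not driver code. The struct and
function names (loc, range, mem, part_mem_total) are hypothetical
stand-ins for struct aie_location, struct aie_range and struct aie_mem,
and it assumes the column fields of the range have been filled in by the
get_mem_info callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct aie_location / aie_range / aie_mem. */
struct loc { uint32_t col, row; };
struct range { struct loc start, size; };
struct mem { struct range range; size_t offset, size; };

/*
 * Total bytes of one memory type across every tile that contains it:
 * per-tile size * number of columns * number of rows.
 */
static size_t part_mem_total(const struct mem *m)
{
	assert(m != NULL);
	return m->size * m->range.size.col * m->range.size.row;
}
```

For example, a 512 KiB memory-tile memory spanning 4 columns and 2 rows
would give a total of 4 MiB under this calculation.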
* [PATCH V1 9/9] accel: amd-ai-engine: Adds AI Engine reset operations
2025-07-02 15:56 [PATCH V1 0/9] AMD AI Engine device driver for Versal Gregory Williams
` (7 preceding siblings ...)
2025-07-02 15:56 ` [PATCH V1 8/9] accel: amd-ai-engine: Create tile memory information Gregory Williams
@ 2025-07-02 15:56 ` Gregory Williams
8 siblings, 0 replies; 21+ messages in thread
From: Gregory Williams @ 2025-07-02 15:56 UTC (permalink / raw)
To: ogabbay, michal.simek, robh
Cc: Gregory Williams, dri-devel, devicetree, linux-kernel
Add AI Engine hardware reset functionality, along with calls to
initialize and tear down partitions. Partition initialization resets
columns, resets shim tiles, sets up AXIMM error reporting, ungates column
clocks, sets up partition isolation, and zeroizes all tile memories.
Teardown resets columns, resets shim tiles, zeroizes all tile
memories, and gates column clocks.
Signed-off-by: Gregory Williams <gregory.williams@amd.com>
---
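Note on partition isolation: aie_part_init_isolation() picks one
direction mask per column and applies it to every row in that column.
The sketch below mirrors only that column test in plain userspace C so
the behaviour at the edges is easy to see. The function name
isolation_dir() and the ISOLATE_* macros are illustrative (the bit
positions copy the AIE_ISOLATE_*_MASK values from this patch); it is not
the driver implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions as AIE_ISOLATE_*_MASK in ai-engine-internal.h. */
#define ISOLATE_SOUTH (1u << 0)
#define ISOLATE_WEST  (1u << 1)
#define ISOLATE_NORTH (1u << 2)
#define ISOLATE_EAST  (1u << 3)

/*
 * Mirrors the if / else-if column test in aie_part_init_isolation():
 * first column blocks west, last column blocks east, others block nothing.
 */
static uint8_t isolation_dir(uint32_t col, uint32_t start_col,
			     uint32_t num_cols)
{
	assert(num_cols > 0 && col >= start_col && col < start_col + num_cols);

	if (col == start_col)
		return ISOLATE_WEST;
	if (col == start_col + num_cols - 1)
		return ISOLATE_EAST;
	return 0;
}
```

One consequence of the if / else-if ordering is that a one-column
partition only gets the west mask, since the first test matches before
the last-column test is reached. Whether that is intended is a question
for review.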
drivers/accel/amd-ai-engine/Makefile | 3 +-
drivers/accel/amd-ai-engine/ai-engine-aie.c | 104 ++++++
drivers/accel/amd-ai-engine/ai-engine-aieml.c | 105 ++++++
drivers/accel/amd-ai-engine/ai-engine-clock.c | 16 +-
.../accel/amd-ai-engine/ai-engine-internal.h | 45 +++
drivers/accel/amd-ai-engine/ai-engine-part.c | 31 ++
drivers/accel/amd-ai-engine/ai-engine-res.c | 16 +
drivers/accel/amd-ai-engine/ai-engine-reset.c | 300 ++++++++++++++++++
include/linux/amd-ai-engine.h | 24 ++
9 files changed, 629 insertions(+), 15 deletions(-)
create mode 100644 drivers/accel/amd-ai-engine/ai-engine-reset.c
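Note on core register clearing: aie_part_clear_core_regs_of_tile() walks
each register range from soff to eoff inclusive in steps of
AIE_CORE_REGS_STEP (0x10) and writes "width" 32-bit words per register.
The sketch below counts the 32-bit writes that loop would issue for one
range. It is a userspace illustration only; core_regs_words() and
CORE_REGS_STEP are hypothetical names, and the per-tile base offset that
aie_cal_regoff() adds in the driver is left out because it does not
change the count.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as AIE_CORE_REGS_STEP in this patch. */
#define CORE_REGS_STEP 0x10u

/*
 * Count the 32-bit writes needed to clear one register range.
 * soff/eoff are the first and last register offsets (inclusive),
 * width is the number of 32-bit words per register.
 */
static unsigned int core_regs_words(uint32_t soff, uint32_t eoff,
				    uint32_t width)
{
	unsigned int words = 0;
	uint32_t reg, j;

	assert(soff <= eoff);
	for (reg = soff; reg <= eoff; reg += CORE_REGS_STEP)
		for (j = 0; j < width; j++)
			words++; /* driver: writel(0, base + reg + j * 4) */

	return words;
}
```

With the AIE offsets from this patch, the 32-bit range (0x30000 to
0x30520, width 1) works out to 83 writes and the 128-bit range (0x30530
to 0x307a0, width 4) to 160 writes per core tile.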
diff --git a/drivers/accel/amd-ai-engine/Makefile b/drivers/accel/amd-ai-engine/Makefile
index 66cbce4705ea..8522222f330a 100644
--- a/drivers/accel/amd-ai-engine/Makefile
+++ b/drivers/accel/amd-ai-engine/Makefile
@@ -11,4 +11,5 @@ amd-aie-$(CONFIG_DRM_ACCEL_AMDAIE) := \
ai-engine-clock.o \
ai-engine-dev.o \
ai-engine-part.o \
- ai-engine-res.o
+ ai-engine-res.o \
+ ai-engine-reset.o
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aie.c b/drivers/accel/amd-ai-engine/ai-engine-aie.c
index 056db0b7be0e..23733d384a2e 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-aie.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-aie.c
@@ -22,7 +22,14 @@
* Register offsets
*/
#define AIE_SHIMPL_CLKCNTR_REGOFF 0x00036040U
+#define AIE_SHIMPL_TILECTRL_REGOFF 0x00036030U
+
+#define AIE_TILE_CORE_AMH3_PART3_REGOFF 0x000307a0U
#define AIE_TILE_CORE_CLKCNTR_REGOFF 0x00036040U
+#define AIE_TILE_CORE_LC_REGOFF 0x00030520U
+#define AIE_TILE_CORE_R0_REGOFF 0x00030000U
+#define AIE_TILE_CORE_TILECTRL_REGOFF 0x00036030U
+#define AIE_TILE_CORE_VRL0_REGOFF 0x00030530U
/*
* Register masks
@@ -32,6 +39,27 @@
#define AIE_TILE_CLKCNTR_COLBUF_MASK BIT(0)
#define AIE_TILE_CLKCNTR_NEXTCLK_MASK BIT(1)
+static const struct aie_tile_regs aie_core_32bit_regs = {
+ .attribute = AIE_TILE_TYPE_TILE << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+ .soff = AIE_TILE_CORE_R0_REGOFF,
+ .eoff = AIE_TILE_CORE_LC_REGOFF,
+};
+
+static const struct aie_tile_regs aie_core_128bit_regs = {
+ .attribute = AIE_TILE_TYPE_TILE << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+ .soff = AIE_TILE_CORE_VRL0_REGOFF,
+ .eoff = AIE_TILE_CORE_AMH3_PART3_REGOFF,
+};
+
+static const struct aie_core_regs_attr aie_core_regs[] = {
+ {.core_regs = &aie_core_32bit_regs,
+ .width = 1,
+ },
+ {.core_regs = &aie_core_128bit_regs,
+ .width = 4,
+ },
+};
+
static u32 aie_get_tile_type(struct aie_device *adev, struct aie_location *loc)
{
if (loc->row)
@@ -79,6 +107,43 @@ static unsigned int aie_get_mem_info(struct aie_device *adev,
return NUM_TYPES_OF_MEM;
}
+/**
+ * aie_part_clear_mems() - clear memories of every tile in a partition
+ * @apart: AI engine partition
+ *
+ * Return: always 0.
+ */
+static int aie_part_clear_mems(struct aie_partition *apart)
+{
+ struct aie_part_mem *pmems = apart->pmems;
+ struct aie_device *adev = apart->adev;
+ u32 i;
+
+ /* Clear each type of memories in the partition */
+ for (i = 0; i < NUM_TYPES_OF_MEM; i++) {
+ struct aie_mem *mem = &pmems[i].mem;
+ struct aie_range *range = &mem->range;
+ u32 c, r;
+
+ for (c = range->start.col;
+ c < range->start.col + range->size.col; c++) {
+ for (r = range->start.row;
+ r < range->start.row + range->size.row; r++) {
+ struct aie_location loc;
+ u32 memoff;
+
+ loc.col = c;
+ loc.row = r;
+ memoff = aie_cal_regoff(adev, loc, mem->offset);
+ memset_io(apart->aperture->base + memoff, 0,
+ mem->size);
+ }
+ }
+ }
+
+ return 0;
+}
+
/* aie_scan_part_clocks() - scan clocks of a partition
* @apart: AI engine partition
*
@@ -294,11 +359,48 @@ static int aie_set_part_clocks(struct aie_partition *apart)
return 0;
}
+
+/**
+ * aie_set_tile_isolation() - Set isolation boundary of AI engine tile
+ * @apart: AI engine partition
+ * @loc: Location of tile
+ * @dir: Direction to block
+ *
+ * Possible direction values are:
+ * - AIE_ISOLATE_EAST_MASK
+ * - AIE_ISOLATE_NORTH_MASK
+ * - AIE_ISOLATE_WEST_MASK
+ * - AIE_ISOLATE_SOUTH_MASK
+ * - AIE_ISOLATE_ALL_MASK
+ * - or "OR" of multiple values
+ */
+static void aie_set_tile_isolation(struct aie_partition *apart,
+ struct aie_location *loc, u8 dir)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_device *adev = apart->adev;
+ void __iomem *va;
+ u32 ttype, val;
+
+ val = (u32)dir;
+ ttype = aie_get_tile_type(adev, loc);
+ if (ttype == AIE_TILE_TYPE_TILE) {
+ va = aperture->base +
+ aie_cal_regoff(adev, *loc, AIE_TILE_CORE_TILECTRL_REGOFF);
+ } else {
+ va = aperture->base +
+ aie_cal_regoff(adev, *loc, AIE_SHIMPL_TILECTRL_REGOFF);
+ }
+ writel(val, va);
+}
+
static const struct aie_tile_operations aie_ops = {
.get_tile_type = aie_get_tile_type,
.get_mem_info = aie_get_mem_info,
+ .mem_clear = aie_part_clear_mems,
.scan_part_clocks = aie_scan_part_clocks,
.set_part_clocks = aie_set_part_clocks,
+ .set_tile_isolation = aie_set_tile_isolation,
};
/**
@@ -316,4 +418,6 @@ void aie_device_init(struct aie_device *adev)
adev->col_shift = AIE_COL_SHIFT;
adev->row_shift = AIE_ROW_SHIFT;
adev->ops = &aie_ops;
+ adev->num_core_regs = ARRAY_SIZE(aie_core_regs);
+ adev->core_regs = aie_core_regs;
}
diff --git a/drivers/accel/amd-ai-engine/ai-engine-aieml.c b/drivers/accel/amd-ai-engine/ai-engine-aieml.c
index 7730609ff7c0..4ceb51ea13af 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-aieml.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-aieml.c
@@ -27,6 +27,17 @@
* Register offsets
*/
#define AIEML_SHIMPL_COLCLOCK_CTRL_REGOFF 0x000fff20U
+#define AIEML_SHIMPL_TILECTRL_REGOFF 0x00036030U
+
+#define AIEML_MEMORY_TILECTRL_REGOFF 0x00096030U
+
+#define AIEML_TILE_COREMOD_AMLL0_PART1_REGOFF 0x00030000U
+#define AIEML_TILE_COREMOD_AMHH8_PART2_REGOFF 0x00030470U
+#define AIEML_TILE_COREMOD_R0_REGOFF 0x00030c00U
+#define AIEML_TILE_COREMOD_R31_REGOFF 0x00030df0U
+#define AIEML_TILE_COREMOD_TILECTRL_REGOFF 0x00036030U
+#define AIEML_TILE_COREMOD_WL0_PART1_REGOFF 0x00030800U
+#define AIEML_TILE_COREMOD_WH11_PART2_REGOFF 0x00030af0U
/*
* Register masks
@@ -34,6 +45,36 @@
#define AIEML_SHIMPL_COLRESET_CTRL_MASK GENMASK(1, 0)
#define AIEML_SHIMPL_COLCLOCK_CTRL_MASK GENMASK(1, 0)
+static const struct aie_tile_regs aieml_core_amxx_regs = {
+ .attribute = AIE_TILE_TYPE_TILE << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+ .soff = AIEML_TILE_COREMOD_AMLL0_PART1_REGOFF,
+ .eoff = AIEML_TILE_COREMOD_AMHH8_PART2_REGOFF,
+};
+
+static const struct aie_tile_regs aieml_core_wx_regs = {
+ .attribute = AIE_TILE_TYPE_TILE << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+ .soff = AIEML_TILE_COREMOD_WL0_PART1_REGOFF,
+ .eoff = AIEML_TILE_COREMOD_WH11_PART2_REGOFF,
+};
+
+static const struct aie_tile_regs aieml_core_32bit_regs = {
+ .attribute = AIE_TILE_TYPE_TILE << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+ .soff = AIEML_TILE_COREMOD_R0_REGOFF,
+ .eoff = AIEML_TILE_COREMOD_R31_REGOFF,
+};
+
+static const struct aie_core_regs_attr aieml_core_regs[] = {
+ {.core_regs = &aieml_core_amxx_regs,
+ .width = 4,
+ },
+ {.core_regs = &aieml_core_wx_regs,
+ .width = 4,
+ },
+ {.core_regs = &aieml_core_32bit_regs,
+ .width = 1,
+ },
+};
+
static u32 aieml_get_tile_type(struct aie_device *adev,
struct aie_location *loc)
{
@@ -96,6 +137,27 @@ static unsigned int aieml_get_mem_info(struct aie_device *adev,
return NUM_TYPES_OF_MEM;
}
+/**
+ * aieml_part_clear_mems() - clear memories of every tile in a partition
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success, error code for failure
+ */
+static int aieml_part_clear_mems(struct aie_partition *apart)
+{
+ struct aie_range *range = &apart->range;
+ u32 node_id = apart->adev->pm_node_id;
+ int ret;
+
+ ret = zynqmp_pm_aie_operation(node_id, range->start.col,
+ range->size.col,
+ XILINX_AIE_OPS_ZEROISATION);
+ if (ret < 0)
+ dev_err(apart->aperture->dev, "failed to clear memory for partition\n");
+
+ return ret;
+}
+
/* aieml_scan_part_clocks() - scan clocks of a partition
* @apart: AI engine partition
*
@@ -232,11 +294,52 @@ static int aieml_set_part_clocks(struct aie_partition *apart)
return 0;
}
+/**
+ * aieml_set_tile_isolation() - Set isolation boundary of AI engine tile
+ * @apart: AI engine partition
+ * @loc: Location of tile
+ * @dir: Direction to block
+ *
+ * Possible direction values are:
+ * - AIE_ISOLATE_EAST_MASK
+ * - AIE_ISOLATE_NORTH_MASK
+ * - AIE_ISOLATE_WEST_MASK
+ * - AIE_ISOLATE_SOUTH_MASK
+ * - AIE_ISOLATE_ALL_MASK
+ * - or "OR" of multiple values
+ */
+static void aieml_set_tile_isolation(struct aie_partition *apart,
+ struct aie_location *loc, u8 dir)
+{
+ struct aie_aperture *aperture = apart->aperture;
+ struct aie_device *adev = apart->adev;
+ void __iomem *va;
+ u32 ttype, val;
+
+ /* For AIEML device, dir input will match register mask */
+ val = (u32)dir;
+ ttype = aieml_get_tile_type(adev, loc);
+ if (ttype == AIE_TILE_TYPE_TILE) {
+ va = aperture->base +
+ aie_cal_regoff(adev, *loc,
+ AIEML_TILE_COREMOD_TILECTRL_REGOFF);
+ } else if (ttype == AIE_TILE_TYPE_MEMORY) {
+ va = aperture->base +
+ aie_cal_regoff(adev, *loc, AIEML_MEMORY_TILECTRL_REGOFF);
+ } else {
+ va = aperture->base +
+ aie_cal_regoff(adev, *loc, AIEML_SHIMPL_TILECTRL_REGOFF);
+ }
+ writel(val, va);
+}
+
static const struct aie_tile_operations aieml_ops = {
.get_tile_type = aieml_get_tile_type,
.get_mem_info = aieml_get_mem_info,
+ .mem_clear = aieml_part_clear_mems,
.scan_part_clocks = aieml_scan_part_clocks,
.set_part_clocks = aieml_set_part_clocks,
+ .set_tile_isolation = aieml_set_tile_isolation,
};
/**
@@ -254,4 +357,6 @@ void aieml_device_init(struct aie_device *adev)
adev->col_shift = AIEML_COL_SHIFT;
adev->row_shift = AIEML_ROW_SHIFT;
adev->ops = &aieml_ops;
+ adev->num_core_regs = ARRAY_SIZE(aieml_core_regs);
+ adev->core_regs = aieml_core_regs;
}
diff --git a/drivers/accel/amd-ai-engine/ai-engine-clock.c b/drivers/accel/amd-ai-engine/ai-engine-clock.c
index 646ec1d1658c..6cf1348f135f 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-clock.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-clock.c
@@ -81,9 +81,6 @@ bool aie_part_check_clk_enable_loc(struct aie_partition *apart,
int aie_part_request_tiles(struct aie_partition *apart, int num_tiles,
struct aie_location *locs)
{
- int ret;
-
- mutex_lock(&apart->mlock);
if (num_tiles == 0) {
aie_resource_set(&apart->tiles_inuse, 0,
apart->tiles_inuse.total);
@@ -102,10 +99,7 @@ int aie_part_request_tiles(struct aie_partition *apart, int num_tiles,
aie_resource_set(&apart->tiles_inuse, bit, 1);
}
}
- ret = apart->adev->ops->set_part_clocks(apart);
- mutex_unlock(&apart->mlock);
-
- return ret;
+ return apart->adev->ops->set_part_clocks(apart);
}
/**
@@ -121,9 +115,6 @@ int aie_part_request_tiles(struct aie_partition *apart, int num_tiles,
int aie_part_release_tiles(struct aie_partition *apart, int num_tiles,
struct aie_location *locs)
{
- int ret;
-
- mutex_lock(&apart->mlock);
if (num_tiles == 0) {
aie_resource_clear(&apart->tiles_inuse, 0,
apart->tiles_inuse.total);
@@ -143,10 +134,7 @@ int aie_part_release_tiles(struct aie_partition *apart, int num_tiles,
}
}
- ret = apart->adev->ops->set_part_clocks(apart);
- mutex_unlock(&apart->mlock);
-
- return ret;
+ return apart->adev->ops->set_part_clocks(apart);
}
/**
diff --git a/drivers/accel/amd-ai-engine/ai-engine-internal.h b/drivers/accel/amd-ai-engine/ai-engine-internal.h
index 13a39c4e3331..864fe5d57be4 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-internal.h
+++ b/drivers/accel/amd-ai-engine/ai-engine-internal.h
@@ -23,6 +23,9 @@
#define KBYTES(n) ((n) * SZ_1K)
+/* AIE core registers step size */
+#define AIE_CORE_REGS_STEP 0x10
+
/*
* Macros for AI engine tile type bitmasks
*/
@@ -40,6 +43,24 @@ enum aie_tile_type {
#define AIE_TILE_TYPE_MASK_SHIMNOC BIT(AIE_TILE_TYPE_SHIMNOC)
#define AIE_TILE_TYPE_MASK_MEMORY BIT(AIE_TILE_TYPE_MEMORY)
+/*
+ * Macros for attribute property of AI engine registers accessed by kernel
+ * 0 - 7 bits: tile type bits
+ * 8 - 15 bits: permission bits. If it is 1, it allows write from userspace
+ */
+#define AIE_REGS_ATTR_TILE_TYPE_SHIFT 0U
+#define AIE_REGS_ATTR_PERM_SHIFT 8U
+#define AIE_REGS_ATTR_TILE_TYPE_MASK GENMASK(AIE_REGS_ATTR_PERM_SHIFT - 1, \
+ AIE_REGS_ATTR_TILE_TYPE_SHIFT)
+#define AIE_REGS_ATTR_PERM_MASK GENMASK(15, \
+ AIE_REGS_ATTR_PERM_SHIFT)
+
+#define AIE_ISOLATE_EAST_MASK BIT(3)
+#define AIE_ISOLATE_NORTH_MASK BIT(2)
+#define AIE_ISOLATE_WEST_MASK BIT(1)
+#define AIE_ISOLATE_SOUTH_MASK BIT(0)
+#define AIE_ISOLATE_ALL_MASK GENMASK(3, 0)
+
/**
* struct aie_tile_regs - contiguous range of AI engine register
* within an AI engine tile
@@ -133,10 +154,21 @@ struct aie_tile_attr {
const enum aie_module_type *mods;
};
+/**
+ * struct aie_core_regs_attr - AI engine core register attributes structure
+ * @core_regs: core registers
+ * @width: number of 32 bit words
+ */
+struct aie_core_regs_attr {
+ const struct aie_tile_regs *core_regs;
+ u32 width;
+};
+
/**
* struct aie_tile_operations - AI engine device operations
* @get_tile_type: get type of tile based on tile operation
* @get_mem_info: get different types of memories information
+ * @mem_clear: clear data memory banks of the partition.
* @scan_part_clocks: scan partition modules to check whether the modules are
* clock gated or not, and update the soft clock states
* structure. It is required to be called when the partition
@@ -149,6 +181,7 @@ struct aie_tile_attr {
* caller to apply partition lock before calling this
* function. The caller function will need to set the bitmap
* on which tiles are required to be clocked on.
+ * @set_tile_isolation: set tile isolation boundary for input direction.
* Different AI engine device version has its own device
* operation.
*/
@@ -157,8 +190,11 @@ struct aie_tile_operations {
unsigned int (*get_mem_info)(struct aie_device *adev,
struct aie_range *range,
struct aie_part_mem *pmem);
+ int (*mem_clear)(struct aie_partition *apart);
int (*scan_part_clocks)(struct aie_partition *apart);
int (*set_part_clocks)(struct aie_partition *apart);
+ void (*set_tile_isolation)(struct aie_partition *apart,
+ struct aie_location *loc, u8 dir);
};
/**
@@ -167,12 +203,14 @@ struct aie_tile_operations {
* @dev: device pointer for the AI engine device
* @mlock: protection for AI engine device operations
* @clk: AI enigne device clock
+ * @core_regs: array of core registers
* @ops: tile operations
* @array_shift: array address shift
* @col_shift: column address shift
* @row_shift: row address shift
* @dev_gen: aie hardware device generation
* @pm_node_id: AI Engine platform management node ID
+ * @num_core_regs: number of core registers range
* @ttype_attr: tile type attributes
*/
struct aie_device {
@@ -180,12 +218,14 @@ struct aie_device {
struct device *dev;
struct mutex mlock; /* protection for AI engine apertures */
struct clk *clk;
+ const struct aie_core_regs_attr *core_regs;
const struct aie_tile_operations *ops;
u32 array_shift;
u32 col_shift;
u32 row_shift;
u32 dev_gen;
u32 pm_node_id;
+ u32 num_core_regs;
struct aie_tile_attr ttype_attr[AIE_TILE_TYPE_MAX];
};
@@ -301,6 +341,10 @@ int aie_part_request_tiles(struct aie_partition *apart, int num_tiles,
struct aie_location *locs);
int aie_part_release_tiles(struct aie_partition *apart, int num_tiles,
struct aie_location *locs);
+int aie_part_clean(struct aie_partition *apart);
+int aie_part_initialize(struct aie_partition *apart,
+ struct aie_partition_init_args *args);
+int aie_part_teardown(struct aie_partition *apart);
int aie_resource_initialize(struct aie_resource *res, int count);
void aie_resource_uninitialize(struct aie_resource *res);
int aie_resource_check_region(struct aie_resource *res, u32 start,
@@ -310,6 +354,7 @@ int aie_resource_get_region(struct aie_resource *res, u32 start,
void aie_resource_put_region(struct aie_resource *res, int start, u32 count);
int aie_resource_set(struct aie_resource *res, u32 start, u32 count);
int aie_resource_clear(struct aie_resource *res, u32 start, u32 count);
+int aie_resource_clear_all(struct aie_resource *res);
bool aie_resource_testbit(struct aie_resource *res, u32 bit);
#endif /* AIE_INTERNAL_H */
diff --git a/drivers/accel/amd-ai-engine/ai-engine-part.c b/drivers/accel/amd-ai-engine/ai-engine-part.c
index 878597eff202..97bb10d11309 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-part.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-part.c
@@ -12,6 +12,35 @@
#include "ai-engine-internal.h"
+/**
+ * aie_partition_initialize() - Initialize AI engine partition
+ * @apart: AI engine partition instance
+ * @args: User initialization options
+ *
+ * Return: 0 for success, negative value for failure
+ */
+int aie_partition_initialize(void *apart, struct aie_partition_init_args *args)
+{
+ if (!apart)
+ return -EINVAL;
+ return aie_part_initialize((struct aie_partition *)apart, args);
+}
+EXPORT_SYMBOL_GPL(aie_partition_initialize);
+
+/**
+ * aie_partition_teardown() - Tear down AI engine partition
+ * @apart: AI engine partition instance
+ *
+ * Return: 0 for success, negative value for failure
+ */
+int aie_partition_teardown(void *apart)
+{
+ if (!apart)
+ return -EINVAL;
+ return aie_part_teardown((struct aie_partition *)apart);
+}
+EXPORT_SYMBOL_GPL(aie_partition_teardown);
+
/**
* aie_part_create_mems_info() - creates array to store the AI engine partition
* different memories types information
@@ -58,6 +87,8 @@ void aie_part_release(struct aie_partition *apart)
{
struct aie_aperture *aperture = apart->aperture;
+ /* aie_part_clean() will do hardware reset */
+ aie_part_clean(apart);
aie_part_set_freq(apart, 0);
mutex_lock(&aperture->mlock);
aie_resource_put_region(&aperture->cols_res,
diff --git a/drivers/accel/amd-ai-engine/ai-engine-res.c b/drivers/accel/amd-ai-engine/ai-engine-res.c
index d71a3a5f7b29..eff41986d5b6 100644
--- a/drivers/accel/amd-ai-engine/ai-engine-res.c
+++ b/drivers/accel/amd-ai-engine/ai-engine-res.c
@@ -151,6 +151,22 @@ int aie_resource_clear(struct aie_resource *res, u32 start, u32 count)
return 0;
}
+/**
+ * aie_resource_clear_all() - clear all the AI engine resource bits
+ * @res: pointer to AI engine resource
+ * @return: 0 for success and negative value for failure
+ *
+ * This function clears all the bits in the resource.
+ */
+int aie_resource_clear_all(struct aie_resource *res)
+{
+ if (!res || !res->bitmap)
+ return -EINVAL;
+
+ bitmap_clear(res->bitmap, 0, res->total);
+ return 0;
+}
+
/**
* aie_resource_testbit() - test if a bit is set in a AI engine resource
* @res: pointer to AI engine resource
diff --git a/drivers/accel/amd-ai-engine/ai-engine-reset.c b/drivers/accel/amd-ai-engine/ai-engine-reset.c
new file mode 100644
index 000000000000..650811063232
--- /dev/null
+++ b/drivers/accel/amd-ai-engine/ai-engine-reset.c
@@ -0,0 +1,300 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AI Engine device driver reset implementation
+ *
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/amd-ai-engine.h>
+#include <linux/firmware/xlnx-zynqmp.h>
+#include <linux/io.h>
+#include <linux/mutex.h>
+
+#include "ai-engine-internal.h"
+
+/**
+ * aie_part_clear_core_regs_of_tile() - clear registers of aie core
+ * @apart: AI engine partition
+ * @loc: location of aie tile to clear
+ */
+static void aie_part_clear_core_regs_of_tile(struct aie_partition *apart,
+ struct aie_location loc)
+{
+ struct aie_device *adev = apart->adev;
+ struct aie_aperture *aperture = apart->aperture;
+ const struct aie_core_regs_attr *regs = adev->core_regs;
+ u32 i;
+
+ for (i = 0; i < adev->num_core_regs; i++) {
+ u32 j, soff, eoff, reg;
+
+ soff = aie_cal_regoff(adev, loc, regs[i].core_regs->soff);
+ eoff = aie_cal_regoff(adev, loc, regs[i].core_regs->eoff);
+
+ for (reg = soff; reg <= eoff; reg += AIE_CORE_REGS_STEP) {
+ for (j = 0; j < regs[i].width; j++)
+ writel(0, aperture->base + reg + j * 4);
+ }
+ }
+}
+
+/**
+ * aie_part_clear_core_regs() - clear registers of aie core of a partition
+ * @apart: AI engine partition
+ */
+static void aie_part_clear_core_regs(struct aie_partition *apart)
+{
+ struct aie_range *range = &apart->range;
+ u32 c, r;
+
+ /* clear core registers for each tile in the partition */
+ for (c = range->start.col; c < range->start.col + range->size.col;
+ c++) {
+ for (r = range->start.row;
+ r < range->start.row + range->size.row; r++) {
+ struct aie_location loc;
+ u32 ttype;
+
+ loc.row = r;
+ loc.col = c;
+ ttype = apart->adev->ops->get_tile_type(apart->adev,
+ &loc);
+ if (ttype == AIE_TILE_TYPE_TILE &&
+ aie_part_check_clk_enable_loc(apart, &loc))
+ aie_part_clear_core_regs_of_tile(apart, loc);
+ }
+ }
+}
+
+/**
+ * aie_part_clean() - reset and clear AI engine partition
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success and negative value for failure
+ *
+ * This function will:
+ * * gate all the columns
+ * * reset AI engine partition columns
+ * * reset AI engine shims
+ * * clear the memories
+ * * clear core registers
+ * * gate all the tiles in a partition
+ * * update clock state bitmap
+ *
+ * This function will not validate the partition, the caller will need to
+ * provide a valid AI engine partition.
+ */
+int aie_part_clean(struct aie_partition *apart)
+{
+ u32 node_id = apart->adev->pm_node_id;
+ int ret;
+
+ mutex_lock(&apart->mlock);
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_DIS_COL_CLK_BUFF);
+ if (ret < 0)
+ goto exit;
+
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_COL_RST |
+ XILINX_AIE_OPS_SHIM_RST);
+ if (ret < 0)
+ goto exit;
+
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_ENB_COL_CLK_BUFF);
+ if (ret < 0)
+ goto exit;
+
+ apart->adev->ops->mem_clear(apart);
+ aie_part_clear_core_regs(apart);
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_DIS_COL_CLK_BUFF);
+ if (ret < 0)
+ goto exit;
+
+ aie_resource_clear_all(&apart->cores_clk_state);
+
+exit:
+ mutex_unlock(&apart->mlock);
+ return ret;
+}
+
+/**
+ * aie_part_init_isolation() - Set isolation boundary of AI engine partition
+ * @apart: AI engine partition
+ */
+static void aie_part_init_isolation(struct aie_partition *apart)
+{
+ struct aie_range *range = &apart->range;
+ u32 c, r;
+ u8 dir;
+
+ for (c = range->start.col;
+ c < range->start.col + range->size.col; c++) {
+ if (c == range->start.col)
+ dir = AIE_ISOLATE_WEST_MASK;
+ else if (c == (range->start.col + range->size.col - 1))
+ dir = AIE_ISOLATE_EAST_MASK;
+ else
+ dir = 0;
+
+ for (r = range->start.row;
+ r < range->start.row + range->size.row; r++) {
+ struct aie_location loc;
+
+ loc.col = c;
+ loc.row = r;
+ apart->adev->ops->set_tile_isolation(apart, &loc, dir);
+ }
+ }
+}
+
+/**
+ * aie_part_initialize() - AI engine partition initialization
+ * @apart: AI engine partition
+ * @args: User initialization options
+ *
+ * Return: 0 for success and negative value for failure
+ *
+ * This function will:
+ * - gate all columns
+ * - enable column reset
+ * - ungate all columns
+ * - disable column reset
+ * - reset shim tiles
+ * - setup axi mm to raise events
+ * - setup partition isolation
+ * - zeroize memory
+ */
+int aie_part_initialize(struct aie_partition *apart,
+ struct aie_partition_init_args *args)
+{
+ u32 node_id = apart->adev->pm_node_id;
+ int ret;
+
+ if (!args)
+ return -EINVAL;
+
+ mutex_lock(&apart->mlock);
+
+ /* Clear resources */
+ aie_resource_clear_all(&apart->tiles_inuse);
+ aie_resource_clear_all(&apart->cores_clk_state);
+
+ /* This operation will do the first 4 steps of the sequence */
+ if (args->init_opts & AIE_PART_INIT_OPT_COLUMN_RST) {
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_COL_RST);
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* Reset Shims */
+ if (args->init_opts & AIE_PART_INIT_OPT_SHIM_RST) {
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_SHIM_RST);
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* Setup AXIMM events */
+ if (args->init_opts & AIE_PART_INIT_OPT_BLOCK_NOCAXIMMERR) {
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_ENB_AXI_MM_ERR_EVENT);
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* Setup partition isolation */
+ if (args->init_opts & AIE_PART_INIT_OPT_ISOLATE)
+ aie_part_init_isolation(apart);
+
+ /* Zeroize memory */
+ if (args->init_opts & AIE_PART_INIT_OPT_ZEROIZEMEM) {
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_ZEROISATION);
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* Set L2 interrupt */
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_SET_L2_CTRL_NPI_INTR);
+ if (ret < 0)
+ goto exit;
+
+ /* Request tile locations */
+ ret = aie_part_request_tiles(apart, args->num_tiles, args->locs);
+
+exit:
+ mutex_unlock(&apart->mlock);
+ return ret;
+}
+
+/**
+ * aie_part_teardown() - AI engine partition teardown
+ * @apart: AI engine partition
+ *
+ * Return: 0 for success and negative value for failure
+ *
+ * This function will:
+ * - gate all columns
+ * - enable column reset
+ * - ungate all columns
+ * - disable column reset
+ * - reset shim tiles
+ * - zeroize memory
+ * - gate all columns
+ */
+int aie_part_teardown(struct aie_partition *apart)
+{
+ u32 node_id = apart->adev->pm_node_id;
+ int ret;
+
+ mutex_lock(&apart->mlock);
+
+ /* This operation will do first 4 steps of sequence */
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_COL_RST);
+ if (ret < 0)
+ goto exit;
+
+ /* Reset shims */
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_SHIM_RST);
+ if (ret < 0)
+ goto exit;
+
+ /* Zeroize mem */
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_ZEROISATION);
+ if (ret < 0)
+ goto exit;
+
+ /* Gate all columns */
+ ret = zynqmp_pm_aie_operation(node_id, apart->range.start.col,
+ apart->range.size.col,
+ XILINX_AIE_OPS_DIS_COL_CLK_BUFF);
+ if (ret < 0)
+ goto exit;
+
+ /* Clear tiles_inuse bitmap */
+ ret = aie_part_release_tiles(apart, 0U, NULL);
+
+exit:
+ mutex_unlock(&apart->mlock);
+ return ret;
+}
diff --git a/include/linux/amd-ai-engine.h b/include/linux/amd-ai-engine.h
index f1f6543f9eae..3e4ae8cb5e91 100644
--- a/include/linux/amd-ai-engine.h
+++ b/include/linux/amd-ai-engine.h
@@ -12,6 +12,16 @@
#include <linux/list.h>
#include <linux/mutex.h>
+/*
+ * AI engine partition initialize options
+ */
+#define AIE_PART_INIT_OPT_COLUMN_RST BIT(0)
+#define AIE_PART_INIT_OPT_SHIM_RST BIT(1)
+#define AIE_PART_INIT_OPT_BLOCK_NOCAXIMMERR BIT(2)
+#define AIE_PART_INIT_OPT_ISOLATE BIT(3)
+#define AIE_PART_INIT_OPT_ZEROIZEMEM BIT(4)
+#define AIE_PART_INIT_OPT_DEFAULT GENMASK(3, 0)
+
/**
* struct aie_partition_req - AIE request partition arguments
* @start_col: start column of the partition
@@ -40,8 +50,22 @@ struct aie_location {
u32 row;
};
+/**
+ * struct aie_partition_init_args - AIE partition initialization arguments
+ * @locs: Allocated array of tile locations that will be used
+ * @num_tiles: Number of tiles to use
+ * @init_opts: Partition initialization options
+ */
+struct aie_partition_init_args {
+ struct aie_location *locs;
+ u32 num_tiles;
+ u32 init_opts;
+};
+
void *aie_partition_request(struct device *dev, struct aie_partition_req *req);
void aie_partition_release(void *apart);
+int aie_partition_initialize(void *apart, struct aie_partition_init_args *args);
+int aie_partition_teardown(void *apart);
int aie_partition_set_freq_req(void *apart, u64 freq);
int aie_partition_get_freq(void *apart, u64 *freq);
int aie_partition_get_freq_req(void *apart, u64 *freq);
--
2.34.1
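A note on the option mask in this patch: AIE_PART_INIT_OPT_DEFAULT is
GENMASK(3, 0), so it appears to select column reset, shim reset, AXI-MM
error blocking and isolation, but not AIE_PART_INIT_OPT_ZEROIZEMEM (bit 4).
The userspace sketch below checks that reading. BIT() and GENMASK() are
re-declared as 32-bit stand-ins for the kernel macros, and
aie_init_describe() is a hypothetical helper that is not part of the patch;
it only lists the optional steps in the order aie_part_initialize() tests
them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's BIT() and GENMASK() (32-bit only). */
#define BIT(n)        (1U << (n))
#define GENMASK(h, l) ((~0U << (l)) & (~0U >> (31 - (h))))

/* Option bits as defined in include/linux/amd-ai-engine.h by this patch. */
#define AIE_PART_INIT_OPT_COLUMN_RST        BIT(0)
#define AIE_PART_INIT_OPT_SHIM_RST          BIT(1)
#define AIE_PART_INIT_OPT_BLOCK_NOCAXIMMERR BIT(2)
#define AIE_PART_INIT_OPT_ISOLATE           BIT(3)
#define AIE_PART_INIT_OPT_ZEROIZEMEM        BIT(4)
#define AIE_PART_INIT_OPT_DEFAULT           GENMASK(3, 0)

/*
 * Hypothetical helper (not in the patch): print the optional steps that a
 * given init_opts mask selects, in the order aie_part_initialize() checks
 * them, and return how many were selected. The L2 interrupt setup and the
 * tile request are unconditional in the patch, so they are not listed.
 */
static unsigned int aie_init_describe(uint32_t init_opts, FILE *out)
{
	static const struct {
		uint32_t bit;
		const char *name;
	} steps[] = {
		{ AIE_PART_INIT_OPT_COLUMN_RST,        "column reset" },
		{ AIE_PART_INIT_OPT_SHIM_RST,          "shim reset" },
		{ AIE_PART_INIT_OPT_BLOCK_NOCAXIMMERR, "AXI-MM error event setup" },
		{ AIE_PART_INIT_OPT_ISOLATE,           "partition isolation" },
		{ AIE_PART_INIT_OPT_ZEROIZEMEM,        "zeroize memory" },
	};
	unsigned int i, count = 0;

	assert(out != NULL);

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		if (init_opts & steps[i].bit) {
			fprintf(out, "%s\n", steps[i].name);
			count++;
		}
	}
	return count;
}
```

Under these assumptions, a caller that also wants memory zeroized would
pass AIE_PART_INIT_OPT_DEFAULT | AIE_PART_INIT_OPT_ZEROIZEMEM in init_opts.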
* Re: [PATCH V1 3/9] dt-bindings: power: Add AMD Versal power domain bindings
2025-07-02 15:56 ` [PATCH V1 3/9] dt-bindings: power: Add AMD Versal power domain bindings Gregory Williams
@ 2025-07-03 6:43 ` Krzysztof Kozlowski
2025-07-10 18:53 ` Williams, Gregory
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Kozlowski @ 2025-07-03 6:43 UTC (permalink / raw)
To: Gregory Williams, ogabbay, michal.simek, robh
Cc: dri-devel, devicetree, linux-kernel
On 02/07/2025 17:56, Gregory Williams wrote:
> Define Versal power domain value macros.
>
> Signed-off-by: Gregory Williams <gregory.williams@amd.com>
> ---
> include/dt-bindings/power/xlnx-versal-power.h | 55 +++++++++++++++++++
<form letter>
Please use scripts/get_maintainer.pl to get a list of necessary people
and lists to CC (and consider --no-git-fallback argument, so you will
not CC people just because they made one commit years ago). It might
happen, that command when run on an older kernel, gives you outdated
entries. Therefore please be sure you base your patches on recent Linux
kernel.
Tools like b4 or scripts/get_maintainer.pl provide you proper list of
people, so fix your workflow. Tools might also fail if you work on some
ancient tree (don't, instead use mainline) or work on fork of kernel
(don't, instead use mainline). Just use b4 and everything should be
fine, although remember about `b4 prep --auto-to-cc` if you added new
patches to the patchset.
</form letter>
> 1 file changed, 55 insertions(+)
> create mode 100644 include/dt-bindings/power/xlnx-versal-power.h
>
> diff --git a/include/dt-bindings/power/xlnx-versal-power.h b/include/dt-bindings/power/xlnx-versal-power.h
> new file mode 100644
> index 000000000000..effbc70e5a12
> --- /dev/null
> +++ b/include/dt-bindings/power/xlnx-versal-power.h
> @@ -0,0 +1,55 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) */
> +/*
> + * Copyright (C) 2019 - 2021 Xilinx, Inc.
> + * Copyright (C) 2024 Advanced Micro Devices, Inc.
> + */
> +
> +#ifndef _DT_BINDINGS_VERSAL_POWER_H
> +#define _DT_BINDINGS_VERSAL_POWER_H
> +
> +#define PM_DEV_RPU0_0 (0x18110005U)
> +#define PM_DEV_RPU0_1 (0x18110006U)
Bindings ID start from 0 or 1 and are decimal numbers. None of these are
bindings (and commit msg does not explain here anything).
Also, where is the compatible using these? Why is this a separate patch?
Best regards,
Krzysztof
* Re: [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding
2025-07-02 15:56 ` [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding Gregory Williams
@ 2025-07-03 6:48 ` Krzysztof Kozlowski
2025-07-10 19:03 ` Williams, Gregory
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Kozlowski @ 2025-07-03 6:48 UTC (permalink / raw)
To: Gregory Williams, ogabbay, michal.simek, robh
Cc: dri-devel, devicetree, linux-kernel
On 02/07/2025 17:56, Gregory Williams wrote:
> In the device tree, there will be device node for the AI engine device,
> and device nodes for the statically configured AI engine apertures.
No, describe the hardware, not DTS.
> Apertures are an isolated set of columns within the AI engine device
> with their own address space and interrupt.
>
> Signed-off-by: Gregory Williams <gregory.williams@amd.com>
> ---
> .../bindings/soc/xilinx/xlnx,ai-engine.yaml | 151 ++++++++++++++++++
> 1 file changed, 151 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>
> diff --git a/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
> new file mode 100644
> index 000000000000..7d9a36c56366
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
Filename matching compatible.
> @@ -0,0 +1,151 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/soc/xilinx/xlnx,ai-engine.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: AMD AI Engine
That's really too generic...
> +
> +maintainers:
> + - Gregory Williams <gregory.williams@amd.com>
> +
> +description:
> + The AMD AI Engine is a tile processor with many cores (up to 400) that
> + can run in parallel. The data routing between cores is configured through
> + internal switches, and shim tiles interface with external interconnect, such
> + as memory or PL. One AI engine device can have multiple apertures, each
> + with its own address space and interrupt. At runtime, an application can
> + create multiple partitions within an aperture, which are groups of columns of
> + AI engine tiles. Each AI engine partition is the minimum resettable unit for
> + an AI engine application.
> +
> +properties:
> + compatible:
> + const: xlnx,ai-engine-v2.0
What does v2.0 stand for? Versioning is discouraged, unless mapping is
well documented.
> +
> + reg:
> + maxItems: 1
> +
> + '#address-cells':
> + const: 2
> +
> + '#size-cells':
> + const: 2
> +
> + power-domains:
Missing constraints.
> + description:
> + Platform management node id used to request power management services
> + from the firmware driver.
Drop description, redundant.
> +
> + xlnx,aie-gen:
> + $ref: /schemas/types.yaml#/definitions/uint8
Why uint8?
> + description:
> + Hardware generation of AI engine device. E.g. the current values supported
> + are 1 (AIE) and 2 (AIEML).
No clue what's that, but it is implied by compatible, isn't it?
Missing constraints.
> +
> + xlnx,shim-rows:
> + $ref: /schemas/types.yaml#/definitions/uint8-array
> + description:
> + start row and the number of rows of SHIM tiles of the AI engine device
Implied by compatible.
Missing constraints.
> +
> + xlnx,core-rows:
> + $ref: /schemas/types.yaml#/definitions/uint8-array
> + description:
> + start row and the number of rows of core tiles of the AI engine device
> +
> + xlnx,mem-rows:
> + $ref: /schemas/types.yaml#/definitions/uint8-array
> + description:
> + start row and the number of rows of memory tiles of the AI engine device
> +
Same comments everywhere.
> +required:
> + - compatible
> + - reg
> + - power-domains
> + - xlnx,aie-gen
> + - xlnx,shim-rows
> + - xlnx,core-rows
> + - xlnx,mem-rows
> +
> +patternProperties:
This goes after properties.
> + "^aperture@[0-9]+$":
> + type: object
> + description:
> + AI engine aperture which is a group of column based tiles of the
> + AI engine device. Each AI engine aperture is isolated from the
> + other AI engine apertures. An AI engine aperture is defined by
> + AMD/Xilinx platform design tools.
> +
> + properties:
> + compatible:
> + const: xlnx,ai-engine-aperture
> +
> + reg:
> + description:
> + Physical base address and length of the aperture registers.
> + The AI engine address space assigned to Linux is defined by
> + Xilinx/AMD platform design tool.
Missing constraints. Description is redundant - can it be anything else?
Plus you clearly miss ranges.
> +
> + interrupts:
> + maxItems: 3
> +
> + interrupt-names:
> + items:
> + - const: interrupt1
> + - const: interrupt2
> + - const: interrupt3
Useless names, drop entirely.
> +
> + xlnx,columns:
> + $ref: /schemas/types.yaml#/definitions/uint32-array
> + description:
> + It describes the location of the aperture. It specifies the start
> + column and the number of columns. E.g. an aperture starts from
> + column 0 and there are 50 columns, it will be presented as <0 50>.
Same comments as before
> +
> + xlnx,node-id:
> + $ref: /schemas/types.yaml#/definitions/uint32
> + description:
> + AI engine aperture node ID, which is defined by AMD/Xilinx platform
> + design tool to identify the AI engine aperture in the firmware.
No, you do not get node ID. Recently every day a patch comes for that...
> +
> + required:
> + - compatible
> + - reg
> + - xlnx,columns
> + - xlnx,node-id
> +
> + additionalProperties: false
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + #include <dt-bindings/power/xlnx-versal-power.h>
> + bus {
> + #address-cells = <2>;
> + #size-cells = <2>;
> + ai_engine: ai-engine@20000000000 {
> + compatible = "xlnx,ai-engine-v2.0";
> + reg = <0x200 0x00 0x01 0x00>;
> + #address-cells = <2>;
> + #size-cells = <2>;
> + power-domains = <&versal_firmware PM_DEV_AI>;
> + xlnx,aie-gen = /bits/ 8 <0x1>;
> + xlnx,core-rows = /bits/ 8 <1 8>;
> + xlnx,mem-rows = /bits/ 8 <0 0>;
> + xlnx,shim-rows = /bits/ 8 <0 1>;
This cannot be without ranges... I am surprised it actually works, but
for sure was not tested and produces warnings.
> +
> + aperture0: aperture@20000000000 {
> + /* 50 columns and 8 core tile rows + 1 SHIM row */
> + compatible = "xlnx,ai-engine-aperture";
> + reg = <0x200 0x0 0x1 0x0>;
> + interrupts = <0x0 0x94 0x4>,
> + <0x0 0x95 0x4>,
> + <0x0 0x96 0x4>;
Use proper flags.
Best regards,
Krzysztof
* Re: [PATCH V1 1/9] firmware: xilinx: Add IOCTL support for the AIE run time operations
2025-07-02 15:56 ` [PATCH V1 1/9] firmware: xilinx: Add IOCTL support for the AIE run time operations Gregory Williams
@ 2025-07-03 6:50 ` Krzysztof Kozlowski
2025-07-10 18:49 ` Williams, Gregory
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Kozlowski @ 2025-07-03 6:50 UTC (permalink / raw)
To: Gregory Williams, ogabbay, michal.simek, robh
Cc: Ronak Jain, dri-devel, devicetree, linux-kernel
On 02/07/2025 17:56, Gregory Williams wrote:
> From: Ronak Jain <ronak.jain@amd.com>
>
> Add IOCTL support for the AIE run time operations listed below
> - Column Reset
> - Shim Reset
> - Enabling of column clock buffer
> - Zeroisation of Program and data memories
> - Disabling of column clock buffer
> - Enabling AXI-MM error event
> - Set L2 controller NPI INTR
>
> Signed-off-by: Ronak Jain <ronak.jain@amd.com>
Incomplete chain. Read submitting patches.
Best regards,
Krzysztof
* Re: [PATCH V1 1/9] firmware: xilinx: Add IOCTL support for the AIE run time operations
2025-07-03 6:50 ` Krzysztof Kozlowski
@ 2025-07-10 18:49 ` Williams, Gregory
0 siblings, 0 replies; 21+ messages in thread
From: Williams, Gregory @ 2025-07-10 18:49 UTC (permalink / raw)
To: Krzysztof Kozlowski, Gregory Williams, ogabbay, michal.simek,
robh
Cc: Ronak Jain, dri-devel, devicetree, linux-kernel
On 7/3/2025 12:50 AM, Krzysztof Kozlowski wrote:
>
> On 02/07/2025 17:56, Gregory Williams wrote:
>> From: Ronak Jain <ronak.jain@amd.com>
>>
>> Add IOCTL support for the AIE run time operations listed below
>> - Column Reset
>> - Shim Reset
>> - Enabling of column clock buffer
>> - Zeroisation of Program and data memories
>> - Disabling of column clock buffer
>> - Enabling AXI-MM error event
>> - Set L2 controller NPI INTR
>>
>> Signed-off-by: Ronak Jain <ronak.jain@amd.com>
>
> Incomplete chain. Read submitting patches.
Hi Krzysztof, thanks for the review, I appreciate it! I will make sure to fix the chain when I send V2.
Thanks,
Greg
>
> Best regards,
> Krzysztof
* Re: [PATCH V1 3/9] dt-bindings: power: Add AMD Versal power domain bindings
2025-07-03 6:43 ` Krzysztof Kozlowski
@ 2025-07-10 18:53 ` Williams, Gregory
2025-07-10 21:34 ` Krzysztof Kozlowski
0 siblings, 1 reply; 21+ messages in thread
From: Williams, Gregory @ 2025-07-10 18:53 UTC (permalink / raw)
To: Krzysztof Kozlowski, Gregory Williams, ogabbay, michal.simek,
robh
Cc: dri-devel, devicetree, linux-kernel
On 7/3/2025 12:43 AM, Krzysztof Kozlowski wrote:
>
> On 02/07/2025 17:56, Gregory Williams wrote:
>> Define Versal power domain value macros.
>>
>> Signed-off-by: Gregory Williams <gregory.williams@amd.com>
>> ---
>> include/dt-bindings/power/xlnx-versal-power.h | 55 +++++++++++++++++++
>
> <form letter>
> Please use scripts/get_maintainer.pl to get a list of necessary people
> and lists to CC (and consider --no-git-fallback argument, so you will
> not CC people just because they made one commit years ago). It might
> happen, that command when run on an older kernel, gives you outdated
> entries. Therefore please be sure you base your patches on recent Linux
> kernel.
>
> Tools like b4 or scripts/get_maintainer.pl provide you proper list of
> people, so fix your workflow. Tools might also fail if you work on some
> ancient tree (don't, instead use mainline) or work on fork of kernel
> (don't, instead use mainline). Just use b4 and everything should be
> fine, although remember about `b4 prep --auto-to-cc` if you added new
> patches to the patchset.
> </form letter>
>
>
>> 1 file changed, 55 insertions(+)
>> create mode 100644 include/dt-bindings/power/xlnx-versal-power.h
>>
>> diff --git a/include/dt-bindings/power/xlnx-versal-power.h b/include/dt-bindings/power/xlnx-versal-power.h
>> new file mode 100644
>> index 000000000000..effbc70e5a12
>> --- /dev/null
>> +++ b/include/dt-bindings/power/xlnx-versal-power.h
>> @@ -0,0 +1,55 @@
>> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) */
>> +/*
>> + * Copyright (C) 2019 - 2021 Xilinx, Inc.
>> + * Copyright (C) 2024 Advanced Micro Devices, Inc.
>> + */
>> +
>> +#ifndef _DT_BINDINGS_VERSAL_POWER_H
>> +#define _DT_BINDINGS_VERSAL_POWER_H
>> +
>> +#define PM_DEV_RPU0_0 (0x18110005U)
>> +#define PM_DEV_RPU0_1 (0x18110006U)
>
> Bindings ID start from 0 or 1 and are decimal numbers. None of these are
> bindings (and commit msg does not explain here anything).
>
> Also, where is the compatible using these? Why is this a separate patch?
In 'Submitting DT binding patches' it says: "The Documentation/ and include/dt-bindings/ portion of the patch should be a separate patch".
This define was only used in the device tree binding example, I see the issue with this and will remove for V2.
Thanks,
Greg
>
>
>
> Best regards,
> Krzysztof
* Re: [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding
2025-07-03 6:48 ` Krzysztof Kozlowski
@ 2025-07-10 19:03 ` Williams, Gregory
2025-07-10 21:38 ` Krzysztof Kozlowski
0 siblings, 1 reply; 21+ messages in thread
From: Williams, Gregory @ 2025-07-10 19:03 UTC (permalink / raw)
To: Krzysztof Kozlowski, Gregory Williams, ogabbay, michal.simek,
robh
Cc: dri-devel, devicetree, linux-kernel
On 7/3/2025 12:48 AM, Krzysztof Kozlowski wrote:
> On 02/07/2025 17:56, Gregory Williams wrote:
>> In the device tree, there will be device node for the AI engine device,
>> and device nodes for the statically configured AI engine apertures.
>
> No, describe the hardware, not DTS.
>
>> Apertures are an isolated set of columns within the AI engine device
>> with their own address space and interrupt.
>>
>> Signed-off-by: Gregory Williams <gregory.williams@amd.com>
>> ---
>> .../bindings/soc/xilinx/xlnx,ai-engine.yaml | 151 ++++++++++++++++++
>> 1 file changed, 151 insertions(+)
>> create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>>
>> diff --git a/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>> new file mode 100644
>> index 000000000000..7d9a36c56366
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>
> Filename matching compatible.
>
>> @@ -0,0 +1,151 @@
>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/soc/xilinx/xlnx,ai-engine.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: AMD AI Engine
>
> That's really too generic...
>
>> +
>> +maintainers:
>> + - Gregory Williams <gregory.williams@amd.com>
>> +
>> +description:
>> + The AMD AI Engine is a tile processor with many cores (up to 400) that
>> + can run in parallel. The data routing between cores is configured through
>> + internal switches, and shim tiles interface with external interconnect, such
>> + as memory or PL. One AI engine device can have multiple apertures, each
>> + with its own address space and interrupt. At runtime, an application can
>> + create multiple partitions within an aperture, which are groups of columns of
>> + AI engine tiles. Each AI engine partition is the minimum resettable unit for
>> + an AI engine application.
>> +
>> +properties:
>> + compatible:
>> + const: xlnx,ai-engine-v2.0
>
> What does v2.0 stand for? Versioning is discouraged, unless mapping is
> well documented.
Sure, I will remove the versioning in V2 patch.
>
>> +
>> + reg:
>> + maxItems: 1
>> +
>> + '#address-cells':
>> + const: 2
>> +
>> + '#size-cells':
>> + const: 2
>> +
>> + power-domains:
>
> Missing constraints.
>
>> + description:
>> + Platform management node id used to request power management services
>> + from the firmware driver.
>
> Drop description, redundant.
>
>> +
>> + xlnx,aie-gen:
>> + $ref: /schemas/types.yaml#/definitions/uint8
>
> Why uint8?
>
>> + description:
>> + Hardware generation of AI engine device. E.g. the current values supported
>> + are 1 (AIE) and 2 (AIEML).
>
> No clue what's that, but it is implied by compatible, isn't it?
The driver supports multiple hardware generations. During driver probe, this value is read from the device tree and hardware generation specific
data structures are loaded based on this value. The compatible string is the same between devices.
>
> Missing constraints.
>
>> +
>> + xlnx,shim-rows:
>> + $ref: /schemas/types.yaml#/definitions/uint8-array
>> + description:
>> + start row and the number of rows of SHIM tiles of the AI engine device
>
> Implied by compatible.
The AI Engine device can have different configurations for the number of rows and columns (even if it is the same hardware generation). This property
tells the driver the size and layout of the array, this is not implied by compatible.
>
> Missing constraints.
>
>
>> +
>> + xlnx,core-rows:
>> + $ref: /schemas/types.yaml#/definitions/uint8-array
>> + description:
>> + start row and the number of rows of core tiles of the AI engine device
>> +
>> + xlnx,mem-rows:
>> + $ref: /schemas/types.yaml#/definitions/uint8-array
>> + description:
>> + start row and the number of rows of memory tiles of the AI engine device
>> +
>
> Same comments everywhere.
>
>> +required:
>> + - compatible
>> + - reg
>> + - power-domains
>> + - xlnx,aie-gen
>> + - xlnx,shim-rows
>> + - xlnx,core-rows
>> + - xlnx,mem-rows
>> +
>> +patternProperties:
>
> This goes after properties.
>
>> + "^aperture@[0-9]+$":
>> + type: object
>> + description:
>> + AI engine aperture which is a group of column based tiles of the
>> + AI engine device. Each AI engine aperture is isolated from the
>> + other AI engine apertures. An AI engine aperture is defined by
>> + AMD/Xilinx platform design tools.
>> +
>> + properties:
>> + compatible:
>> + const: xlnx,ai-engine-aperture
>> +
>> + reg:
>> + description:
>> + Physical base address and length of the aperture registers.
>> + The AI engine address space assigned to Linux is defined by
>> + Xilinx/AMD platform design tool.
>
> Missing constraints. Description is redundant - can it be anything else?
>
> Plus you clearly miss ranges.
>
>
>> +
>> + interrupts:
>> + maxItems: 3
>> +
>> + interrupt-names:
>> + items:
>> + - const: interrupt1
>> + - const: interrupt2
>> + - const: interrupt3
>
> Useless names, drop entirely.
>
>> +
>> + xlnx,columns:
>> + $ref: /schemas/types.yaml#/definitions/uint32-array
>> + description:
>> + It describes the location of the aperture. It specifies the start
>> + column and the number of columns. E.g. an aperture starts from
>> + column 0 and there are 50 columns, it will be presented as <0 50>.
>
> Same comments as before
>
>> +
>> + xlnx,node-id:
>> + $ref: /schemas/types.yaml#/definitions/uint32
>> + description:
>> + AI engine aperture node ID, which is defined by AMD/Xilinx platform
>> + design tool to identify the AI engine aperture in the firmware.
>
> No, you do not get node ID. Recently every day a patch comes for that...
>
>> +
>> + required:
>> + - compatible
>> + - reg
>> + - xlnx,columns
>> + - xlnx,node-id
>> +
>> + additionalProperties: false
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> + - |
>> + #include <dt-bindings/power/xlnx-versal-power.h>
>> + bus {
>> + #address-cells = <2>;
>> + #size-cells = <2>;
>> + ai_engine: ai-engine@20000000000 {
>> + compatible = "xlnx,ai-engine-v2.0";
>> + reg = <0x200 0x00 0x01 0x00>;
>> + #address-cells = <2>;
>> + #size-cells = <2>;
>> + power-domains = <&versal_firmware PM_DEV_AI>;
>> + xlnx,aie-gen = /bits/ 8 <0x1>;
>> + xlnx,core-rows = /bits/ 8 <1 8>;
>> + xlnx,mem-rows = /bits/ 8 <0 0>;
>> + xlnx,shim-rows = /bits/ 8 <0 1>;
>
> This cannot be without ranges... I am surprised it actually works, but
> for sure was not tested and produces warnings.
>
>> +
>> + aperture0: aperture@20000000000 {
>> + /* 50 columns and 8 core tile rows + 1 SHIM row */
>> + compatible = "xlnx,ai-engine-aperture";
>> + reg = <0x200 0x0 0x1 0x0>;
>> + interrupts = <0x0 0x94 0x4>,
>> + <0x0 0x95 0x4>,
>> + <0x0 0x96 0x4>;
> Use proper flags.
>
> Best regards,
> Krzysztof
Thanks again for the review Krzysztof, I appreciate your time. I will address the remaining comments in a V2 patch.
Thanks,
Greg
* Re: [PATCH V1 3/9] dt-bindings: power: Add AMD Versal power domain bindings
2025-07-10 18:53 ` Williams, Gregory
@ 2025-07-10 21:34 ` Krzysztof Kozlowski
0 siblings, 0 replies; 21+ messages in thread
From: Krzysztof Kozlowski @ 2025-07-10 21:34 UTC (permalink / raw)
To: Williams, Gregory, Gregory Williams, ogabbay, michal.simek, robh
Cc: dri-devel, devicetree, linux-kernel
On 10/07/2025 20:53, Williams, Gregory wrote:
>>
>>
>>> 1 file changed, 55 insertions(+)
>>> create mode 100644 include/dt-bindings/power/xlnx-versal-power.h
>>>
>>> diff --git a/include/dt-bindings/power/xlnx-versal-power.h b/include/dt-bindings/power/xlnx-versal-power.h
>>> new file mode 100644
>>> index 000000000000..effbc70e5a12
>>> --- /dev/null
>>> +++ b/include/dt-bindings/power/xlnx-versal-power.h
>>> @@ -0,0 +1,55 @@
>>> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) */
>>> +/*
>>> + * Copyright (C) 2019 - 2021 Xilinx, Inc.
>>> + * Copyright (C) 2024 Advanced Micro Devices, Inc.
>>> + */
>>> +
>>> +#ifndef _DT_BINDINGS_VERSAL_POWER_H
>>> +#define _DT_BINDINGS_VERSAL_POWER_H
>>> +
>>> +#define PM_DEV_RPU0_0 (0x18110005U)
>>> +#define PM_DEV_RPU0_1 (0x18110006U)
>>
>> Bindings ID start from 0 or 1 and are decimal numbers. None of these are
>> bindings (and commit msg does not explain here anything).
>>
>> Also, where is the compatible using these? Why is this a separate patch?
> In 'Submitting DT binding patches' it says: "The Documentation/ and include/dt-bindings/ portion of the patch should be a separate patch".
Separate from the driver. But that's a single patch.
Best regards,
Krzysztof
* Re: [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding
2025-07-10 19:03 ` Williams, Gregory
@ 2025-07-10 21:38 ` Krzysztof Kozlowski
2025-07-11 18:33 ` Williams, Gregory
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Kozlowski @ 2025-07-10 21:38 UTC (permalink / raw)
To: Williams, Gregory, Gregory Williams, ogabbay, michal.simek, robh
Cc: dri-devel, devicetree, linux-kernel
On 10/07/2025 21:03, Williams, Gregory wrote:
> On 7/3/2025 12:48 AM, Krzysztof Kozlowski wrote:
>> On 02/07/2025 17:56, Gregory Williams wrote:
>>> In the device tree, there will be device node for the AI engine device,
>>> and device nodes for the statically configured AI engine apertures.
>>
>> No, describe the hardware, not DTS.
>>
>>> Apertures are an isolated set of columns within the AI engine device
>>> with their own address space and interrupt.
>>>
>>> Signed-off-by: Gregory Williams <gregory.williams@amd.com>
>>> ---
>>> .../bindings/soc/xilinx/xlnx,ai-engine.yaml | 151 ++++++++++++++++++
>>> 1 file changed, 151 insertions(+)
>>> create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>>>
>>> diff --git a/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>>> new file mode 100644
>>> index 000000000000..7d9a36c56366
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>>
>> Filename matching compatible.
>>
>>> @@ -0,0 +1,151 @@
>>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>>> +%YAML 1.2
>>> +---
>>> +$id: http://devicetree.org/schemas/soc/xilinx/xlnx,ai-engine.yaml#
>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>> +
>>> +title: AMD AI Engine
>>
>> That's really too generic...
You did not answer to other comments here and other patches, so I just
assume you did not ignore them.
>>
>>> +
>>> +maintainers:
>>> + - Gregory Williams <gregory.williams@amd.com>
>>> +
>>> +description:
>>> + The AMD AI Engine is a tile processor with many cores (up to 400) that
>>> + can run in parallel. The data routing between cores is configured through
>>> + internal switches, and shim tiles interface with external interconnect, such
>>> + as memory or PL. One AI engine device can have multiple apertures, each
>>> + with its own address space and interrupt. At runtime, an application can
>>> + create multiple partitions within an aperture, which are groups of columns of
>>> + AI engine tiles. Each AI engine partition is the minimum resettable unit for
>>> + an AI engine application.
>>> +
>>> +properties:
>>> + compatible:
>>> + const: xlnx,ai-engine-v2.0
>>
>> What does v2.0 stand for? Versioning is discouraged, unless mapping is
>> well documented.
>
> Sure, I will remove the versioning in V2 patch.
This should be specific to product, so use the actual product/model name.
Is this part of a SoC? Then standard rules apply... but I could not
deduce it from the descriptions or commit msgs.
>
>>
>>> +
>>> + reg:
>>> + maxItems: 1
>>> +
>>> + '#address-cells':
>>> + const: 2
>>> +
>>> + '#size-cells':
>>> + const: 2
>>> +
>>> + power-domains:
>>
>> Missing constraints.
>>
>>> + description:
>>> + Platform management node id used to request power management services
>>> + from the firmware driver.
>>
>> Drop description, redundant.
>>
>>> +
>>> + xlnx,aie-gen:
>>> + $ref: /schemas/types.yaml#/definitions/uint8
>>
>> Why uint8?
>>
>>> + description:
>>> + Hardware generation of AI engine device. E.g. the current values supported
>>> + are 1 (AIE) and 2 (AIEML).
>>
>> No clue what's that, but it is implied by compatible, isn't it?
>
> The driver supports multiple hardware generations. During driver probe, this value is read from the device tree and hardware generation specific
Bindings are about hardware, not driver, so your driver arguments are
not valid.
> data structures are loaded based on this value. The compatible string is the same between devices.
No. See writing bindings.
>
>>
>> Missing constraints.
>>
>>> +
>>> + xlnx,shim-rows:
>>> + $ref: /schemas/types.yaml#/definitions/uint8-array
>>> + description:
>>> + start row and the number of rows of SHIM tiles of the AI engine device
>>
>> Implied by compatible.
>
> The AI Engine device can have different configurations for the number of rows and columns (even if it is the same hardware generation). This property
> tells the driver the size and layout of the array, this is not implied by compatible.
Wrap your emails correctly.
Again driver.. no, please describe the hardware, not your drivers.
Best regards,
Krzysztof
* Re: [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding
2025-07-10 21:38 ` Krzysztof Kozlowski
@ 2025-07-11 18:33 ` Williams, Gregory
2025-07-12 7:33 ` Krzysztof Kozlowski
0 siblings, 1 reply; 21+ messages in thread
From: Williams, Gregory @ 2025-07-11 18:33 UTC (permalink / raw)
To: Krzysztof Kozlowski, Gregory Williams, ogabbay, michal.simek,
robh
Cc: dri-devel, devicetree, linux-kernel
On 7/10/2025 3:38 PM, Krzysztof Kozlowski wrote:
> On 10/07/2025 21:03, Williams, Gregory wrote:
>> On 7/3/2025 12:48 AM, Krzysztof Kozlowski wrote:
>>> On 02/07/2025 17:56, Gregory Williams wrote:
>>>> In the device tree, there will be device node for the AI engine device,
>>>> and device nodes for the statically configured AI engine apertures.
>>>
>>> No, describe the hardware, not DTS.
>>>
>>>> Apertures are an isolated set of columns within the AI engine device
>>>> with their own address space and interrupt.
>>>>
>>>> Signed-off-by: Gregory Williams <gregory.williams@amd.com>
>>>> ---
>>>> .../bindings/soc/xilinx/xlnx,ai-engine.yaml | 151 ++++++++++++++++++
>>>> 1 file changed, 151 insertions(+)
>>>> create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>>>> new file mode 100644
>>>> index 000000000000..7d9a36c56366
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
>>>
>>> Filename matching compatible.
>>>
>>>> @@ -0,0 +1,151 @@
>>>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>>>> +%YAML 1.2
>>>> +---
>>>> +$id: http://devicetree.org/schemas/soc/xilinx/xlnx,ai-engine.yaml#
>>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>>> +
>>>> +title: AMD AI Engine
>>>
>>> That's really too generic...
>
> You did not answer to other comments here and other patches, so I just
> assume you did not ignore them.
No, they were not ignored. I will make sure to address them in a V2 patch.
>
>>>
>>>> +
>>>> +maintainers:
>>>> + - Gregory Williams <gregory.williams@amd.com>
>>>> +
>>>> +description:
>>>> + The AMD AI Engine is a tile processor with many cores (up to 400) that
>>>> + can run in parallel. The data routing between cores is configured through
>>>> + internal switches, and shim tiles interface with external interconnect, such
>>>> + as memory or PL. One AI engine device can have multiple apertures, each
>>>> + has its own address space and interrupt. At runtime application can create
>>>> + multiple partitions within an aperture which are groups of columns of AI
>>>> + engine tiles. Each AI engine partition is the minimum resetable unit for an
>>>> + AI engine application.
>>>> +
>>>> +properties:
>>>> + compatible:
>>>> + const: xlnx,ai-engine-v2.0
>>>
>>> What does v2.0 stands for? Versioning is discouraged, unless mapping is
>>> well documented.
>>
>> Sure, I will remove the versioning in V2 patch.
>
> This should be specific to product, so use the actual product/model name.
>
> Is this part of a Soc? Then standard rules apply... but I could not
> deduce it from the descriptions or commit msgs.
Yes this is part of an SoC. I will be more descriptive in V2 patch.
>
>
>>
>>>
>>>> +
>>>> + reg:
>>>> + maxItems: 1
>>>> +
>>>> + '#address-cells':
>>>> + const: 2
>>>> +
>>>> + '#size-cells':
>>>> + const: 2
>>>> +
>>>> + power-domains:
>>>
>>> Missing constraints.
>>>
>>>> + description:
>>>> + Platform management node id used to request power management services
>>>> + from the firmware driver.
>>>
>>> Drop description, redundant.
>>>
>>>> +
>>>> + xlnx,aie-gen:
>>>> + $ref: /schemas/types.yaml#/definitions/uint8
>>>
>>> Why uint8?
>>>
>>>> + description:
>>>> + Hardware generation of AI engine device. E.g. the current values supported
>>>> + are 1 (AIE) and 2 (AIEML).
>>>
>>> No clue what's that, but it is implied by compatible, isn't it?
>>
>> The driver supports multiple hardware generations. During driver probe, this value is read from the device tree and hardware generation specific
>
> Bindings are about hardware, not driver, so your driver arguments are
> not valid.
Understood.
>
>> data structures are loaded based on this value. The compatible string is the same between devices.
>
> No. See writing bindings.
OK, so there should be different compatible strings based on hardware
generation. I will fix this in a V2 patch.
>
>>
>>>
>>> Missing constraints.
>>>
>>>> +
>>>> + xlnx,shim-rows:
>>>> + $ref: /schemas/types.yaml#/definitions/uint8-array
>>>> + description:
>>>> + start row and the number of rows of SHIM tiles of the AI engine device
>>>
>>> Implied by compatible.
>>
>> The AI Engine device can have different configurations for number of rows and column (even if it is the same hardware generation). This property
>> tells the driver the size and layout of the array, this is not implied by compatible.
>
> Wrap your emails correctly.
>
> Again driver.. no, please describe the hardware, not your drivers.
I see in 'writing bindings' that I should use a device-based compatible
string. I will do this and remove these nodes in the V2 patch.
Thanks again for your time,
Greg
>
>
> Best regards,
> Krzysztof
* Re: [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding
2025-07-11 18:33 ` Williams, Gregory
@ 2025-07-12 7:33 ` Krzysztof Kozlowski
2025-07-14 14:51 ` Williams, Gregory
0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Kozlowski @ 2025-07-12 7:33 UTC (permalink / raw)
To: Williams, Gregory, Gregory Williams, ogabbay, michal.simek, robh
Cc: dri-devel, devicetree, linux-kernel
On 11/07/2025 20:33, Williams, Gregory wrote:
>>>>> +
>>>>> +maintainers:
>>>>> + - Gregory Williams <gregory.williams@amd.com>
>>>>> +
>>>>> +description:
>>>>> + The AMD AI Engine is a tile processor with many cores (up to 400) that
>>>>> + can run in parallel. The data routing between cores is configured through
>>>>> + internal switches, and shim tiles interface with external interconnect, such
>>>>> + as memory or PL. One AI engine device can have multiple apertures, each
>>>>> + has its own address space and interrupt. At runtime application can create
>>>>> + multiple partitions within an aperture which are groups of columns of AI
>>>>> + engine tiles. Each AI engine partition is the minimum resetable unit for an
>>>>> + AI engine application.
>>>>> +
>>>>> +properties:
>>>>> + compatible:
>>>>> + const: xlnx,ai-engine-v2.0
>>>>
>>>> What does v2.0 stands for? Versioning is discouraged, unless mapping is
>>>> well documented.
>>>
>>> Sure, I will remove the versioning in V2 patch.
>>
>> This should be specific to product, so use the actual product/model name.
>>
>> Is this part of a Soc? Then standard rules apply... but I could not
>> deduce it from the descriptions or commit msgs.
>
> Yes this is part of an SoC. I will be more descriptive in V2 patch.
Huh... so you MUST use SoC compatibles. Don't upstream things entirely
different than everything else.
Best regards,
Krzysztof
* Re: [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding
2025-07-12 7:33 ` Krzysztof Kozlowski
@ 2025-07-14 14:51 ` Williams, Gregory
0 siblings, 0 replies; 21+ messages in thread
From: Williams, Gregory @ 2025-07-14 14:51 UTC (permalink / raw)
To: Krzysztof Kozlowski, Gregory Williams, ogabbay, michal.simek,
robh
Cc: dri-devel, devicetree, linux-kernel
On 7/12/2025 1:33 AM, Krzysztof Kozlowski wrote:
> On 11/07/2025 20:33, Williams, Gregory wrote:
>>>>>> +
>>>>>> +maintainers:
>>>>>> + - Gregory Williams <gregory.williams@amd.com>
>>>>>> +
>>>>>> +description:
>>>>>> + The AMD AI Engine is a tile processor with many cores (up to 400) that
>>>>>> + can run in parallel. The data routing between cores is configured through
>>>>>> + internal switches, and shim tiles interface with external interconnect, such
>>>>>> + as memory or PL. One AI engine device can have multiple apertures, each
>>>>>> + has its own address space and interrupt. At runtime application can create
>>>>>> + multiple partitions within an aperture which are groups of columns of AI
>>>>>> + engine tiles. Each AI engine partition is the minimum resetable unit for an
>>>>>> + AI engine application.
>>>>>> +
>>>>>> +properties:
>>>>>> + compatible:
>>>>>> + const: xlnx,ai-engine-v2.0
>>>>>
>>>>> What does v2.0 stands for? Versioning is discouraged, unless mapping is
>>>>> well documented.
>>>>
>>>> Sure, I will remove the versioning in V2 patch.
>>>
>>> This should be specific to product, so use the actual product/model name.
>>>
>>> Is this part of a Soc? Then standard rules apply... but I could not
>>> deduce it from the descriptions or commit msgs.
>>
>> Yes this is part of an SoC. I will be more descriptive in V2 patch.
>
> Huh... so you MUST use SoC compatibles. Don't upstream things entirely
> different than everything else.
I will fix this in the V2 patch.
Thanks,
Greg
>
> Best regards,
> Krzysztof
Thread overview: 21+ messages
2025-07-02 15:56 [PATCH V1 0/9] AMD AI Engine device driver for Versal Gregory Williams
2025-07-02 15:56 ` [PATCH V1 1/9] firmware: xilinx: Add IOCTL support for the AIE run time operations Gregory Williams
2025-07-03 6:50 ` Krzysztof Kozlowski
2025-07-10 18:49 ` Williams, Gregory
2025-07-02 15:56 ` [PATCH V1 2/9] firmware: xilinx: Add IOCTL support to query QoS Gregory Williams
2025-07-02 15:56 ` [PATCH V1 3/9] dt-bindings: power: Add AMD Versal power domain bindings Gregory Williams
2025-07-03 6:43 ` Krzysztof Kozlowski
2025-07-10 18:53 ` Williams, Gregory
2025-07-10 21:34 ` Krzysztof Kozlowski
2025-07-02 15:56 ` [PATCH V1 4/9] dt-bindings: soc: xilinx: Add AI engine DT binding Gregory Williams
2025-07-03 6:48 ` Krzysztof Kozlowski
2025-07-10 19:03 ` Williams, Gregory
2025-07-10 21:38 ` Krzysztof Kozlowski
2025-07-11 18:33 ` Williams, Gregory
2025-07-12 7:33 ` Krzysztof Kozlowski
2025-07-14 14:51 ` Williams, Gregory
2025-07-02 15:56 ` [PATCH V1 5/9] accel: amd-ai-engine: Add AMD AI Engine device driver Gregory Williams
2025-07-02 15:56 ` [PATCH V1 6/9] accel: amd-ai-engine: Add support to enable/disable clocks and change clock frequency Gregory Williams
2025-07-02 15:56 ` [PATCH V1 7/9] accel: amd-ai-engine: Add support for AIEML devices Gregory Williams
2025-07-02 15:56 ` [PATCH V1 8/9] accel: amd-ai-engine: Create tile memory information Gregory Williams
2025-07-02 15:56 ` [PATCH V1 9/9] accel: amd-ai-engine: Adds AI Engine reset operations Gregory Williams