* [PATCH 0/5] Introduce Panfrost JM contexts
@ 2025-08-28 2:34 Adrián Larumbe
2025-08-28 2:34 ` [PATCH 1/5] drm/panfrost: Add job slot register defs for affinity Adrián Larumbe
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Adrián Larumbe @ 2025-08-28 2:34 UTC (permalink / raw)
To: linux-kernel; +Cc: dri-devel, Boris Brezillon, kernel, Adrián Larumbe
This patch series brings the notion of JM contexts into Panfrost.
UM will be able to create contexts, obtain a handle for each of them
and attach it to a job submission. Contexts describe the basic HW
resources assigned to jobs, as well as their job slot priorities.
A Mesa MR with UM changes that leverage this new kernel driver feature
is still in the making.
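As a rough sketch of the intended flow from UM's point of view (illustrative
code only; assumes fd is an open render-node FD and submit a filled-in
struct drm_panfrost_submit; error handling and affinity mask setup omitted,
the masks would normally come from the *_PRESENT GET_PARAM queries):

        struct drm_panfrost_jm_ctx_create ctx = {
                .slots[0] = { .flags = PANFROST_JS_FLAG_ENABLED,
                              .priority = PANFROST_JM_CTX_PRIORITY_MEDIUM },
                .slots[1] = { .flags = PANFROST_JS_FLAG_ENABLED,
                              .priority = PANFROST_JM_CTX_PRIORITY_MEDIUM },
        };
        struct drm_panfrost_jm_ctx_destroy destroy = {0};

        ioctl(fd, DRM_IOCTL_PANFROST_JM_CTX_CREATE, &ctx);

        submit.jm_ctx_handle = ctx.handle;      /* attach the context to a job */
        ioctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit);

        destroy.handle = ctx.handle;
        ioctl(fd, DRM_IOCTL_PANFROST_JM_CTX_DESTROY, &destroy);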
Boris Brezillon (5):
drm/panfrost: Add job slot register defs for affinity
drm/panfrost: Introduce uAPI for JM context creation
drm/panfrost: Introduce JM context for managing job resources
drm/panfrost: Expose JM context IOCTLs to UM
drm/panfrost: Display list of device JM contexts over debugfs
drivers/gpu/drm/panfrost/panfrost_device.h | 4 +-
drivers/gpu/drm/panfrost/panfrost_drv.c | 152 +++++++++++-
drivers/gpu/drm/panfrost/panfrost_job.c | 270 +++++++++++++++++----
drivers/gpu/drm/panfrost/panfrost_job.h | 27 ++-
drivers/gpu/drm/panfrost/panfrost_regs.h | 6 +
include/uapi/drm/panfrost_drm.h | 93 +++++++
6 files changed, 494 insertions(+), 58 deletions(-)
base-commit: 5c76c794bf29399394ebacaa5af8436b8bed0d46
--
2.50.0
* [PATCH 1/5] drm/panfrost: Add job slot register defs for affinity
2025-08-28 2:34 [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
@ 2025-08-28 2:34 ` Adrián Larumbe
2025-08-28 2:34 ` [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation Adrián Larumbe
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Adrián Larumbe @ 2025-08-28 2:34 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Boris Brezillon, kernel, Adrián Larumbe,
Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter
From: Boris Brezillon <boris.brezillon@collabora.com>
This will let us set the affinity (L2 caches and tiler units that can
process a job) in future commits.
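As an illustration, a later patch in this series composes and programs the
extended affinity word roughly like this (a sketch; the masks are made up):

        u32 xaffinity = JS_XAFFINITY_ENABLE |
                        JS_XAFFINITY_TILER_MASK(0x1) |  /* tiler unit 0 */
                        JS_XAFFINITY_L2_MASK(0x3);      /* L2 caches 0-1 */

        job_write(pfdev, JS_XAFFINITY_NEXT(js), xaffinity);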
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
drivers/gpu/drm/panfrost/panfrost_regs.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 2b8f1617b836..a63cd65b344a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -253,6 +253,7 @@
#define JS_AFFINITY_NEXT_LO(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x50)
#define JS_AFFINITY_NEXT_HI(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x54)
#define JS_CONFIG_NEXT(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x58)
+#define JS_XAFFINITY_NEXT(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x5c)
#define JS_COMMAND_NEXT(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x60)
#define JS_FLUSH_ID_NEXT(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x70)
@@ -267,6 +268,11 @@
#define JS_CONFIG_DISABLE_DESCRIPTOR_WR_BK BIT(15)
#define JS_CONFIG_THREAD_PRI(n) ((n) << 16)
+/* Possible values of JS_XAFFINITY and JS_XAFFINITY_NEXT registers */
+#define JS_XAFFINITY_ENABLE BIT(0)
+#define JS_XAFFINITY_TILER_MASK(x) (((u32)(x) & GENMASK(7, 0)) << 8)
+#define JS_XAFFINITY_L2_MASK(x) (((u32)(x) & GENMASK(15, 0)) << 16)
+
#define JS_COMMAND_NOP 0x00
#define JS_COMMAND_START 0x01
#define JS_COMMAND_SOFT_STOP 0x02 /* Gently stop processing a job chain */
--
2.50.0
* [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation
2025-08-28 2:34 [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
2025-08-28 2:34 ` [PATCH 1/5] drm/panfrost: Add job slot register defs for affinity Adrián Larumbe
@ 2025-08-28 2:34 ` Adrián Larumbe
2025-09-01 10:52 ` Steven Price
2025-08-28 2:34 ` [PATCH 3/5] drm/panfrost: Introduce JM context for managing job resources Adrián Larumbe
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Adrián Larumbe @ 2025-08-28 2:34 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Boris Brezillon, kernel, Adrián Larumbe,
Rob Herring, Steven Price, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
From: Boris Brezillon <boris.brezillon@collabora.com>
The new uAPI lets user space query the KM driver for the available
priorities a job can be given at submit time. These are managed through
the notion of a context which, besides a priority, codifies the list
of L2 caches, shader cores and tiler units a job is allowed to use,
for all three of the available device job slots.
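For instance, user space is expected to pick a legal priority along these
lines (an illustrative snippet; error handling omitted):

        struct drm_panfrost_get_param get = {
                .param = DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
        };
        __u8 prio = PANFROST_JM_CTX_PRIORITY_MEDIUM;

        ioctl(fd, DRM_IOCTL_PANFROST_GET_PARAM, &get);
        if (get.value & (1 << PANFROST_JM_CTX_PRIORITY_HIGH))
                prio = PANFROST_JM_CTX_PRIORITY_HIGH;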
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
include/uapi/drm/panfrost_drm.h | 93 +++++++++++++++++++++++++++++++++
1 file changed, 93 insertions(+)
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index ed67510395bd..2d8b32448e68 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -22,6 +22,8 @@ extern "C" {
#define DRM_PANFROST_PERFCNT_DUMP 0x07
#define DRM_PANFROST_MADVISE 0x08
#define DRM_PANFROST_SET_LABEL_BO 0x09
+#define DRM_PANFROST_JM_CTX_CREATE 0x0a
+#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
#define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
#define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -31,6 +33,8 @@ extern "C" {
#define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
#define DRM_IOCTL_PANFROST_MADVISE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
#define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
+#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
+#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
/*
* Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -71,6 +75,12 @@ struct drm_panfrost_submit {
/** A combination of PANFROST_JD_REQ_* */
__u32 requirements;
+
+ /** JM context handle. Zero if you want to use the default context. */
+ __u32 jm_ctx_handle;
+
+ /** Padding field. MBZ. */
+ __u32 pad;
};
/**
@@ -177,6 +187,7 @@ enum drm_panfrost_param {
DRM_PANFROST_PARAM_AFBC_FEATURES,
DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
+ DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
};
struct drm_panfrost_get_param {
@@ -299,6 +310,88 @@ struct panfrost_dump_registers {
__u32 value;
};
+enum drm_panfrost_jm_ctx_priority {
+ /**
+ * @PANFROST_JM_CTX_PRIORITY_LOW: Low priority context.
+ */
+ PANFROST_JM_CTX_PRIORITY_LOW = 0,
+
+ /**
+ * @PANFROST_JM_CTX_PRIORITY_MEDIUM: Medium priority context.
+ */
+ PANFROST_JM_CTX_PRIORITY_MEDIUM,
+
+ /**
+ * @PANFROST_JM_CTX_PRIORITY_HIGH: High priority context.
+ *
+ * Requires CAP_SYS_NICE or DRM_MASTER.
+ */
+ PANFROST_JM_CTX_PRIORITY_HIGH,
+};
+
+#define PANFROST_JS_FLAG_ENABLED (1 << 0)
+
+struct drm_panfrost_js_ctx_info {
+ /** @flags: Combination of PANFROST_JS_FLAG_xxx values */
+ __u32 flags;
+
+ /** @priority: Context priority (see enum drm_panfrost_jm_ctx_priority). */
+ __u8 priority;
+
+ /**
+ * @tiler_mask: Mask encoding tiler units that can be used by the job slot
+ *
+ * When this field is zero, it means the tiler won't be used.
+ *
+ * The bits set here should also be set in the DRM_PANFROST_PARAM_TILER_PRESENT mask.
+ */
+ __u8 tiler_mask;
+
+ /**
+ * @l2_mask: Mask encoding L2 caches that can be used by the job slot
+ *
+ * The bits set here should also be set in the DRM_PANFROST_PARAM_L2_PRESENT mask.
+ */
+ __u16 l2_mask;
+
+ /**
+ * @core_mask: Mask encoding cores that can be used by the job slot
+ *
+ * When this field is zero, it means the queue won't be used.
+ *
+ * The bits set here should also be set in the DRM_PANFROST_PARAM_SHADER_PRESENT mask.
+ */
+ __u64 core_mask;
+};
+
+struct drm_panfrost_jm_ctx_create {
+ /** @handle: Handle of the created JM context */
+ __u32 handle;
+
+ /** @pad: Padding field, MBZ. */
+ __u32 pad;
+
+ /**
+ * @slots: Per-slot context information
+ *
+ * One entry for each of the three device job slots. Slots that are not
+ * meant to be used must leave PANFROST_JS_FLAG_ENABLED cleared.
+ */
+ struct drm_panfrost_js_ctx_info slots[3];
+};
+
+struct drm_panfrost_jm_ctx_destroy {
+ /**
+ * @handle: Handle of the JM context to destroy.
+ *
+ * Must be a valid context handle returned by DRM_IOCTL_PANFROST_JM_CTX_CREATE.
+ */
+ __u32 handle;
+
+ /** @pad: Padding field, MBZ. */
+ __u32 pad;
+};
+
#if defined(__cplusplus)
}
#endif
--
2.50.0
* [PATCH 3/5] drm/panfrost: Introduce JM context for managing job resources
2025-08-28 2:34 [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
2025-08-28 2:34 ` [PATCH 1/5] drm/panfrost: Add job slot register defs for affinity Adrián Larumbe
2025-08-28 2:34 ` [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation Adrián Larumbe
@ 2025-08-28 2:34 ` Adrián Larumbe
2025-08-30 8:12 ` Daniel Stone
2025-08-28 2:34 ` [PATCH 4/5] drm/panfrost: Expose JM context IOCTLs to UM Adrián Larumbe
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Adrián Larumbe @ 2025-08-28 2:34 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Boris Brezillon, kernel, Adrián Larumbe,
Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter
From: Boris Brezillon <boris.brezillon@collabora.com>
A context describes the affinity properties of a job, i.e. the mask of
L2 caches, shader cores and tiler units a job may use, and its priority
within its job slot. Every context must define these properties for each
of the hardware's job slots.
Until context creation and destruction and the attaching of a context
handle to a job submission are exposed to UM, all jobs shall be bound to
the default Panfrost file context, which has medium priority and lets a
job use any of the above resources.
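Abridged from the diff below, the submit path then boils down to:

        /* TODO: handle 0 is the default JM context for now */
        jm_ctx = panfrost_jm_ctx_from_handle(file, 0);
        ...
        job->ctx = panfrost_jm_ctx_get(jm_ctx);
        slot = panfrost_job_get_slot(job);
        ret = drm_sched_job_init(&job->base,
                                 &jm_ctx->slots[slot].sched_entity,
                                 1, NULL, file->client_id);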
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
drivers/gpu/drm/panfrost/panfrost_device.h | 4 +-
drivers/gpu/drm/panfrost/panfrost_drv.c | 19 +-
drivers/gpu/drm/panfrost/panfrost_job.c | 270 +++++++++++++++++----
drivers/gpu/drm/panfrost/panfrost_job.h | 27 ++-
4 files changed, 262 insertions(+), 58 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 077525a3ad68..5b164871eb95 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -15,6 +15,7 @@
#include <drm/gpu_scheduler.h>
#include "panfrost_devfreq.h"
+#include "panfrost_job.h"
struct panfrost_device;
struct panfrost_mmu;
@@ -22,7 +23,6 @@ struct panfrost_job_slot;
struct panfrost_job;
struct panfrost_perfcnt;
-#define NUM_JOB_SLOTS 3
#define MAX_PM_DOMAINS 5
enum panfrost_drv_comp_bits {
@@ -206,7 +206,7 @@ struct panfrost_engine_usage {
struct panfrost_file_priv {
struct panfrost_device *pfdev;
- struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
+ struct xarray jm_ctxs;
struct panfrost_mmu *mmu;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1ea6c509a5d5..398c067457d9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -279,6 +279,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
struct panfrost_file_priv *file_priv = file->driver_priv;
struct drm_panfrost_submit *args = data;
struct drm_syncobj *sync_out = NULL;
+ struct panfrost_jm_ctx *jm_ctx;
struct panfrost_job *job;
int ret = 0, slot;
@@ -294,10 +295,17 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
return -ENODEV;
}
+ /* TODO: Use the default JM context until ctx management IOCTLs are exposed */
+ jm_ctx = panfrost_jm_ctx_from_handle(file, 0);
+ if (!jm_ctx) {
+ ret = -EINVAL;
+ goto out_put_syncout;
+ }
+
job = kzalloc(sizeof(*job), GFP_KERNEL);
if (!job) {
ret = -ENOMEM;
- goto out_put_syncout;
+ goto out_put_jm_ctx;
}
kref_init(&job->refcount);
@@ -307,12 +315,13 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
job->requirements = args->requirements;
job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
job->mmu = file_priv->mmu;
+ job->ctx = panfrost_jm_ctx_get(jm_ctx);
job->engine_usage = &file_priv->engine_usage;
slot = panfrost_job_get_slot(job);
ret = drm_sched_job_init(&job->base,
- &file_priv->sched_entity[slot],
+ &jm_ctx->slots[slot].sched_entity,
1, NULL, file->client_id);
if (ret)
goto out_put_job;
@@ -338,6 +347,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
drm_sched_job_cleanup(&job->base);
out_put_job:
panfrost_job_put(job);
+out_put_jm_ctx:
+ panfrost_jm_ctx_put(jm_ctx);
out_put_syncout:
if (sync_out)
drm_syncobj_put(sync_out);
@@ -564,7 +575,7 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
goto err_free;
}
- ret = panfrost_job_open(panfrost_priv);
+ ret = panfrost_job_open(file);
if (ret)
goto err_job;
@@ -583,7 +594,7 @@ panfrost_postclose(struct drm_device *dev, struct drm_file *file)
struct panfrost_file_priv *panfrost_priv = file->driver_priv;
panfrost_perfcnt_close(file);
- panfrost_job_close(panfrost_priv);
+ panfrost_job_close(file);
panfrost_mmu_ctx_put(panfrost_priv->mmu);
kfree(panfrost_priv);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 82acabb21b27..571bd75d5b40 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -8,6 +8,7 @@
#include <linux/platform_device.h>
#include <linux/pm_runtime.h>
#include <linux/dma-resv.h>
+#include <drm/drm_auth.h>
#include <drm/gpu_scheduler.h>
#include <drm/panfrost_drm.h>
@@ -22,6 +23,7 @@
#include "panfrost_mmu.h"
#include "panfrost_dump.h"
+#define MAX_JM_CTX_PER_FILE 128
#define JOB_TIMEOUT_MS 500
#define job_write(dev, reg, data) writel(data, dev->iomem + (reg))
@@ -125,23 +127,6 @@ int panfrost_job_get_slot(struct panfrost_job *job)
return 1;
}
-static void panfrost_job_write_affinity(struct panfrost_device *pfdev,
- u32 requirements,
- int js)
-{
- u64 affinity;
-
- /*
- * Use all cores for now.
- * Eventually we may need to support tiler only jobs and h/w with
- * multiple (2) coherent core groups
- */
- affinity = pfdev->features.shader_present;
-
- job_write(pfdev, JS_AFFINITY_NEXT_LO(js), lower_32_bits(affinity));
- job_write(pfdev, JS_AFFINITY_NEXT_HI(js), upper_32_bits(affinity));
-}
-
static u32
panfrost_get_job_chain_flag(const struct panfrost_job *job)
{
@@ -198,6 +183,7 @@ panfrost_enqueue_job(struct panfrost_device *pfdev, int slot,
static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
{
struct panfrost_device *pfdev = job->pfdev;
+ struct panfrost_js_ctx *js_ctx = &job->ctx->slots[js];
unsigned int subslot;
u32 cfg;
u64 jc_head = job->jc;
@@ -218,11 +204,15 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
job_write(pfdev, JS_HEAD_NEXT_LO(js), lower_32_bits(jc_head));
job_write(pfdev, JS_HEAD_NEXT_HI(js), upper_32_bits(jc_head));
- panfrost_job_write_affinity(pfdev, job->requirements, js);
+ job_write(pfdev, JS_AFFINITY_NEXT_LO(js), lower_32_bits(js_ctx->affinity));
+ job_write(pfdev, JS_AFFINITY_NEXT_HI(js), upper_32_bits(js_ctx->affinity));
+
+ if (panfrost_has_hw_feature(pfdev, HW_FEATURE_XAFFINITY))
+ job_write(pfdev, JS_XAFFINITY_NEXT(js), js_ctx->xaffinity);
/* start MMU, medium priority, cache clean/flush on end, clean/flush on
* start */
- cfg |= JS_CONFIG_THREAD_PRI(8) |
+ cfg |= js_ctx->config |
JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE |
JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE |
panfrost_get_job_chain_flag(job);
@@ -359,6 +349,7 @@ static void panfrost_job_cleanup(struct kref *ref)
kvfree(job->bos);
}
+ panfrost_jm_ctx_put(job->ctx);
kfree(job);
}
@@ -917,39 +908,229 @@ void panfrost_job_fini(struct panfrost_device *pfdev)
destroy_workqueue(pfdev->reset.wq);
}
-int panfrost_job_open(struct panfrost_file_priv *panfrost_priv)
+int panfrost_job_open(struct drm_file *file)
{
+ struct panfrost_file_priv *panfrost_priv = file->driver_priv;
struct panfrost_device *pfdev = panfrost_priv->pfdev;
+ int ret;
+
+ struct drm_panfrost_jm_ctx_create default_jm_ctx = {
+ /* Fragment queue */
+ .slots[0] = {
+ .flags = PANFROST_JS_FLAG_ENABLED,
+ .priority = PANFROST_JM_CTX_PRIORITY_MEDIUM,
+ .tiler_mask = 0,
+ .l2_mask = pfdev->features.l2_present,
+ .core_mask = pfdev->features.shader_present,
+ },
+ /* Vertex/tiler/compute queue */
+ .slots[1] = {
+ .flags = PANFROST_JS_FLAG_ENABLED,
+ .priority = PANFROST_JM_CTX_PRIORITY_MEDIUM,
+ .tiler_mask = pfdev->features.tiler_present,
+ .l2_mask = pfdev->features.l2_present,
+ .core_mask = pfdev->features.shader_present,
+ },
+ };
+
+ xa_init_flags(&panfrost_priv->jm_ctxs, XA_FLAGS_ALLOC);
+
+ ret = panfrost_jm_ctx_create(file, &default_jm_ctx);
+ if (ret)
+ return ret;
+
+ /* We expect the default context to be assigned handle 0. */
+ if (WARN_ON(default_jm_ctx.handle))
+ return -EINVAL;
+
+ return 0;
+}
+
+void panfrost_job_close(struct drm_file *file)
+{
+ struct panfrost_file_priv *panfrost_priv = file->driver_priv;
+ struct panfrost_jm_ctx *jm_ctx;
+ unsigned long i;
+
+ xa_for_each(&panfrost_priv->jm_ctxs, i, jm_ctx)
+ panfrost_jm_ctx_destroy(file, i);
+
+ xa_destroy(&panfrost_priv->jm_ctxs);
+}
+
+int panfrost_job_is_idle(struct panfrost_device *pfdev)
+{
struct panfrost_job_slot *js = pfdev->js;
- struct drm_gpu_scheduler *sched;
- int ret, i;
+ int i;
for (i = 0; i < NUM_JOB_SLOTS; i++) {
- sched = &js->queue[i].sched;
- ret = drm_sched_entity_init(&panfrost_priv->sched_entity[i],
- DRM_SCHED_PRIORITY_NORMAL, &sched,
- 1, NULL);
- if (WARN_ON(ret))
- return ret;
+ /* If there are any jobs in the HW queue, we're not idle */
+ if (atomic_read(&js->queue[i].sched.credit_count))
+ return false;
+ }
+
+ return true;
+}
+
+static void panfrost_jm_ctx_release(struct kref *kref)
+{
+ struct panfrost_jm_ctx *jm_ctx = container_of(kref, struct panfrost_jm_ctx, refcnt);
+
+ for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++)
+ drm_sched_entity_destroy(&jm_ctx->slots[i].sched_entity);
+
+ kfree(jm_ctx);
+}
+
+void
+panfrost_jm_ctx_put(struct panfrost_jm_ctx *jm_ctx)
+{
+ if (jm_ctx)
+ kref_put(&jm_ctx->refcnt, panfrost_jm_ctx_release);
+}
+
+struct panfrost_jm_ctx *
+panfrost_jm_ctx_get(struct panfrost_jm_ctx *jm_ctx)
+{
+ if (jm_ctx)
+ kref_get(&jm_ctx->refcnt);
+
+ return jm_ctx;
+}
+
+struct panfrost_jm_ctx *
+panfrost_jm_ctx_from_handle(struct drm_file *file, u32 handle)
+{
+ struct panfrost_file_priv *priv = file->driver_priv;
+ struct panfrost_jm_ctx *jm_ctx;
+
+ xa_lock(&priv->jm_ctxs);
+ jm_ctx = panfrost_jm_ctx_get(xa_load(&priv->jm_ctxs, handle));
+ xa_unlock(&priv->jm_ctxs);
+
+ return jm_ctx;
+}
+
+static int jm_ctx_prio_to_drm_sched_prio(struct drm_file *file,
+ enum drm_panfrost_jm_ctx_priority in,
+ enum drm_sched_priority *out)
+{
+ switch (in) {
+ case PANFROST_JM_CTX_PRIORITY_LOW:
+ *out = DRM_SCHED_PRIORITY_LOW;
+ return 0;
+ case PANFROST_JM_CTX_PRIORITY_MEDIUM:
+ *out = DRM_SCHED_PRIORITY_NORMAL;
+ return 0;
+ case PANFROST_JM_CTX_PRIORITY_HIGH:
+ /* Higher priorities require CAP_SYS_NICE or DRM_MASTER */
+ if (!capable(CAP_SYS_NICE) && !drm_is_current_master(file))
+ return -EACCES;
+
+ *out = DRM_SCHED_PRIORITY_HIGH;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+#define PANFROST_JS_VALID_FLAGS PANFROST_JS_FLAG_ENABLED
+
+int panfrost_jm_ctx_create(struct drm_file *file,
+ struct drm_panfrost_jm_ctx_create *args)
+{
+ struct panfrost_file_priv *priv = file->driver_priv;
+ struct panfrost_device *pfdev = priv->pfdev;
+ struct panfrost_jm_ctx *jm_ctx;
+ int ret;
+
+ if (args->pad != 0)
+ return -EINVAL;
+
+ jm_ctx = kzalloc(sizeof(*jm_ctx), GFP_KERNEL);
+ if (!jm_ctx)
+ return -ENOMEM;
+
+ kref_init(&jm_ctx->refcnt);
+
+ for (u32 i = 0; i < ARRAY_SIZE(args->slots); i++) {
+ struct drm_gpu_scheduler *sched = &pfdev->js->queue[i].sched;
+ const struct drm_panfrost_js_ctx_info *js_info = &args->slots[i];
+ struct panfrost_js_ctx *js_ctx = &jm_ctx->slots[i];
+ enum drm_sched_priority sched_prio;
+
+ if (js_info->flags & ~PANFROST_JS_VALID_FLAGS) {
+ ret = -EINVAL;
+ goto err_put_jm_ctx;
+ }
+
+ if (!(js_info->flags & PANFROST_JS_FLAG_ENABLED))
+ continue;
+
+ ret = jm_ctx_prio_to_drm_sched_prio(file, js_info->priority, &sched_prio);
+ if (ret)
+ goto err_put_jm_ctx;
+
+ /* All masks must be subsets of the present HW resources. */
+ if ((js_info->core_mask & ~pfdev->features.shader_present) ||
+ (js_info->tiler_mask & ~pfdev->features.tiler_present) ||
+ (js_info->l2_mask & ~pfdev->features.l2_present)) {
+ ret = -EINVAL;
+ goto err_put_jm_ctx;
+ }
+
+ js_ctx->affinity = js_info->core_mask;
+ js_ctx->config = JS_CONFIG_THREAD_PRI(js_info->priority);
+
+ if (panfrost_has_hw_feature(pfdev, HW_FEATURE_XAFFINITY)) {
+ js_ctx->xaffinity = JS_XAFFINITY_ENABLE |
+ JS_XAFFINITY_TILER_MASK(js_info->tiler_mask) |
+ JS_XAFFINITY_L2_MASK(js_info->l2_mask);
+ }
+
+ ret = drm_sched_entity_init(&js_ctx->sched_entity, sched_prio,
+ &sched, 1, NULL);
+ if (ret)
+ goto err_put_jm_ctx;
+
+ js_ctx->enabled = true;
}
+
+ ret = xa_alloc(&priv->jm_ctxs, &args->handle, jm_ctx,
+ XA_LIMIT(0, MAX_JM_CTX_PER_FILE), GFP_KERNEL);
+ if (ret)
+ goto err_put_jm_ctx;
+
return 0;
+
+err_put_jm_ctx:
+ panfrost_jm_ctx_put(jm_ctx);
+ return ret;
}
-void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
+int panfrost_jm_ctx_destroy(struct drm_file *file, u32 handle)
{
- struct panfrost_device *pfdev = panfrost_priv->pfdev;
- int i;
+ struct panfrost_file_priv *priv = file->driver_priv;
+ struct panfrost_device *pfdev = priv->pfdev;
+ struct panfrost_jm_ctx *jm_ctx;
- for (i = 0; i < NUM_JOB_SLOTS; i++)
- drm_sched_entity_destroy(&panfrost_priv->sched_entity[i]);
+ jm_ctx = xa_erase(&priv->jm_ctxs, handle);
+ if (!jm_ctx)
+ return -EINVAL;
+
+ for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
+ if (jm_ctx->slots[i].enabled)
+ drm_sched_entity_destroy(&jm_ctx->slots[i].sched_entity);
+ }
/* Kill in-flight jobs */
spin_lock(&pfdev->js->job_lock);
- for (i = 0; i < NUM_JOB_SLOTS; i++) {
- struct drm_sched_entity *entity = &panfrost_priv->sched_entity[i];
- int j;
+ for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
+ struct drm_sched_entity *entity = &jm_ctx->slots[i].sched_entity;
+
+ if (!jm_ctx->slots[i].enabled)
+ continue;
- for (j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
+ for (int j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
struct panfrost_job *job = pfdev->jobs[i][j];
u32 cmd;
@@ -980,18 +1161,7 @@ void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
}
}
spin_unlock(&pfdev->js->job_lock);
-}
-
-int panfrost_job_is_idle(struct panfrost_device *pfdev)
-{
- struct panfrost_job_slot *js = pfdev->js;
- int i;
-
- for (i = 0; i < NUM_JOB_SLOTS; i++) {
- /* If there are any jobs in the HW queue, we're not idle */
- if (atomic_read(&js->queue[i].sched.credit_count))
- return false;
- }
- return true;
+ panfrost_jm_ctx_put(jm_ctx);
+ return 0;
}
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index ec581b97852b..caf394e070f4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -18,6 +18,7 @@ struct panfrost_job {
struct panfrost_device *pfdev;
struct panfrost_mmu *mmu;
+ struct panfrost_jm_ctx *ctx;
/* Fence to be signaled by IRQ handler when the job is complete. */
struct dma_fence *done_fence;
@@ -39,10 +40,32 @@ struct panfrost_job {
u64 start_cycles;
};
+struct panfrost_js_ctx {
+ struct drm_sched_entity sched_entity;
+ u32 config;
+ u32 xaffinity;
+ u64 affinity;
+ bool enabled;
+};
+
+#define NUM_JOB_SLOTS 3
+
+struct panfrost_jm_ctx {
+ struct kref refcnt;
+ struct panfrost_js_ctx slots[NUM_JOB_SLOTS];
+};
+
+int panfrost_jm_ctx_create(struct drm_file *file,
+ struct drm_panfrost_jm_ctx_create *args);
+int panfrost_jm_ctx_destroy(struct drm_file *file, u32 handle);
+void panfrost_jm_ctx_put(struct panfrost_jm_ctx *jm_ctx);
+struct panfrost_jm_ctx *panfrost_jm_ctx_get(struct panfrost_jm_ctx *jm_ctx);
+struct panfrost_jm_ctx *panfrost_jm_ctx_from_handle(struct drm_file *file, u32 handle);
+
int panfrost_job_init(struct panfrost_device *pfdev);
void panfrost_job_fini(struct panfrost_device *pfdev);
-int panfrost_job_open(struct panfrost_file_priv *panfrost_priv);
-void panfrost_job_close(struct panfrost_file_priv *panfrost_priv);
+int panfrost_job_open(struct drm_file *file);
+void panfrost_job_close(struct drm_file *file);
int panfrost_job_get_slot(struct panfrost_job *job);
int panfrost_job_push(struct panfrost_job *job);
void panfrost_job_put(struct panfrost_job *job);
--
2.50.0
* [PATCH 4/5] drm/panfrost: Expose JM context IOCTLs to UM
2025-08-28 2:34 [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
` (2 preceding siblings ...)
2025-08-28 2:34 ` [PATCH 3/5] drm/panfrost: Introduce JM context for managing job resources Adrián Larumbe
@ 2025-08-28 2:34 ` Adrián Larumbe
2025-08-28 2:34 ` [PATCH 5/5] drm/panfrost: Display list of device JM contexts over debugfs Adrián Larumbe
2025-08-28 23:19 ` [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
5 siblings, 0 replies; 14+ messages in thread
From: Adrián Larumbe @ 2025-08-28 2:34 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Boris Brezillon, kernel, Adrián Larumbe,
Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter
From: Boris Brezillon <boris.brezillon@collabora.com>
The driver's minor version must be bumped because this expands the
uAPI. On top of that, let users know the available priorities so that
they can create contexts with legal priority values.
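A UM driver can gate its use of the new ioctls on the bumped version, e.g.
with libdrm (a sketch, using xf86drm.h):

        drmVersionPtr v = drmGetVersion(fd);
        bool has_jm_ctx = v && (v->version_major > 1 ||
                                (v->version_major == 1 && v->version_minor >= 5));
        drmFreeVersion(v);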
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 32 +++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 398c067457d9..b54cdd589ec4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -13,6 +13,7 @@
#include <linux/platform_device.h>
#include <linux/pm_runtime.h>
#include <drm/panfrost_drm.h>
+#include <drm/drm_auth.h>
#include <drm/drm_debugfs.h>
#include <drm/drm_drv.h>
#include <drm/drm_ioctl.h>
@@ -109,6 +110,15 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
#endif
break;
+ case DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES:
+ param->value = BIT(PANFROST_JM_CTX_PRIORITY_LOW) |
+ BIT(PANFROST_JM_CTX_PRIORITY_MEDIUM);
+
+ /* High prio requires CAP_SYS_NICE or DRM_MASTER */
+ if (capable(CAP_SYS_NICE) || drm_is_current_master(file))
+ param->value |= BIT(PANFROST_JM_CTX_PRIORITY_HIGH);
+ break;
+
default:
return -EINVAL;
}
@@ -547,6 +557,24 @@ static int panfrost_ioctl_set_label_bo(struct drm_device *ddev, void *data,
return ret;
}
+static int panfrost_ioctl_jm_ctx_create(struct drm_device *dev, void *data,
+ struct drm_file *file)
+{
+ return panfrost_jm_ctx_create(file, data);
+}
+
+static int panfrost_ioctl_jm_ctx_destroy(struct drm_device *dev, void *data,
+ struct drm_file *file)
+{
+ const struct drm_panfrost_jm_ctx_destroy *args = data;
+
+ /* We can't destroy the default context created when the file is opened. */
+ if (!args->handle)
+ return -EINVAL;
+
+ return panfrost_jm_ctx_destroy(file, args->handle);
+}
+
int panfrost_unstable_ioctl_check(void)
{
if (!unstable_ioctls)
@@ -614,6 +642,8 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(PERFCNT_DUMP, perfcnt_dump, DRM_RENDER_ALLOW),
PANFROST_IOCTL(MADVISE, madvise, DRM_RENDER_ALLOW),
PANFROST_IOCTL(SET_LABEL_BO, set_label_bo, DRM_RENDER_ALLOW),
+ PANFROST_IOCTL(JM_CTX_CREATE, jm_ctx_create, DRM_RENDER_ALLOW),
+ PANFROST_IOCTL(JM_CTX_DESTROY, jm_ctx_destroy, DRM_RENDER_ALLOW),
};
static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
@@ -710,6 +740,8 @@ static void panfrost_debugfs_init(struct drm_minor *minor)
* - 1.3 - adds JD_REQ_CYCLE_COUNT job requirement for SUBMIT
* - adds SYSTEM_TIMESTAMP and SYSTEM_TIMESTAMP_FREQUENCY queries
* - 1.4 - adds SET_LABEL_BO
+ * - 1.5 - adds JM_CTX_{CREATE,DESTROY} ioctls, which allow creating contexts
+ * with configurable priorities/affinity, and extends SUBMIT to take a
+ * JM context handle
*/
static const struct drm_driver panfrost_drm_driver = {
.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
--
2.50.0
* [PATCH 5/5] drm/panfrost: Display list of device JM contexts over debugfs
2025-08-28 2:34 [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
` (3 preceding siblings ...)
2025-08-28 2:34 ` [PATCH 4/5] drm/panfrost: Expose JM context IOCTLs to UM Adrián Larumbe
@ 2025-08-28 2:34 ` Adrián Larumbe
2025-08-28 23:19 ` [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
5 siblings, 0 replies; 14+ messages in thread
From: Adrián Larumbe @ 2025-08-28 2:34 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Boris Brezillon, kernel, Adrián Larumbe,
Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter
From: Boris Brezillon <boris.brezillon@collabora.com>
For DebugFS builds, create a filesystem knob that, for every open file
of the Panfrost DRM device, shows its command name and PID (when
applicable), together with all of its existing JM contexts.
For every context, also show the register values that a UM tool could
decode back into the masks of L2 caches, tiler units and shader cores
that jobs submitted under that context may use.
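The output then looks along these lines (values purely illustrative, taken
from the default context of a hypothetical 4-core/1-tiler/1-L2 GPU):

        client_id        4 pid      612 command glmark2-es2:
         JM context 0:
          slot 0: priority NORMAL config 10000 affinity f xaffinity 10001
          slot 1: priority NORMAL config 10000 affinity f xaffinity 10101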
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 101 ++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b54cdd589ec4..3ba43180ca8d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -713,6 +713,52 @@ static int panthor_gems_show(struct seq_file *m, void *data)
return 0;
}
+static void show_panfrost_jm_ctx(struct panfrost_jm_ctx *jm_ctx, u32 handle,
+ struct seq_file *m)
+{
+ static const char * const prios[] = {
+ [DRM_SCHED_PRIORITY_HIGH] = "HIGH",
+ [DRM_SCHED_PRIORITY_NORMAL] = "NORMAL",
+ [DRM_SCHED_PRIORITY_LOW] = "LOW",
+ };
+
+ seq_printf(m, " JM context %u:\n", handle);
+
+ for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
+ const struct panfrost_js_ctx *slot = &jm_ctx->slots[i];
+ const char *prio = NULL;
+
+ if (!slot->enabled)
+ continue;
+
+ if (slot->sched_entity.priority < ARRAY_SIZE(prios))
+ prio = prios[slot->sched_entity.priority];
+
+ seq_printf(m, " slot %u: priority %s config %x affinity %llx xaffinity %x\n",
+ i, prio ? prio : "UNKNOWN", slot->config,
+ slot->affinity, slot->xaffinity);
+ }
+}
+
+static int show_file_jm_ctxs(struct panfrost_file_priv *pfile,
+ struct seq_file *m)
+{
+ struct panfrost_jm_ctx *jm_ctx;
+ unsigned long i;
+
+ xa_lock(&pfile->jm_ctxs);
+ xa_for_each(&pfile->jm_ctxs, i, jm_ctx) {
+ jm_ctx = panfrost_jm_ctx_get(jm_ctx);
+ xa_unlock(&pfile->jm_ctxs);
+ show_panfrost_jm_ctx(jm_ctx, i, m);
+ panfrost_jm_ctx_put(jm_ctx);
+ xa_lock(&pfile->jm_ctxs);
+ }
+ xa_unlock(&pfile->jm_ctxs);
+
+ return 0;
+}
+
static struct drm_info_list panthor_debugfs_list[] = {
{"gems", panthor_gems_show, 0, NULL},
};
@@ -726,9 +772,64 @@ static int panthor_gems_debugfs_init(struct drm_minor *minor)
return 0;
}
+static int show_each_file(struct seq_file *m, void *arg)
+{
+ struct drm_info_node *node = (struct drm_info_node *)m->private;
+ struct drm_device *ddev = node->minor->dev;
+ int (*show)(struct panfrost_file_priv *, struct seq_file *) =
+ node->info_ent->data;
+ struct drm_file *file;
+ int ret;
+
+ ret = mutex_lock_interruptible(&ddev->filelist_mutex);
+ if (ret)
+ return ret;
+
+ list_for_each_entry(file, &ddev->filelist, lhead) {
+ struct task_struct *task;
+ struct panfrost_file_priv *pfile = file->driver_priv;
+ struct pid *pid;
+
+ /*
+ * Although we have a valid reference on file->pid, that does
+ * not guarantee that the task_struct who called get_pid() is
+ * still alive (e.g. get_pid(current) => fork() => exit()).
+ * Therefore, we need to protect this ->comm access using RCU.
+ */
+ rcu_read_lock();
+ pid = rcu_dereference(file->pid);
+ task = pid_task(pid, PIDTYPE_TGID);
+ seq_printf(m, "client_id %8llu pid %8d command %s:\n",
+ file->client_id, pid_nr(pid),
+ task ? task->comm : "<unknown>");
+ rcu_read_unlock();
+
+ ret = show(pfile, m);
+ if (ret < 0)
+ break;
+
+ seq_puts(m, "\n");
+ }
+
+ mutex_unlock(&ddev->filelist_mutex);
+ return ret;
+}
+
+static struct drm_info_list panfrost_sched_debugfs_list[] = {
+ { "sched_ctxs", show_each_file, 0, show_file_jm_ctxs },
+};
+
+static void panfrost_sched_debugfs_init(struct drm_minor *minor)
+{
+ drm_debugfs_create_files(panfrost_sched_debugfs_list,
+ ARRAY_SIZE(panfrost_sched_debugfs_list),
+ minor->debugfs_root, minor);
+}
+
static void panfrost_debugfs_init(struct drm_minor *minor)
{
panthor_gems_debugfs_init(minor);
+ panfrost_sched_debugfs_init(minor);
}
#endif
--
2.50.0
* Re: [PATCH 0/5] Introduce Panfrost JM contexts
2025-08-28 2:34 [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
` (4 preceding siblings ...)
2025-08-28 2:34 ` [PATCH 5/5] drm/panfrost: Display list of device JM contexts over debugfs Adrián Larumbe
@ 2025-08-28 23:19 ` Adrián Larumbe
5 siblings, 0 replies; 14+ messages in thread
From: Adrián Larumbe @ 2025-08-28 23:19 UTC (permalink / raw)
To: linux-kernel; +Cc: dri-devel, Boris Brezillon, kernel
Mesa MR with the UM driver changes is available at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37075
On 28.08.2025 03:34, Adrián Larumbe wrote:
> This patch series brings the notion of JM contexts into Panfrost.
> UM will be able to create contexts, obtain a handle for each of them
> and attach it to a job submission. Contexts describe the basic HW
> resources assigned to jobs, as well as their job slot priorities.
>
> A Mesa MR with UM changes that leverage this new kernel driver feature
> is still in the making.
>
> Boris Brezillon (5):
> drm/panfrost: Add job slot register defs for affinity
> drm/panfrost: Introduce uAPI for JM context creation
> drm/panfrost: Introduce JM context for manging job resources
> drm/panfrost: Expose JM context IOCTLs to UM
> drm/panfrost: Display list of device JM contexts over debugfs
>
> drivers/gpu/drm/panfrost/panfrost_device.h | 4 +-
> drivers/gpu/drm/panfrost/panfrost_drv.c | 152 +++++++++++-
> drivers/gpu/drm/panfrost/panfrost_job.c | 270 +++++++++++++++++----
> drivers/gpu/drm/panfrost/panfrost_job.h | 27 ++-
> drivers/gpu/drm/panfrost/panfrost_regs.h | 6 +
> include/uapi/drm/panfrost_drm.h | 93 +++++++
> 6 files changed, 494 insertions(+), 58 deletions(-)
>
>
> base-commit: 5c76c794bf29399394ebacaa5af8436b8bed0d46
> --
> 2.50.0
Adrian Larumbe
* Re: [PATCH 3/5] drm/panfrost: Introduce JM context for managing job resources
2025-08-28 2:34 ` [PATCH 3/5] drm/panfrost: Introduce JM context for manging job resources Adrián Larumbe
@ 2025-08-30 8:12 ` Daniel Stone
2025-09-01 7:54 ` Boris Brezillon
0 siblings, 1 reply; 14+ messages in thread
From: Daniel Stone @ 2025-08-30 8:12 UTC (permalink / raw)
To: Adrián Larumbe
Cc: linux-kernel, dri-devel, Boris Brezillon, kernel, Rob Herring,
Steven Price, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter
Hi Adrian,
On Thu, 28 Aug 2025 at 04:35, Adrián Larumbe
<adrian.larumbe@collabora.com> wrote:
> -void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> +int panfrost_jm_ctx_destroy(struct drm_file *file, u32 handle)
> {
> - struct panfrost_device *pfdev = panfrost_priv->pfdev;
> - int i;
> + struct panfrost_file_priv *priv = file->driver_priv;
> + struct panfrost_device *pfdev = priv->pfdev;
> + struct panfrost_jm_ctx *jm_ctx;
>
> - for (i = 0; i < NUM_JOB_SLOTS; i++)
> - drm_sched_entity_destroy(&panfrost_priv->sched_entity[i]);
> + jm_ctx = xa_erase(&priv->jm_ctxs, handle);
> + if (!jm_ctx)
> + return -EINVAL;
> +
> + for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> + if (jm_ctx->slots[i].enabled)
> + drm_sched_entity_destroy(&jm_ctx->slots[i].sched_entity);
> + }
>
> /* Kill in-flight jobs */
> spin_lock(&pfdev->js->job_lock);
> - for (i = 0; i < NUM_JOB_SLOTS; i++) {
> - struct drm_sched_entity *entity = &panfrost_priv->sched_entity[i];
> - int j;
> + for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> + struct drm_sched_entity *entity = &jm_ctx->slots[i].sched_entity;
> +
> + if (!jm_ctx->slots[i].enabled)
> + continue;
>
> - for (j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> + for (int j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> struct panfrost_job *job = pfdev->jobs[i][j];
> u32 cmd;
>
> @@ -980,18 +1161,7 @@ void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> }
> }
> spin_unlock(&pfdev->js->job_lock);
> -}
> -
> -int panfrost_job_is_idle(struct panfrost_device *pfdev)
> -{
> - struct panfrost_job_slot *js = pfdev->js;
> - int i;
> -
> - for (i = 0; i < NUM_JOB_SLOTS; i++) {
> - /* If there are any jobs in the HW queue, we're not idle */
> - if (atomic_read(&js->queue[i].sched.credit_count))
> - return false;
> - }
>
> - return true;
> + panfrost_jm_ctx_put(jm_ctx);
> + return 0;
> }
It seems odd that both panfrost_jm_ctx_destroy() and
panfrost_jm_ctx_release() share lifetime responsibilities. I'd expect
calling panfrost_jm_ctx_destroy() to just release the xarray handle
and drop the refcount.
I can see why calling panfrost_jm_ctx_destroy() is the one to go try
to cancel the jobs - because the jobs keep a refcount on the context,
so we need to break that cycle somehow. But having both the
handle-release and object-release function drop a ref on the sched
entity seems odd?
It doesn't help much that panfrost_job is used both for actual jobs
(as the type) and for the capability of a device to have multiple
job-manager contexts (as a function prefix). Would be great to clean
that up, so you don't have to think about whether e.g.
panfrost_job_close() is actually operating on a panfrost_job, or
operating on multiple panfrost_jm_ctx which operate on multiple
panfrost_job.
Cheers,
Daniel
* Re: [PATCH 3/5] drm/panfrost: Introduce JM context for managing job resources
2025-08-30 8:12 ` Daniel Stone
@ 2025-09-01 7:54 ` Boris Brezillon
0 siblings, 0 replies; 14+ messages in thread
From: Boris Brezillon @ 2025-09-01 7:54 UTC (permalink / raw)
To: Daniel Stone
Cc: Adrián Larumbe, linux-kernel, dri-devel, kernel, Rob Herring,
Steven Price, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter
On Sat, 30 Aug 2025 10:12:32 +0200
Daniel Stone <daniel@fooishbar.org> wrote:
> Hi Adrian,
>
> On Thu, 28 Aug 2025 at 04:35, Adrián Larumbe
> <adrian.larumbe@collabora.com> wrote:
> > -void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> > +int panfrost_jm_ctx_destroy(struct drm_file *file, u32 handle)
> > {
> > - struct panfrost_device *pfdev = panfrost_priv->pfdev;
> > - int i;
> > + struct panfrost_file_priv *priv = file->driver_priv;
> > + struct panfrost_device *pfdev = priv->pfdev;
> > + struct panfrost_jm_ctx *jm_ctx;
> >
> > - for (i = 0; i < NUM_JOB_SLOTS; i++)
> > - drm_sched_entity_destroy(&panfrost_priv->sched_entity[i]);
> > + jm_ctx = xa_erase(&priv->jm_ctxs, handle);
> > + if (!jm_ctx)
> > + return -EINVAL;
> > +
> > + for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > + if (jm_ctx->slots[i].enabled)
> > + drm_sched_entity_destroy(&jm_ctx->slots[i].sched_entity);
> > + }
> >
> > /* Kill in-flight jobs */
> > spin_lock(&pfdev->js->job_lock);
> > - for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > - struct drm_sched_entity *entity = &panfrost_priv->sched_entity[i];
> > - int j;
> > + for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > + struct drm_sched_entity *entity = &jm_ctx->slots[i].sched_entity;
> > +
> > + if (!jm_ctx->slots[i].enabled)
> > + continue;
> >
> > - for (j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> > + for (int j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> > struct panfrost_job *job = pfdev->jobs[i][j];
> > u32 cmd;
> >
> > @@ -980,18 +1161,7 @@ void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> > }
> > }
> > spin_unlock(&pfdev->js->job_lock);
> > -}
> > -
> > -int panfrost_job_is_idle(struct panfrost_device *pfdev)
> > -{
> > - struct panfrost_job_slot *js = pfdev->js;
> > - int i;
> > -
> > - for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > - /* If there are any jobs in the HW queue, we're not idle */
> > - if (atomic_read(&js->queue[i].sched.credit_count))
> > - return false;
> > - }
> >
> > - return true;
> > + panfrost_jm_ctx_put(jm_ctx);
> > + return 0;
> > }
>
> It seems odd that both panfrost_jm_ctx_destroy() and
> panfrost_jm_ctx_release() share lifetime responsibilities. I'd expect
> calling panfrost_jm_ctx_destroy() to just release the xarray handle
> and drop the refcount.
I guess you refer to the drm_sched_entity_destroy() calls. If so, I
agree that they should be removed from panfrost_jm_ctx_release() because
panfrost_jm_ctx_destroy() should always be called for the JM ctx
refcount to drop to zero.
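i.e. something like this (a sketch of the suggested cleanup):

        static void panfrost_jm_ctx_release(struct kref *kref)
        {
                struct panfrost_jm_ctx *jm_ctx =
                        container_of(kref, struct panfrost_jm_ctx, refcnt);

                /* The entities were already torn down by
                 * panfrost_jm_ctx_destroy(), which always runs before the
                 * last ref is dropped. */
                kfree(jm_ctx);
        }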
>
> I can see why calling panfrost_jm_ctx_destroy() is the one to go try
> to cancel the jobs - because the jobs keep a refcount on the context,
> so we need to break that cycle somehow. But having both the
> handle-release and object-release function drop a ref on the sched
> entity seems odd?
Note that drm_sched_entity_destroy() doesn't really drop a ref, it just
flushes/cancels the jobs, and makes sure the entity is no longer
considered by the scheduler. After the first drm_sched_entity_destroy()
happens (in jm_ctx_destroy()), I'd expect entity->rq to be NULL, making
the subsequent call to drm_sched_entity_destroy() (in jm_ctx_release())
a NOP (both drm_sched_entity_{flush,fini}() bail out early if
entity->rq is NULL). Now, there might be other things in
drm_sched_entity that are not safe to cleanup twice, and I agree that
drm_sched_entity_destroy() shouldn't be called in both places anyway.
>
> It doesn't help much that panfrost_job is used both for actual jobs
> (as the type) and the capability for a device to have multiple
> job-manager contexts (as a function prefix). Would be great to clean
> that up, so you don't have to think about whether e.g.
> panfrost_job_close() is actually operating on a panfrost_job, or
> operating on multiple panfrost_jm_ctx which operate on multiple
> panfrost_job.
Yep, we should definitely change the prefix to panfrost_jm_ when the
function manipulates the JM scheduler context.
* Re: [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation
2025-08-28 2:34 ` [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation Adrián Larumbe
@ 2025-09-01 10:52 ` Steven Price
2025-09-01 12:08 ` Adrián Larumbe
2025-09-01 12:14 ` Boris Brezillon
0 siblings, 2 replies; 14+ messages in thread
From: Steven Price @ 2025-09-01 10:52 UTC (permalink / raw)
To: Adrián Larumbe, linux-kernel
Cc: dri-devel, Boris Brezillon, kernel, Rob Herring, David Airlie,
Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann
On 28/08/2025 03:34, Adrián Larumbe wrote:
> From: Boris Brezillon <boris.brezillon@collabora.com>
>
> The new uAPI lets user space query the KM driver for the available
> priorities a job can be given at submit time. These are managed through
> the notion of a context which, besides a priority, codifies the list
> of L2 caches, shader cores and tiler units a job is allowed to use,
> for all three of the available device job slots.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
There's no cover letter for this series, so maybe I'm missing some
context. But I'm not sure why we want to expose the tiler/l2/core masks
to user space.
If you were trying to better support OpenCL on T628 I can just about
understand the core mask. But, I doubt you are... (does anyone care
about that anymore? ;) ). And really it's the core groups that matter
rather than the raw affinities.
The tiler/l2 affinities (and the XAFFINITY register in general) is there
as a power saving mechanism. If we know that a job is not going to use
the shader cores at all (a tiler-only job) then we can avoid turning
them on, but obviously we still need the L2 and tiler blocks to be powered.
kbase handled this with a "core_req" field which listed the required
cores for each job. We already have a "requirements" field which we
could extend for the same purpose (PANFROST_JD_REQ_TILER_ONLY or
similar). I don't think this makes sense to include in a "context".
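Purely as a sketch of that alternative (the flag name and bit are
hypothetical, not part of this series):

        #define PANFROST_JD_REQ_TILER_ONLY (1 << 2)

        /* in panfrost_job_hw_submit(), keep the shader cores off for
         * tiler-only jobs */
        if (job->requirements & PANFROST_JD_REQ_TILER_ONLY)
                affinity = 0;
        else
                affinity = pfdev->features.shader_present;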
But like I said, maybe I'm missing something - what is the use case for
controlling affinity?
[The priority parts look ok here, but that's mixed in with the affinity
changes.]
> ---
> include/uapi/drm/panfrost_drm.h | 93 +++++++++++++++++++++++++++++++++
> 1 file changed, 93 insertions(+)
>
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index ed67510395bd..2d8b32448e68 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -22,6 +22,8 @@ extern "C" {
> #define DRM_PANFROST_PERFCNT_DUMP 0x07
> #define DRM_PANFROST_MADVISE 0x08
> #define DRM_PANFROST_SET_LABEL_BO 0x09
> +#define DRM_PANFROST_JM_CTX_CREATE 0x0a
> +#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
>
> #define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
> #define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> @@ -31,6 +33,8 @@ extern "C" {
> #define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
> #define DRM_IOCTL_PANFROST_MADVISE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
> #define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
> +#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
> +#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
>
> /*
> * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
> @@ -71,6 +75,12 @@ struct drm_panfrost_submit {
>
> /** A combination of PANFROST_JD_REQ_* */
> __u32 requirements;
> +
> + /** JM context handle. Zero if you want to use the default context. */
> + __u32 jm_ctx_handle;
> +
> + /** Padding field. MBZ. */
> + __u32 pad;
> };
>
> /**
> @@ -177,6 +187,7 @@ enum drm_panfrost_param {
> DRM_PANFROST_PARAM_AFBC_FEATURES,
> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
> + DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
> };
>
> struct drm_panfrost_get_param {
> @@ -299,6 +310,88 @@ struct panfrost_dump_registers {
> __u32 value;
> };
>
> +enum drm_panfrost_jm_ctx_priority {
> + /**
> + * @PANFROST_JM_CTX_PRIORITY_LOW: Low priority context.
> + */
> + PANFROST_JM_CTX_PRIORITY_LOW = 0,
> +
> + /**
> + * @PANFROST_JM_CTX_PRIORITY_MEDIUM: Medium priority context.
> + */
> + PANFROST_JM_CTX_PRIORITY_MEDIUM,
> +
> + /**
> + * @PANFROST_JM_CTX_PRIORITY_HIGH: High priority context.
> + *
> + * Requires CAP_SYS_NICE or DRM_MASTER.
> + */
> + PANFROST_JM_CTX_PRIORITY_HIGH,
> +};
> +
> +#define PANFROST_JS_FLAG_ENABLED (1 << 0)
> +
> +struct drm_panfrost_js_ctx_info {
> + /** @flags: Combination of PANFROST_JS_FLAG_xxx values */
> + __u32 flags;
> +
> + /** @priority: Context priority (see enum drm_panfrost_jm_ctx_priority). */
> + __u8 priority;
> +
> + /**
> + * @tiler_mask: Mask encoding tiler units that can be used by the job slot
> + *
> + * When this field is zero, it means the tiler won't be used.
> + *
> + * The bits set here should also be set in the DRM_PANFROST_PARAM_TILER_PRESENT mask.
> + */
> + __u8 tiler_mask;
> +
> + /**
> + * @l2_mask: Mask encoding L2 caches that can be used by the job slot
> + *
> + * The bits set here should also be set in the DRM_PANFROST_PARAM_L2_PRESENT mask.
> + */
> + __u16 l2_mask;
> +
> + /**
> + * @core_mask: Mask encoding cores that can be used by the job slot
> + *
> + * When this field is zero, it means the queue won't be used.
> + *
> + * The bits set here should also be set in the DRM_PANFROST_PARAM_SHADER_PRESENT mask.
> + */
> + __u64 core_mask;
> +};
> +
> +struct drm_panfrost_jm_ctx_create {
> + /** @handle: Handle of the created JM context */
> + __u32 handle;
> +
> + /** @pad: Padding field, MBZ. */
> + __u32 pad;
> +
> + /**
> + * @slots: Per-slot context information
> + *
> + * One entry for each of the three device job slots. Slots that are not
> + * meant to be used must leave PANFROST_JS_FLAG_ENABLED cleared.
> + */
> + struct drm_panfrost_js_ctx_info slots[3];
We don't allow user space to choose which slot is being targeted, so
this feels odd. I guess this allows deliberately disabling slot 1 to
force slot 2. But the code in this series doesn't seem to implement
this. I'm also not sure I understand why you would want a different
priority for different slots?
Thanks,
Steve
> +};
> +
> +struct drm_panfrost_jm_ctx_destroy {
> + /**
> + * @handle: Handle of the JM context to destroy.
> + *
> + * Must be a valid context handle returned by DRM_IOCTL_PANFROST_JM_CTX_CREATE.
> + */
> + __u32 handle;
> +
> + /** @pad: Padding field, MBZ. */
> + __u32 pad;
> +};
> +
> #if defined(__cplusplus)
> }
> #endif
* Re: [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation
2025-09-01 10:52 ` Steven Price
@ 2025-09-01 12:08 ` Adrián Larumbe
2025-09-01 13:45 ` Steven Price
2025-09-01 12:14 ` Boris Brezillon
1 sibling, 1 reply; 14+ messages in thread
From: Adrián Larumbe @ 2025-09-01 12:08 UTC (permalink / raw)
To: Steven Price
Cc: linux-kernel, dri-devel, Boris Brezillon, kernel, Rob Herring,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann
Hi Steven,
On 01.09.2025 11:52, Steven Price wrote:
>
> On 28/08/2025 03:34, Adrián Larumbe wrote:
> > From: Boris Brezillon <boris.brezillon@collabora.com>
> >
> > The new uAPI lets user space query the KM driver for the available
> > priorities a job can be given at submit time. These are managed through
> > the notion of a context which, besides a priority, codifies the list
> > of L2 caches, shader cores and tiler units a job is allowed to use,
> > for all three of the available device job slots.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>
> There's no cover letter for this series, so maybe I'm missing some
> context. But I'm not sure why we want to expose the tiler/l2/core masks
> to user space.
Sorry about this; I believe this is the second time you've faced this issue.
It must be something to do with my git send-email configuration, which only
sends the cover letter to a limited number of recipients, unlike the rest of
the patches in the series.
You can find it here:
https://lore.kernel.org/dri-devel/20250828023422.2404784-1-adrian.larumbe@collabora.com/
> If you were trying to better support OpenCL on T628 I can just about
> understand the core mask. But, I doubt you are... (does anyone care
> about that anymore? ;) ). And really it's the core groups that matter
> rather than the raw affinities.
>
> The tiler/l2 affinities (and the XAFFINITY register in general) are there
> as a power saving mechanism. If we know that a job is not going to use
> the shader cores at all (a tiler-only job) then we can avoid turning
> them on, but obviously we still need the L2 and tiler blocks to be powered.
>
> kbase handled this with a "core_req" field which listed the required
> cores for each job. We already have a "requirements" field which we
> could extend for the same purpose (PANFROST_JD_REQ_TILER_ONLY or
> similar). I don't think this makes sense to include in a "context".
>
> But like I said, maybe I'm missing something - what is the use case for
> controlling affinity?
You're right here; this seems to be a case of overengineering. I guess because
Panthor offers finer-grained affinity control in its queue submission mechanism,
we probably thought there might be a use case for it on Panfrost in the future.
In fact, in the Mesa MR I do mention that at the moment UM simply selects the
full mask of available L2 caches and tiler/shader cores for all new contexts,
so in practice we're not using it.
> [The priority parts look ok here, but that's mixed in with the affinity
> changes.]
I'll drop exposure of affinity parameters in a v2 of this series.
> > ---
> > include/uapi/drm/panfrost_drm.h | 93 +++++++++++++++++++++++++++++++++
> > 1 file changed, 93 insertions(+)
> >
> > diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> > index ed67510395bd..2d8b32448e68 100644
> > --- a/include/uapi/drm/panfrost_drm.h
> > +++ b/include/uapi/drm/panfrost_drm.h
> > @@ -22,6 +22,8 @@ extern "C" {
> > #define DRM_PANFROST_PERFCNT_DUMP 0x07
> > #define DRM_PANFROST_MADVISE 0x08
> > #define DRM_PANFROST_SET_LABEL_BO 0x09
> > +#define DRM_PANFROST_JM_CTX_CREATE 0x0a
> > +#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
> >
> > #define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
> > #define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> > @@ -31,6 +33,8 @@ extern "C" {
> > #define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
> > #define DRM_IOCTL_PANFROST_MADVISE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
> > #define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
> > +#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
> > +#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
> >
> > /*
> > * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
> > @@ -71,6 +75,12 @@ struct drm_panfrost_submit {
> >
> > /** A combination of PANFROST_JD_REQ_* */
> > __u32 requirements;
> > +
> > + /** JM context handle. Zero if you want to use the default context. */
> > + __u32 jm_ctx_handle;
> > +
> > + /** Padding field. MBZ. */
> > + __u32 pad;
> > };
> >
> > /**
> > @@ -177,6 +187,7 @@ enum drm_panfrost_param {
> > DRM_PANFROST_PARAM_AFBC_FEATURES,
> > DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
> > DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
> > + DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
> > };
> >
> > struct drm_panfrost_get_param {
> > @@ -299,6 +310,88 @@ struct panfrost_dump_registers {
> > __u32 value;
> > };
> >
> > +enum drm_panfrost_jm_ctx_priority {
> > + /**
> > + * @PANFROST_JM_CTX_PRIORITY_LOW: Low priority context.
> > + */
> > + PANFROST_JM_CTX_PRIORITY_LOW = 0,
> > +
> > + /**
> > + * @PANFROST_JM_CTX_PRIORITY_MEDIUM: Medium priority context.
> > + */
> > + PANFROST_JM_CTX_PRIORITY_MEDIUM,
> > +
> > + /**
> > + * @PANFROST_JM_CTX_PRIORITY_HIGH: High priority context.
> > + *
> > + * Requires CAP_SYS_NICE or DRM_MASTER.
> > + */
> > + PANFROST_JM_CTX_PRIORITY_HIGH,
> > +};
> > +
> > +#define PANFROST_JS_FLAG_ENABLED (1 << 0)
> > +
> > +struct drm_panfrost_js_ctx_info {
> > + /** @flags: Combination of PANFROST_JS_FLAG_xxx values */
> > + __u32 flags;
> > +
> > + /** @priority: Context priority (see enum drm_panfrost_jm_ctx_priority). */
> > + __u8 priority;
> > +
> > + /**
> > + * @tiler_mask: Mask encoding tiler units that can be used by the job slot
> > + *
> > + * When this field is zero, it means the tiler won't be used.
> > + *
> > + * The bits set here should also be set in the DRM_PANFROST_PARAM_TILER_PRESENT mask.
> > + */
> > + __u8 tiler_mask;
> > +
> > + /**
> > + * @l2_mask: Mask encoding L2 caches that can be used by the job slot
> > + *
> > + * The bits set here should also be set in drm_panthor_gpu_info::l2_present.
> > + */
> > + __u16 l2_mask;
> > +
> > + /**
> > + * @core_mask: Mask encoding cores that can be used by the job slot
> > + *
> > + * When this field is zero, it means the queue won't be used.
> > + *
> > + * The bits set here should also be set in drm_panthor_gpu_info::shader_present.
> > + */
> > + __u64 core_mask;
> > +};
> > +
> > +struct drm_panfrost_jm_ctx_create {
> > + /** @handle: Handle of the created JM context */
> > + __u32 handle;
> > +
> > + /** @pad: Padding field, MBZ. */
> > + __u32 pad;
> > +
> > + /**
> > + * @slots: Job slots
> > + *
> > + * This field must be greater than zero and less than 8 (only three slots
> > + * available).
> > + */
> > + struct drm_panfrost_js_ctx_info slots[3];
>
> We don't allow user space to choose which slot is being targeted, so
> this feels odd. I guess this allows deliberately disabling slot 1 to
> force slot 2. But the code in this series doesn't seem to implement
> this. I'm also not sure I understand why you would want a different
> priority for different slots?
>
> Thanks,
> Steve
>
> > +};
> > +
> > +struct drm_panfrost_jm_ctx_destroy {
> > + /**
> > + * @handle: Handle of the JM context to destroy.
> > + *
> > + * Must be a valid context handle returned by DRM_IOCTL_PANTHOR_JM_CTX_CREATE.
> > + */
> > + __u32 handle;
> > +
> > + /** @pad: Padding field, MBZ. */
> > + __u32 pad;
> > +};
> > +
> > #if defined(__cplusplus)
> > }
> > #endif
Adrian Larumbe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation
2025-09-01 10:52 ` Steven Price
2025-09-01 12:08 ` Adrián Larumbe
@ 2025-09-01 12:14 ` Boris Brezillon
2025-09-01 14:15 ` Steven Price
1 sibling, 1 reply; 14+ messages in thread
From: Boris Brezillon @ 2025-09-01 12:14 UTC (permalink / raw)
To: Steven Price
Cc: Adrián Larumbe, linux-kernel, dri-devel, kernel, Rob Herring,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann
Hi Steve,
On Mon, 1 Sep 2025 11:52:02 +0100
Steven Price <steven.price@arm.com> wrote:
> On 28/08/2025 03:34, Adrián Larumbe wrote:
> > From: Boris Brezillon <boris.brezillon@collabora.com>
> >
> > The new uAPI lets user space query the KM driver for the available
> > priorities a job can be given at submit time. These are managed through
> > the notion of a context, which besides a priority, codifies the list
> > of L2 caches, shading cores and tiler units a job is allowed to use,
> > for all three of the available device job slots.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>
> There's no cover letter for this series, so maybe I'm missing some
> context. But I'm not sure why we want to expose the tiler/l2/core masks
> to user space.
The tiler/l2 masks I'm not sure we need, especially if there's only
one tiler unit / L2 cache. I exposed the core mask so one can reserve
cores for an application.
>
> If you were trying to better support OpenCL on T628 I can just about
> understand the core mask. But, I doubt you are... (does anyone care
> about that anymore? ;) ). And really it's the core groups that matter
> rather than the raw affinities.
Ok, so low vs high bits (don't know the granularity of the core group,
so low/high might actually be low/middle-low/middle-high/high) in the
affinity register, right?
>
> The tiler/l2 affinities (and the XAFFINITY register in general) is there
> as a power saving mechanism. If we know that a job is not going to use
> the shader cores at all (a tiler-only job) then we can avoid turning
> them on, but obviously we still need the L2 and tiler blocks to be powered.
Okay, I thought it was more of a "use only these cores, the rest is
reserved for something else", my bad.
>
> kbase handled this with a "core_req" field which listed the required
> cores for each job. We already have a "requirements" field which we
> could extend for the same purpose (PANFROST_JD_REQ_TILER_ONLY or
> similar). I don't think this makes sense to include in a "context".
It was more a core reservation mechanism, which I expected to be enforced
at context creation time. I mean, it can still be at the UMD level, and
we would pass the mask of cores to use at job submission time. The
problem I see with just expressing the maximum number of cores one can
use is that it doesn't work for core reservation. Also, I went for this
interface because that's more or less what panthor exposes (mask of
cores that can be used, and maximum number of cores that can be used
in this pool).
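Concretely, I mean something like a per-job mask - a hypothetical
field just to illustrate the shape, not a proposal for this series:

struct drm_panfrost_submit {
	...
	/** Mask of shader cores this job may run on (0 = no restriction) */
	__u64 core_mask;
};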
>
> But like I said, maybe I'm missing something - what is the use case for
> controlling affinity?
>
> [The priority parts look ok here, but that's mixed in with the affinity
> changes.]
>
> > ---
> > include/uapi/drm/panfrost_drm.h | 93 +++++++++++++++++++++++++++++++++
> > 1 file changed, 93 insertions(+)
> >
> > diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> > index ed67510395bd..2d8b32448e68 100644
> > --- a/include/uapi/drm/panfrost_drm.h
> > +++ b/include/uapi/drm/panfrost_drm.h
> > @@ -22,6 +22,8 @@ extern "C" {
> > #define DRM_PANFROST_PERFCNT_DUMP 0x07
> > #define DRM_PANFROST_MADVISE 0x08
> > #define DRM_PANFROST_SET_LABEL_BO 0x09
> > +#define DRM_PANFROST_JM_CTX_CREATE 0x0a
> > +#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
> >
> > #define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
> > #define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> > @@ -31,6 +33,8 @@ extern "C" {
> > #define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
> > #define DRM_IOCTL_PANFROST_MADVISE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
> > #define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
> > +#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
> > +#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
> >
> > /*
> > * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
> > @@ -71,6 +75,12 @@ struct drm_panfrost_submit {
> >
> > /** A combination of PANFROST_JD_REQ_* */
> > __u32 requirements;
> > +
> > + /** JM context handle. Zero if you want to use the default context. */
> > + __u32 jm_ctx_handle;
> > +
> > + /** Padding field. MBZ. */
> > + __u32 pad;
> > };
> >
> > /**
> > @@ -177,6 +187,7 @@ enum drm_panfrost_param {
> > DRM_PANFROST_PARAM_AFBC_FEATURES,
> > DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
> > DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
> > + DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
> > };
> >
> > struct drm_panfrost_get_param {
> > @@ -299,6 +310,88 @@ struct panfrost_dump_registers {
> > __u32 value;
> > };
> >
> > +enum drm_panfrost_jm_ctx_priority {
> > + /**
> > + * @PANFROST_JM_CTX_PRIORITY_LOW: Low priority context.
> > + */
> > + PANFROST_JM_CTX_PRIORITY_LOW = 0,
> > +
> > + /**
> > + * @PANFROST_JM_CTX_PRIORITY_MEDIUM: Medium priority context.
> > + */
> > + PANFROST_JM_CTX_PRIORITY_MEDIUM,
> > +
> > + /**
> > + * @PANFROST_JM_CTX_PRIORITY_HIGH: High priority context.
> > + *
> > + * Requires CAP_SYS_NICE or DRM_MASTER.
> > + */
> > + PANFROST_JM_CTX_PRIORITY_HIGH,
> > +};
> > +
> > +#define PANFROST_JS_FLAG_ENABLED (1 << 0)
> > +
> > +struct drm_panfrost_js_ctx_info {
> > + /** @flags: Combination of PANFROST_JS_FLAG_xxx values */
> > + __u32 flags;
> > +
> > + /** @priority: Context priority (see enum drm_panfrost_jm_ctx_priority). */
> > + __u8 priority;
> > +
> > + /**
> > + * @tiler_mask: Mask encoding tiler units that can be used by the job slot
> > + *
> > + * When this field is zero, it means the tiler won't be used.
> > + *
> > + * The bits set here should also be set in drm_panthor_gpu_info::tiler_present.
> > + */
> > + __u8 tiler_mask;
> > +
> > + /**
> > + * @l2_mask: Mask encoding L2 caches that can be used by the job slot
> > + *
> > + * The bits set here should also be set in drm_panthor_gpu_info::l2_present.
> > + */
> > + __u16 l2_mask;
> > +
> > + /**
> > + * @core_mask: Mask encoding cores that can be used by the job slot
> > + *
> > + * When this field is zero, it means the queue won't be used.
> > + *
> > + * The bits set here should also be set in drm_panthor_gpu_info::shader_present.
> > + */
> > + __u64 core_mask;
> > +};
> > +
> > +struct drm_panfrost_jm_ctx_create {
> > + /** @handle: Handle of the created JM context */
> > + __u32 handle;
> > +
> > + /** @pad: Padding field, MBZ. */
> > + __u32 pad;
> > +
> > + /**
> > + * @slots: Job slots
> > + *
> > + * This field must be greater than zero and less than 8 (only three slots
> > + * available).
Not sure what this doc referred to, but slots is not an integer :D.
> > + */
> > + struct drm_panfrost_js_ctx_info slots[3];
>
> We don't allow user space to choose which slot is being targeted, so
> this feels odd.
Some of this has been extracted from the panthor-ification of JM, and
you're probably right that it doesn't make sense to expose the
subqueues in panfrost.
> I guess this allows deliberately disabling slot 1 to
> force slot 2. But the code in this series doesn't seem to implement
> this. I'm also not sure I understand why you would want a different
> priority for different slots?
Internally, a slot maps to a sched entity, which is where the priority
is defined. Sure, we could have a global priority for the whole context,
but I figured I'd just expose what the KMD is capable of (per subqueue
priority) and let the UMD assign the same priority to all slots. But if
we don't expose the slots directly, we might as well just define a
priority and the set of resources that can be used by any of the
subqueues.
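For the record, the mapping I have in mind is roughly this (a sketch
only, helper name made up, not the actual patch code):

static enum drm_sched_priority
jm_ctx_to_sched_prio(enum drm_panfrost_jm_ctx_priority prio)
{
	switch (prio) {
	case PANFROST_JM_CTX_PRIORITY_HIGH:
		return DRM_SCHED_PRIORITY_HIGH;
	case PANFROST_JM_CTX_PRIORITY_MEDIUM:
		return DRM_SCHED_PRIORITY_NORMAL;
	case PANFROST_JM_CTX_PRIORITY_LOW:
	default:
		return DRM_SCHED_PRIORITY_LOW;
	}
}

With a single context-wide priority, each slot's sched entity would
just be initialized with the same value at context creation time.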
Regards,
Boris
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation
2025-09-01 12:08 ` Adrián Larumbe
@ 2025-09-01 13:45 ` Steven Price
0 siblings, 0 replies; 14+ messages in thread
From: Steven Price @ 2025-09-01 13:45 UTC (permalink / raw)
To: Adrián Larumbe
Cc: linux-kernel, dri-devel, Boris Brezillon, kernel, Rob Herring,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann
On 01/09/2025 13:08, Adrián Larumbe wrote:
> Hi Steven,
>
> On 01.09.2025 11:52, Steven Price wrote:
>>
>> On 28/08/2025 03:34, Adrián Larumbe wrote:
>>> From: Boris Brezillon <boris.brezillon@collabora.com>
>>>
>>> The new uAPI lets user space query the KM driver for the available
>>> priorities a job can be given at submit time. These are managed through
>>> the notion of a context, which besides a priority, codifies the list
>>> of L2 caches, shading cores and tiler units a job is allowed to use,
>>> for all three of the available device job slots.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>>
>> There's no cover letter for this series, so maybe I'm missing some
>> context. But I'm not sure why we want to expose the tiler/l2/core masks
>> to user space.
>
> Sorry about this, I believe this is the second time you've faced this
> issue; it must be something to do with my git send-email configuration,
> which only sends the cover letter to a limited number of recipients,
> unlike the rest of the patches in the series.
>
> You can find it here:
>
> https://lore.kernel.org/dri-devel/20250828023422.2404784-1-adrian.larumbe@collabora.com/
Ah, I didn't think to go looking on the list. Thanks for the link.
>
>> If you were trying to better support OpenCL on T628 I can just about
>> understand the core mask. But, I doubt you are... (does anyone care
>> about that anymore? ;) ). And really it's the core groups that matter
>> rather than the raw affinities.
>>
>> The tiler/l2 affinities (and the XAFFINITY register in general) is there
>> as a power saving mechanism. If we know that a job is not going to use
>> the shader cores at all (a tiler-only job) then we can avoid turning
>> them on, but obviously we still need the L2 and tiler blocks to be powered.
>>
>> kbase handled this with a "core_req" field which listed the required
>> cores for each job. We already have a "requirements" field which we
>> could extend for the same purpose (PANFROST_JD_REQ_TILER_ONLY or
>> similar). I don't think this makes sense to include in a "context".
>>
>> But like I said, maybe I'm missing something - what is the use case for
>> controlling affinity?
>
> You're right here, this seems to be a case of overengineering. I guess because
> Panthor offers finer grained affinity control in its queue submission mechanism,
> we probably thought there might be a use case for it on Panfrost in the future.
I thought that might be the case. In case you're interested, here's a
potted history (feel free to skip!):
<mali history lesson>
The original design for T60x was that it could scale up to a large
number of cores. But there was a limit to the number of cores that could
be (at least sensibly) connected to the L2 cache. So the design included
the possibility of multiple L2 caches.
However the GPU generally wasn't cache coherent - the coherency traffic
generated by the GPU would have overwhelmed the memory systems of the day
(at least the ones we were designing for). So cache maintenance is
'manual'. Sadly this extended to the individual L2s within the GPU (they
were non-coherent with each other).
So the concept of "core groups" was created. Each L2 has a set of cores
connected to it that was considered a separate group. For graphics this
worked well - vertex processing was light so could run on a single
group, and fragment processing didn't need coherency between tiles so
could be easily split across core groups. The problem was compute (as
OpenCL was predicted to be a big thing).
OpenCL expects a certain level of coherency, and so the "solution" here
was to expose each core group as a separate device. This is the reason
for slot 2: compute work can be submitted to both slots 1 and 2 with the
affinity masks restricting each slot to a single core group.
What we discovered was that this scheme doesn't actually work well in
practice. Exposing two OpenCL devices isn't a great solution, and vertex
processing was increasing in importance (along with other compute APIs
starting to appear). Also the only way of scaling beyond 2 L2 caches
would have been to add extra job slots, making the software problems worse.
In the end I believe the only GPU actually produced that implemented
more than 1 core group was the T628 (two core groups of 4
cores). The hardware design was changed to distribute accesses across
the L2 caches and effectively make the L2 caches coherent[1].
So T628 is the only GPU to report more than 1 L2 cache. Later GPUs
technically had more than 1 L2, but the register still reads as 0x1. The
3 slots remained even though slot 2 was effectively pointless -
overlapping affinities between slots 1 and 2 cause various problems, and
so it's more efficient to send everything to the one slot with the
complete set of cores in the affinity mask.
Of course then everything changed again once we switched to the CSF
GPUs. CSF no longer has the "slot" concept and so affinity can be used
more flexibly.
[1] My understanding is that the caches are striped, so they are not
technically coherent, but an address will only ever be stored in one
cache. So there's no coherency logic required by software (at least
within the GPU).
</mali history lesson>
TLDR; Affinity control for job manager GPUs only really makes sense on
the T628 (and occasionally for testing on other GPUs). CSF is different
because we don't have the slot concept.
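For illustration, deriving the two T628 core groups from the present
mask is roughly this (a sketch; the split assumes the T628's two
groups of 4 cores):

	u64 shader_present = 0xff;	/* T628: 8 cores, 2 core groups */
	u32 per_group = hweight64(shader_present) / 2;
	u64 group0 = shader_present & GENMASK_ULL(per_group - 1, 0);
	u64 group1 = shader_present & ~group0;

kbase would then restrict one compute slot to group0 and the other to
group1 via the affinity registers.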
> In fact in the Mesa MR I do mention that at the moment UM simply selects the
> full mask of available l2 caches and tiling/shading cores for all new contexts,
> so in practice we're not using it.
>
>> [The priority parts look ok here, but that's mixed in with the affinity
>> changes.]
>
> I'll drop exposure of affinity parameters in a v2 of this series.
Thanks!
Steve
>>> ---
>>> include/uapi/drm/panfrost_drm.h | 93 +++++++++++++++++++++++++++++++++
>>> 1 file changed, 93 insertions(+)
>>>
>>> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
>>> index ed67510395bd..2d8b32448e68 100644
>>> --- a/include/uapi/drm/panfrost_drm.h
>>> +++ b/include/uapi/drm/panfrost_drm.h
>>> @@ -22,6 +22,8 @@ extern "C" {
>>> #define DRM_PANFROST_PERFCNT_DUMP 0x07
>>> #define DRM_PANFROST_MADVISE 0x08
>>> #define DRM_PANFROST_SET_LABEL_BO 0x09
>>> +#define DRM_PANFROST_JM_CTX_CREATE 0x0a
>>> +#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
>>>
>>> #define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
>>> #define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
>>> @@ -31,6 +33,8 @@ extern "C" {
>>> #define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
>>> #define DRM_IOCTL_PANFROST_MADVISE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
>>> #define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
>>> +#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
>>> +#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
>>>
>>> /*
>>> * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
>>> @@ -71,6 +75,12 @@ struct drm_panfrost_submit {
>>>
>>> /** A combination of PANFROST_JD_REQ_* */
>>> __u32 requirements;
>>> +
>>> + /** JM context handle. Zero if you want to use the default context. */
>>> + __u32 jm_ctx_handle;
>>> +
>>> + /** Padding field. MBZ. */
>>> + __u32 pad;
>>> };
>>>
>>> /**
>>> @@ -177,6 +187,7 @@ enum drm_panfrost_param {
>>> DRM_PANFROST_PARAM_AFBC_FEATURES,
>>> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
>>> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
>>> + DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
>>> };
>>>
>>> struct drm_panfrost_get_param {
>>> @@ -299,6 +310,88 @@ struct panfrost_dump_registers {
>>> __u32 value;
>>> };
>>>
>>> +enum drm_panfrost_jm_ctx_priority {
>>> + /**
>>> + * @PANFROST_JM_CTX_PRIORITY_LOW: Low priority context.
>>> + */
>>> + PANFROST_JM_CTX_PRIORITY_LOW = 0,
>>> +
>>> + /**
>>> + * @PANFROST_JM_CTX_PRIORITY_MEDIUM: Medium priority context.
>>> + */
>>> + PANFROST_JM_CTX_PRIORITY_MEDIUM,
>>> +
>>> + /**
>>> + * @PANFROST_JM_CTX_PRIORITY_HIGH: High priority context.
>>> + *
>>> + * Requires CAP_SYS_NICE or DRM_MASTER.
>>> + */
>>> + PANFROST_JM_CTX_PRIORITY_HIGH,
>>> +};
>>> +
>>> +#define PANFROST_JS_FLAG_ENABLED (1 << 0)
>>> +
>>> +struct drm_panfrost_js_ctx_info {
>>> + /** @flags: Combination of PANFROST_JS_FLAG_xxx values */
>>> + __u32 flags;
>>> +
>>> + /** @priority: Context priority (see enum drm_panfrost_jm_ctx_priority). */
>>> + __u8 priority;
>>> +
>>> + /**
>>> + * @tiler_mask: Mask encoding tiler units that can be used by the job slot
>>> + *
>>> + * When this field is zero, it means the tiler won't be used.
>>> + *
>>> + * The bits set here should also be set in drm_panthor_gpu_info::tiler_present.
>>> + */
>>> + __u8 tiler_mask;
>>> +
>>> + /**
>>> + * @l2_mask: Mask encoding L2 caches that can be used by the job slot
>>> + *
>>> + * The bits set here should also be set in drm_panthor_gpu_info::l2_present.
>>> + */
>>> + __u16 l2_mask;
>>> +
>>> + /**
>>> + * @core_mask: Mask encoding cores that can be used by the job slot
>>> + *
>>> + * When this field is zero, it means the queue won't be used.
>>> + *
>>> + * The bits set here should also be set in drm_panthor_gpu_info::shader_present.
>>> + */
>>> + __u64 core_mask;
>>> +};
>>> +
>>> +struct drm_panfrost_jm_ctx_create {
>>> + /** @handle: Handle of the created JM context */
>>> + __u32 handle;
>>> +
>>> + /** @pad: Padding field, MBZ. */
>>> + __u32 pad;
>>> +
>>> + /**
>>> + * @slots: Job slots
>>> + *
>>> + * This field must be greater than zero and less than 8 (only three slots
>>> + * available).
>>> + */
>>> + struct drm_panfrost_js_ctx_info slots[3];
>>
>> We don't allow user space to choose which slot is being targeted, so
>> this feels odd. I guess this allows deliberately disabling slot 1 to
>> force slot 2. But the code in this series doesn't seem to implement
>> this. I'm also not sure I understand why you would want a different
>> priority for different slots?
>>
>> Thanks,
>> Steve
>>
>>> +};
>>> +
>>> +struct drm_panfrost_jm_ctx_destroy {
>>> + /**
>>> + * @handle: Handle of the JM context to destroy.
>>> + *
>>> + * Must be a valid context handle returned by DRM_IOCTL_PANTHOR_JM_CTX_CREATE.
>>> + */
>>> + __u32 handle;
>>> +
>>> + /** @pad: Padding field, MBZ. */
>>> + __u32 pad;
>>> +};
>>> +
>>> #if defined(__cplusplus)
>>> }
>>> #endif
>
> Adrian Larumbe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation
2025-09-01 12:14 ` Boris Brezillon
@ 2025-09-01 14:15 ` Steven Price
0 siblings, 0 replies; 14+ messages in thread
From: Steven Price @ 2025-09-01 14:15 UTC (permalink / raw)
To: Boris Brezillon
Cc: Adrián Larumbe, linux-kernel, dri-devel, kernel, Rob Herring,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann
Hi Boris,
On 01/09/2025 13:14, Boris Brezillon wrote:
> Hi Steve,
>
> On Mon, 1 Sep 2025 11:52:02 +0100
> Steven Price <steven.price@arm.com> wrote:
>
>> On 28/08/2025 03:34, Adrián Larumbe wrote:
>>> From: Boris Brezillon <boris.brezillon@collabora.com>
>>>
>>> The new uAPI lets user space query the KM driver for the available
>>> priorities a job can be given at submit time. These are managed through
>>> the notion of a context, which besides a priority, codifies the list
>>> of L2 caches, shading cores and tiler units a job is allowed to use,
>>> for all three of the available device job slots.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>>
>> There's no cover letter for this series, so maybe I'm missing some
>> context. But I'm not sure why we want to expose the tiler/l2/core masks
>> to user space.
>
> The tiler/l2 masks I'm not sure we need, especially if there's only
> one tiler unit / L2 cache. I exposed the core mask so one can reserve
> cores for an application.
Reserving cores is tricky because we have limited slots and they cannot
have overlapping affinity. If there's a specific use case then sure, we
can discuss it, but generally it's much better to time-slice the GPU than
try to partition it by cores.
>>
>> If you were trying to better support OpenCL on T628 I can just about
>> understand the core mask. But, I doubt you are... (does anyone care
>> about that anymore? ;) ). And really it's the core groups that matter
>> rather than the raw affinities.
>
> Ok, so low vs high bits (don't know the granularity of the core group,
> so low/high might actually be low/middle-low/middle-high/high) in the
> affinity register, right?
Thankfully we never had more than 2 core groups - so low/high ;)
>>
>> The tiler/l2 affinities (and the XAFFINITY register in general) is there
>> as a power saving mechanism. If we know that a job is not going to use
>> the shader cores at all (a tiler-only job) then we can avoid turning
>> them on, but obviously we still need the L2 and tiler blocks to be powered.
>
> Okay, I thought it was more of a "use only these cores, the rest is
> reserved for something else", my bad.
XAFFINITY is really a "whoops we didn't think about that" fix ;) The
main AFFINITY register doesn't have a separate control for the tiler
(the register is shared between both shader cores and tilers), so to
enable the tiler, core 0 also needs to be enabled. And the hardware
"helpfully" also checks that when you submit a job that the intersection
of the powered cores and the affinity register isn't 0. So if you submit
a tiler job you have to power on (and include in the affinity register)
shader core 0.
When you enable XAFFINITY the link is broken and you can specify the
tiler affinity separately. Not that the concept of multiple tilers ever
got seriously proposed. But this allows shader core 0 to remain
unpowered when running a tiling-only job.
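In register terms (using the defs from patch 1 of this series;
job_write() is just a stand-in for the driver's MMIO accessor, so
treat this as a sketch), a tiler-only job could then be submitted with
an empty shader affinity:

	u32 xaff = JS_XAFFINITY_ENABLE |
		   JS_XAFFINITY_TILER_MASK(1) |	/* tiler unit 0 */
		   JS_XAFFINITY_L2_MASK(1);	/* L2 cache 0 */

	job_write(pfdev, JS_AFFINITY_NEXT_LO(js), 0); /* no shader cores */
	job_write(pfdev, JS_AFFINITY_NEXT_HI(js), 0);
	job_write(pfdev, JS_XAFFINITY_NEXT(js), xaff);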
>>
>> kbase handled this with a "core_req" field which listed the required
>> cores for each job. We already have a "requirements" field which we
>> could extend for the same purpose (PANFROST_JD_REQ_TILER_ONLY or
>> similar). I don't think this makes sense to include in a "context".
>
> It was more a core reservation mechanism, which I expected to be enforced
> at context creation time. I mean, it can still be at the UMD level, and
> we would pass the mask of cores to use at job submission time. The
> problem I see with just expressing the maximum number of cores one can
> use is that it doesn't work for core reservation. Also, I went for this
> interface because that's more or less what panthor exposes (mask of
> cores that can be used, and maximum number of cores that can be used
> in this pool).
I included my history lesson in the email to Adrián, but panthor is
different because CSF made things more flexible by removing the slot
concept. I'm very dubious it makes any sense to reserve cores on JM GPUs
- time slicing makes more sense as it avoids thrashing the caches
between two different workloads.
>>
>> But like I said, maybe I'm missing something - what is the use case for
>> controlling affinity?
>>
>> [The priority parts look ok here, but that's mixed in with the affinity
>> changes.]
>>
>>> ---
>>> include/uapi/drm/panfrost_drm.h | 93 +++++++++++++++++++++++++++++++++
>>> 1 file changed, 93 insertions(+)
>>>
>>> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
>>> index ed67510395bd..2d8b32448e68 100644
>>> --- a/include/uapi/drm/panfrost_drm.h
>>> +++ b/include/uapi/drm/panfrost_drm.h
>>> @@ -22,6 +22,8 @@ extern "C" {
>>> #define DRM_PANFROST_PERFCNT_DUMP 0x07
>>> #define DRM_PANFROST_MADVISE 0x08
>>> #define DRM_PANFROST_SET_LABEL_BO 0x09
>>> +#define DRM_PANFROST_JM_CTX_CREATE 0x0a
>>> +#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
>>>
>>> #define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
>>> #define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
>>> @@ -31,6 +33,8 @@ extern "C" {
>>> #define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
>>> #define DRM_IOCTL_PANFROST_MADVISE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
>>> #define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
>>> +#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
>>> +#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
>>>
>>> /*
>>> * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
>>> @@ -71,6 +75,12 @@ struct drm_panfrost_submit {
>>>
>>> /** A combination of PANFROST_JD_REQ_* */
>>> __u32 requirements;
>>> +
>>> + /** JM context handle. Zero if you want to use the default context. */
>>> + __u32 jm_ctx_handle;
>>> +
>>> + /** Padding field. MBZ. */
>>> + __u32 pad;
>>> };
>>>
>>> /**
>>> @@ -177,6 +187,7 @@ enum drm_panfrost_param {
>>> DRM_PANFROST_PARAM_AFBC_FEATURES,
>>> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
>>> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
>>> + DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
>>> };
>>>
>>> struct drm_panfrost_get_param {
>>> @@ -299,6 +310,88 @@ struct panfrost_dump_registers {
>>> __u32 value;
>>> };
>>>
>>> +enum drm_panfrost_jm_ctx_priority {
>>> + /**
>>> + * @PANFROST_JM_CTX_PRIORITY_LOW: Low priority context.
>>> + */
>>> + PANFROST_JM_CTX_PRIORITY_LOW = 0,
>>> +
>>> + /**
>>> + * @PANFROST_JM_CTX_PRIORITY_MEDIUM: Medium priority context.
>>> + */
>>> + PANFROST_JM_CTX_PRIORITY_MEDIUM,
>>> +
>>> + /**
>>> + * @PANFROST_JM_CTX_PRIORITY_HIGH: High priority context.
>>> + *
>>> + * Requires CAP_SYS_NICE or DRM_MASTER.
>>> + */
>>> + PANFROST_JM_CTX_PRIORITY_HIGH,
>>> +};
>>> +
>>> +#define PANFROST_JS_FLAG_ENABLED (1 << 0)
>>> +
>>> +struct drm_panfrost_js_ctx_info {
>>> + /** @flags: Combination of PANFROST_JS_FLAG_xxx values */
>>> + __u32 flags;
>>> +
>>> + /** @priority: Context priority (see enum drm_panfrost_jm_ctx_priority). */
>>> + __u8 priority;
>>> +
>>> + /**
>>> + * @tiler_mask: Mask encoding tiler units that can be used by the job slot
>>> + *
>>> + * When this field is zero, it means the tiler won't be used.
>>> + *
>>> + * The bits set here should also be set in drm_panthor_gpu_info::tiler_present.
>>> + */
>>> + __u8 tiler_mask;
>>> +
>>> + /**
>>> + * @l2_mask: Mask encoding L2 caches that can be used by the job slot
>>> + *
>>> + * The bits set here should also be set in drm_panthor_gpu_info::l2_present.
>>> + */
>>> + __u16 l2_mask;
>>> +
>>> + /**
>>> + * @core_mask: Mask encoding cores that can be used by the job slot
>>> + *
>>> + * When this field is zero, it means the queue won't be used.
>>> + *
>>> + * The bits set here should also be set in drm_panthor_gpu_info::shader_present.
>>> + */
>>> + __u64 core_mask;
>>> +};
>>> +
>>> +struct drm_panfrost_jm_ctx_create {
>>> + /** @handle: Handle of the created JM context */
>>> + __u32 handle;
>>> +
>>> + /** @pad: Padding field, MBZ. */
>>> + __u32 pad;
>>> +
>>> + /**
>>> + * @slots: Job slots
>>> + *
>>> + * This field must be greater than zero and less than 8 (only three slots
>>> + * available).
>
> Not sure what this doc referred to, but slots is not an integer :D.
>
>>> + */
>>> + struct drm_panfrost_js_ctx_info slots[3];
>>
>> We don't allow user space to choose which slot is being targeted, so
>> this feels odd.
>
> Some of this has been extracted from the panthor-ification of JM, and
> you're probably right that it doesn't make sense to expose the
> subqueues in panfrost.
>
>> I guess this allows deliberately disabling slot 1 to
>> force slot 2. But the code in this series doesn't seem to implement
>> this. I'm also not sure I understand why you would want a different
>> priority for different slots?
>
> Internally, a slot maps to a sched entity, which is where the priority
> is defined. Sure, we could have a global priority for the whole context,
> but I figured I'd just expose what the KMD is capable of (per subqueue
> priority) and let the UMD assign the same priority to all slots. But if
> we don't expose the slots directly, we might as well just define a
> priority and the set of resources that can be used by any of the
> subqueues.
Mostly I was trying to figure out whether there was actually a reason
for exposing multiple priorities like this. Since there isn't generally
a choice of which slot to target (it depends on the job type), it seems odd.
Generally my preference is to keep the uAPI minimal and add features as
we need them. Although at least for panfrost we don't need to worry
about potential new GPU designs any more ;)
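For what it's worth, the minimal UM flow then boils down to something
like the sketch below - assuming (as with similar DRM params) that
ALLOWED_JM_CTX_PRIORITIES comes back as a bitmask of the enum values:

	struct drm_panfrost_get_param get = {
		.param = DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
	};
	struct drm_panfrost_jm_ctx_create create = {};

	drmIoctl(fd, DRM_IOCTL_PANFROST_GET_PARAM, &get);
	if (get.value & (1 << PANFROST_JM_CTX_PRIORITY_HIGH))
		create.slots[0].priority = PANFROST_JM_CTX_PRIORITY_HIGH;
	drmIoctl(fd, DRM_IOCTL_PANFROST_JM_CTX_CREATE, &create);

with the returned handle then passed in
drm_panfrost_submit::jm_ctx_handle.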
Thanks,
Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
Thread overview: 14+ messages
2025-08-28 2:34 [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe
2025-08-28 2:34 ` [PATCH 1/5] drm/panfrost: Add job slot register defs for affinity Adrián Larumbe
2025-08-28 2:34 ` [PATCH 2/5] drm/panfrost: Introduce uAPI for JM context creation Adrián Larumbe
2025-09-01 10:52 ` Steven Price
2025-09-01 12:08 ` Adrián Larumbe
2025-09-01 13:45 ` Steven Price
2025-09-01 12:14 ` Boris Brezillon
2025-09-01 14:15 ` Steven Price
2025-08-28 2:34 ` [PATCH 3/5] drm/panfrost: Introduce JM context for manging job resources Adrián Larumbe
2025-08-30 8:12 ` Daniel Stone
2025-09-01 7:54 ` Boris Brezillon
2025-08-28 2:34 ` [PATCH 4/5] drm/panfrost: Expose JM context IOCTLs to UM Adrián Larumbe
2025-08-28 2:34 ` [PATCH 5/5] drm/panfrost: Display list of device JM contexts over debugfs Adrián Larumbe
2025-08-28 23:19 ` [PATCH 0/5] Introduce Panfrost JM contexts Adrián Larumbe