public inbox for linux-kernel@vger.kernel.org
* [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI
@ 2025-01-31 11:02 Pierre-Eric Pelloux-Prayer
  2025-01-31 11:02 ` [PATCH v7 1/7] drm/debugfs: output client_id in drm_clients_info Pierre-Eric Pelloux-Prayer
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:02 UTC (permalink / raw)
  Cc: Pierre-Eric Pelloux-Prayer, amd-gfx, dri-devel, etnaviv,
	freedreno, intel-xe, lima, linaro-mm-sig, linux-arm-msm,
	linux-doc, linux-kernel, linux-media, nouveau

Hi,

The initial goal of this series was to improve the drm and amdgpu
trace events to be able to expose more of the inner workings of
the scheduler and drivers to developers via tools.

Then, the series evolved to become focused only on gpu_scheduler.
The changes around vblank events will be part of a different
series, as well as the amdgpu ones.

Moreover, Sima suggested making some trace events stable uAPI,
so tools can rely on them long term.

The first patches extend and clean up the gpu scheduler events.

The last one adds a documentation entry in drm-uapi.rst.
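
As an illustration (assuming tracefs is mounted at the usual
/sys/kernel/tracing), the updated events can be enabled and inspected
with:

  echo 1 > /sys/kernel/tracing/events/gpu_scheduler/enable
  cat /sys/kernel/tracing/trace_pipe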

Changes since v6:
* Addressed comments from Philipp, Tvrtko, Steven
* The commit storing drm_client_id in sched_fence adds an extra
field which looks like a duplicate of the owner field. Currently
amdgpu uses the owner field with some magic values, so we have to
keep 2 separate fields for now, but ultimately one could be removed.
Similarly, storing the drm_client_id in the sched_entity is not
really possible, as there's nothing preventing a driver from using a
sched_entity with multiple drm_file instances.


Useful links:
- userspace tool using the updated events:
https://gitlab.freedesktop.org/tomstdenis/umr/-/merge_requests/37
- v6:
https://lists.freedesktop.org/archives/dri-devel/2024-November/477644.html

Pierre-Eric Pelloux-Prayer (7):
  drm/debugfs: output client_id in drm_clients_info
  drm/sched: store the drm client_id in drm_sched_fence
  drm/sched: add device name to the drm_sched_process_job event
  drm/sched: cleanup gpu_scheduler trace events
  drm/sched: trace dependencies for gpu jobs
  drm/sched: add the drm_client_id to the drm_sched_run/exec_job events
  drm/doc: document some tracepoints as uAPI

 Documentation/gpu/drm-uapi.rst                |  19 +++
 drivers/accel/amdxdna/aie2_ctx.c              |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |   3 +-
 drivers/gpu/drm/drm_debugfs.c                 |  10 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |   2 +-
 drivers/gpu/drm/imagination/pvr_job.c         |   2 +-
 drivers/gpu/drm/imagination/pvr_queue.c       |   5 +-
 drivers/gpu/drm/imagination/pvr_queue.h       |   2 +-
 drivers/gpu/drm/lima/lima_gem.c               |   2 +-
 drivers/gpu/drm/lima/lima_sched.c             |   6 +-
 drivers/gpu/drm/lima/lima_sched.h             |   3 +-
 drivers/gpu/drm/msm/msm_gem_submit.c          |   8 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |   3 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c       |   2 +-
 drivers/gpu/drm/panthor/panthor_drv.c         |   3 +-
 drivers/gpu/drm/panthor/panthor_mmu.c         |   2 +-
 drivers/gpu/drm/panthor/panthor_sched.c       |   5 +-
 drivers/gpu/drm/panthor/panthor_sched.h       |   3 +-
 .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 123 ++++++++++++++----
 drivers/gpu/drm/scheduler/sched_entity.c      |   8 +-
 drivers/gpu/drm/scheduler/sched_fence.c       |   4 +-
 drivers/gpu/drm/scheduler/sched_main.c        |   8 +-
 drivers/gpu/drm/v3d/v3d_submit.c              |   2 +-
 drivers/gpu/drm/xe/xe_sched_job.c             |   3 +-
 include/drm/gpu_scheduler.h                   |  12 +-
 28 files changed, 192 insertions(+), 64 deletions(-)

-- 
2.47.1



* [PATCH v7 1/7] drm/debugfs: output client_id in drm_clients_info
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
@ 2025-01-31 11:02 ` Pierre-Eric Pelloux-Prayer
  2025-01-31 11:03 ` [PATCH v7 2/7] drm/sched: store the drm client_id in drm_sched_fence Pierre-Eric Pelloux-Prayer
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:02 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter
  Cc: Pierre-Eric Pelloux-Prayer, dri-devel, linux-kernel

client_id is a unique id used by fdinfo. Having it listed in the
'clients' output means a userspace application can correlate the
fields, e.g.: given an fdinfo id, get the fdinfo name.

Given that client_id is a uint64_t, we use a %20llu printf format to
keep the output aligned (20 = digit count of the largest uint64_t).
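
For illustration, a row might now end with the id column, e.g.
(hypothetical values, name column width reduced for brevity):

             command  tgid dev master a   uid      magic    name                   id
            glxgears  1201   0   y    y  1000          0  <unset>                    4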

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/drm_debugfs.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 536409a35df4..061601211bc2 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -77,14 +77,15 @@ static int drm_clients_info(struct seq_file *m, void *data)
 	kuid_t uid;
 
 	seq_printf(m,
-		   "%20s %5s %3s master a %5s %10s %*s\n",
+		   "%20s %5s %3s master a %5s %10s %*s %20s\n",
 		   "command",
 		   "tgid",
 		   "dev",
 		   "uid",
 		   "magic",
 		   DRM_CLIENT_NAME_MAX_LEN,
-		   "name");
+		   "name",
+		   "id");
 
 	/* dev->filelist is sorted youngest first, but we want to present
 	 * oldest first (i.e. kernel, servers, clients), so walk backwardss.
@@ -100,7 +101,7 @@ static int drm_clients_info(struct seq_file *m, void *data)
 		pid = rcu_dereference(priv->pid);
 		task = pid_task(pid, PIDTYPE_TGID);
 		uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
-		seq_printf(m, "%20s %5d %3d   %c    %c %5d %10u %*s\n",
+		seq_printf(m, "%20s %5d %3d   %c    %c %5d %10u %*s %20llu\n",
 			   task ? task->comm : "<unknown>",
 			   pid_vnr(pid),
 			   priv->minor->index,
@@ -109,7 +110,8 @@ static int drm_clients_info(struct seq_file *m, void *data)
 			   from_kuid_munged(seq_user_ns(m), uid),
 			   priv->magic,
 			   DRM_CLIENT_NAME_MAX_LEN,
-			   priv->client_name ? priv->client_name : "<unset>");
+			   priv->client_name ? priv->client_name : "<unset>",
+			   priv->client_id);
 		rcu_read_unlock();
 		mutex_unlock(&priv->client_name_lock);
 	}
-- 
2.47.1



* [PATCH v7 2/7] drm/sched: store the drm client_id in drm_sched_fence
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
  2025-01-31 11:02 ` [PATCH v7 1/7] drm/debugfs: output client_id in drm_clients_info Pierre-Eric Pelloux-Prayer
@ 2025-01-31 11:03 ` Pierre-Eric Pelloux-Prayer
  2025-01-31 11:03 ` [PATCH v7 3/7] drm/sched: add device name to the drm_sched_process_job event Pierre-Eric Pelloux-Prayer
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:03 UTC (permalink / raw)
  To: Min Ma, Lizhi Hou, Oded Gabbay, Felix Kuehling, Alex Deucher,
	Christian König, Xinhui Pan, David Airlie, Simona Vetter,
	Lucas Stach, Russell King, Christian Gmeiner, Frank Binns,
	Matt Coster, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Qiang Yu, Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	Marijn Suijten, Karol Herbst, Lyude Paul, Danilo Krummrich,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau,
	Luben Tuikov, Matthew Brost, Philipp Stanner, Melissa Wen,
	Maíra Canal, Lucas De Marchi, Thomas Hellström,
	Rodrigo Vivi
  Cc: Pierre-Eric Pelloux-Prayer, dri-devel, linux-kernel, amd-gfx,
	etnaviv, lima, linux-arm-msm, freedreno, nouveau, intel-xe

This will be used in a later commit to trace the drm client_id in
some of the gpu_scheduler trace events.

This requires changing all the users of drm_sched_job_init to
add an extra parameter.

The newly added drm_client_id field in the drm_sched_fence is a bit
of a duplicate of the owner one. One suggestion I received was to
merge those 2 fields - this can't be done right now as amdgpu uses
some special values (AMDGPU_FENCE_OWNER_*) that can't really be
translated into a client id. Christian is working on getting rid of
those; when it's done we should be able to squash owner/drm_client_id
together.
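
As a reference, a minimal sketch of a driver submit path after this
change (the names are illustrative; see the per-driver hunks below for
the real call sites):

	/* Forward the drm_file's client_id at job init time. */
	ret = drm_sched_job_init(&job->base, &entity, 1 /* credits */,
				 owner, file->client_id);
	if (ret)
		return ret;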

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/accel/amdxdna/aie2_ctx.c             |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c       |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      |  8 +++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h      |  3 ++-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  2 +-
 drivers/gpu/drm/imagination/pvr_job.c        |  2 +-
 drivers/gpu/drm/imagination/pvr_queue.c      |  5 +++--
 drivers/gpu/drm/imagination/pvr_queue.h      |  2 +-
 drivers/gpu/drm/lima/lima_gem.c              |  2 +-
 drivers/gpu/drm/lima/lima_sched.c            |  6 ++++--
 drivers/gpu/drm/lima/lima_sched.h            |  3 ++-
 drivers/gpu/drm/msm/msm_gem_submit.c         |  8 +++++---
 drivers/gpu/drm/nouveau/nouveau_sched.c      |  3 ++-
 drivers/gpu/drm/panfrost/panfrost_drv.c      |  2 +-
 drivers/gpu/drm/panthor/panthor_drv.c        |  3 ++-
 drivers/gpu/drm/panthor/panthor_mmu.c        |  2 +-
 drivers/gpu/drm/panthor/panthor_sched.c      |  5 +++--
 drivers/gpu/drm/panthor/panthor_sched.h      |  3 ++-
 drivers/gpu/drm/scheduler/sched_fence.c      |  4 +++-
 drivers/gpu/drm/scheduler/sched_main.c       |  6 ++++--
 drivers/gpu/drm/v3d/v3d_submit.c             |  2 +-
 drivers/gpu/drm/xe/xe_sched_job.c            |  3 ++-
 include/drm/gpu_scheduler.h                  | 12 ++++++++++--
 24 files changed, 61 insertions(+), 33 deletions(-)

diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c
index 5f43db02b240..d33e62f1b127 100644
--- a/drivers/accel/amdxdna/aie2_ctx.c
+++ b/drivers/accel/amdxdna/aie2_ctx.c
@@ -819,7 +819,8 @@ int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job,
 		goto up_sem;
 	}
 
-	ret = drm_sched_job_init(&job->base, &hwctx->priv->entity, 1, hwctx);
+	ret = drm_sched_job_init(&job->base, &hwctx->priv->entity, 1, hwctx,
+				 hwctx->client->filp->client_id);
 	if (ret) {
 		XDNA_ERR(xdna, "DRM job init failed, ret %d", ret);
 		goto free_chain;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 2e5732dfd425..d480be803784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -681,7 +681,7 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
 		goto err;
 	}
 
-	ret = amdgpu_job_alloc(adev, NULL, NULL, NULL, 1, &job);
+	ret = amdgpu_job_alloc(adev, NULL, NULL, NULL, 1, &job, 0);
 	if (ret)
 		goto err;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5cc5f59e3018..d115fa10f8ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -293,7 +293,8 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
 
 	for (i = 0; i < p->gang_size; ++i) {
 		ret = amdgpu_job_alloc(p->adev, vm, p->entities[i], vm,
-				       num_ibs[i], &p->jobs[i]);
+				       num_ibs[i], &p->jobs[i],
+				       p->filp->client_id);
 		if (ret)
 			goto free_all_kdata;
 		p->jobs[i]->enforce_isolation = p->adev->enforce_isolation[fpriv->xcp_id];
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..0a9a53f2e78a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -185,7 +185,8 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		     struct drm_sched_entity *entity, void *owner,
-		     unsigned int num_ibs, struct amdgpu_job **job)
+		     unsigned int num_ibs, struct amdgpu_job **job,
+		     u64 drm_client_id)
 {
 	if (num_ibs == 0)
 		return -EINVAL;
@@ -203,7 +204,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (!entity)
 		return 0;
 
-	return drm_sched_job_init(&(*job)->base, entity, 1, owner);
+	return drm_sched_job_init(&(*job)->base, entity, 1, owner,
+				  drm_client_id);
 }
 
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
@@ -213,7 +215,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
 {
 	int r;
 
-	r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job);
+	r = amdgpu_job_alloc(adev, NULL, entity, owner, 1, job, 0);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index ce6b9ba967ff..5a8bc6342222 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -90,7 +90,8 @@ static inline struct amdgpu_ring *amdgpu_job_ring(struct amdgpu_job *job)
 
 int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		     struct drm_sched_entity *entity, void *owner,
-		     unsigned int num_ibs, struct amdgpu_job **job);
+		     unsigned int num_ibs, struct amdgpu_job **job,
+		     u64 drm_client_id);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
 			     struct drm_sched_entity *entity, void *owner,
 			     size_t size, enum amdgpu_ib_pool_type pool_type,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 3c0a5c3e0e3d..76c742328edb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -534,7 +534,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 	ret = drm_sched_job_init(&submit->sched_job,
 				 &ctx->sched_entity[args->pipe],
-				 1, submit->ctx);
+				 1, submit->ctx, file->client_id);
 	if (ret)
 		goto err_submit_put;
 
diff --git a/drivers/gpu/drm/imagination/pvr_job.c b/drivers/gpu/drm/imagination/pvr_job.c
index 618503a212a7..64152b57e8b1 100644
--- a/drivers/gpu/drm/imagination/pvr_job.c
+++ b/drivers/gpu/drm/imagination/pvr_job.c
@@ -446,7 +446,7 @@ create_job(struct pvr_device *pvr_dev,
 	if (err)
 		goto err_put_job;
 
-	err = pvr_queue_job_init(job);
+	err = pvr_queue_job_init(job, pvr_file->file->client_id);
 	if (err)
 		goto err_put_job;
 
diff --git a/drivers/gpu/drm/imagination/pvr_queue.c b/drivers/gpu/drm/imagination/pvr_queue.c
index c4f08432882b..240d85e1006d 100644
--- a/drivers/gpu/drm/imagination/pvr_queue.c
+++ b/drivers/gpu/drm/imagination/pvr_queue.c
@@ -1059,6 +1059,7 @@ static int pvr_queue_cleanup_fw_context(struct pvr_queue *queue)
 /**
  * pvr_queue_job_init() - Initialize queue related fields in a pvr_job object.
  * @job: The job to initialize.
+ * @drm_client_id: drm_file.client_id submitting the job
  *
  * Bind the job to a queue and allocate memory to guarantee pvr_queue_job_arm()
  * and pvr_queue_job_push() can't fail. We also make sure the context type is
@@ -1068,7 +1069,7 @@ static int pvr_queue_cleanup_fw_context(struct pvr_queue *queue)
  *  * 0 on success, or
  *  * An error code if something failed.
  */
-int pvr_queue_job_init(struct pvr_job *job)
+int pvr_queue_job_init(struct pvr_job *job, u64 drm_client_id)
 {
 	/* Fragment jobs need at least one native fence wait on the geometry job fence. */
 	u32 min_native_dep_count = job->type == DRM_PVR_JOB_TYPE_FRAGMENT ? 1 : 0;
@@ -1085,7 +1086,7 @@ int pvr_queue_job_init(struct pvr_job *job)
 	if (!pvr_cccb_cmdseq_can_fit(&queue->cccb, job_cmds_size(job, min_native_dep_count)))
 		return -E2BIG;
 
-	err = drm_sched_job_init(&job->base, &queue->entity, 1, THIS_MODULE);
+	err = drm_sched_job_init(&job->base, &queue->entity, 1, THIS_MODULE, drm_client_id);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/imagination/pvr_queue.h b/drivers/gpu/drm/imagination/pvr_queue.h
index e06ced69302f..0cfbe1cd311e 100644
--- a/drivers/gpu/drm/imagination/pvr_queue.h
+++ b/drivers/gpu/drm/imagination/pvr_queue.h
@@ -139,7 +139,7 @@ struct pvr_queue {
 
 bool pvr_queue_fence_is_ufo_backed(struct dma_fence *f);
 
-int pvr_queue_job_init(struct pvr_job *job);
+int pvr_queue_job_init(struct pvr_job *job, u64 drm_client_id);
 
 void pvr_queue_job_cleanup(struct pvr_job *job);
 
diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index 9bb997dbb4b9..f46f961afc56 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -341,7 +341,7 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 
 	err = lima_sched_task_init(
 		submit->task, submit->ctx->context + submit->pipe,
-		bos, submit->nr_bos, vm);
+		bos, submit->nr_bos, vm, file->client_id);
 	if (err)
 		goto err_out1;
 
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index b40c90e97d7e..ffc369dd4301 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -113,7 +113,8 @@ static inline struct lima_sched_pipe *to_lima_pipe(struct drm_gpu_scheduler *sch
 int lima_sched_task_init(struct lima_sched_task *task,
 			 struct lima_sched_context *context,
 			 struct lima_bo **bos, int num_bos,
-			 struct lima_vm *vm)
+			 struct lima_vm *vm,
+			 u64 drm_client_id)
 {
 	int err, i;
 
@@ -124,7 +125,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
 	for (i = 0; i < num_bos; i++)
 		drm_gem_object_get(&bos[i]->base.base);
 
-	err = drm_sched_job_init(&task->base, &context->base, 1, vm);
+	err = drm_sched_job_init(&task->base, &context->base, 1, vm,
+				 drm_client_id);
 	if (err) {
 		kfree(task->bos);
 		return err;
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index 85b23ba901d5..1a08faf8a529 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -88,7 +88,8 @@ struct lima_sched_pipe {
 int lima_sched_task_init(struct lima_sched_task *task,
 			 struct lima_sched_context *context,
 			 struct lima_bo **bos, int num_bos,
-			 struct lima_vm *vm);
+			 struct lima_vm *vm,
+			 u64 drm_client_id);
 void lima_sched_task_fini(struct lima_sched_task *task);
 
 int lima_sched_context_init(struct lima_sched_pipe *pipe,
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index dee470403036..378119155673 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -30,7 +30,7 @@
 static struct msm_gem_submit *submit_create(struct drm_device *dev,
 		struct msm_gpu *gpu,
 		struct msm_gpu_submitqueue *queue, uint32_t nr_bos,
-		uint32_t nr_cmds)
+		uint32_t nr_cmds, u64 drm_client_id)
 {
 	static atomic_t ident = ATOMIC_INIT(0);
 	struct msm_gem_submit *submit;
@@ -54,7 +54,8 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 		return ERR_PTR(ret);
 	}
 
-	ret = drm_sched_job_init(&submit->base, queue->entity, 1, queue);
+	ret = drm_sched_job_init(&submit->base, queue->entity, 1, queue,
+				 drm_client_id);
 	if (ret) {
 		kfree(submit->hw_fence);
 		kfree(submit);
@@ -693,7 +694,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		}
 	}
 
-	submit = submit_create(dev, gpu, queue, args->nr_bos, args->nr_cmds);
+	submit = submit_create(dev, gpu, queue, args->nr_bos, args->nr_cmds,
+			       file->client_id);
 	if (IS_ERR(submit)) {
 		ret = PTR_ERR(submit);
 		goto out_post_unlock;
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 4412f2711fb5..ebc31adea39a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -87,7 +87,8 @@ nouveau_job_init(struct nouveau_job *job,
 	}
 
 	ret = drm_sched_job_init(&job->base, &sched->entity,
-				 args->credits, NULL);
+				 args->credits, NULL,
+				 job->file_priv->client_id);
 	if (ret)
 		goto err_free_chains;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 0f3935556ac7..f4f20d11ecd9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -312,7 +312,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 
 	ret = drm_sched_job_init(&job->base,
 				 &file_priv->sched_entity[slot],
-				 1, NULL);
+				 1, NULL, file->client_id);
 	if (ret)
 		goto out_put_job;
 
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index d5dcd3d1b33a..bf8c92ebcc2d 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -988,7 +988,8 @@ static int panthor_ioctl_group_submit(struct drm_device *ddev, void *data,
 		const struct drm_panthor_queue_submit *qsubmit = &jobs_args[i];
 		struct drm_sched_job *job;
 
-		job = panthor_job_create(pfile, args->group_handle, qsubmit);
+		job = panthor_job_create(pfile, args->group_handle, qsubmit,
+					 file->client_id);
 		if (IS_ERR(job)) {
 			ret = PTR_ERR(job);
 			goto out_cleanup_submit_ctx;
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index c39e3eb1c15d..9c49b1aef759 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -2480,7 +2480,7 @@ panthor_vm_bind_job_create(struct drm_file *file,
 	kref_init(&job->refcount);
 	job->vm = panthor_vm_get(vm);
 
-	ret = drm_sched_job_init(&job->base, &vm->entity, 1, vm);
+	ret = drm_sched_job_init(&job->base, &vm->entity, 1, vm, file->client_id);
 	if (ret)
 		goto err_put_job;
 
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 77b184c3fb0c..b81c428bf66c 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -3672,7 +3672,8 @@ struct panthor_vm *panthor_job_vm(struct drm_sched_job *sched_job)
 struct drm_sched_job *
 panthor_job_create(struct panthor_file *pfile,
 		   u16 group_handle,
-		   const struct drm_panthor_queue_submit *qsubmit)
+		   const struct drm_panthor_queue_submit *qsubmit,
+		   u64 drm_client_id)
 {
 	struct panthor_group_pool *gpool = pfile->groups;
 	struct panthor_job *job;
@@ -3744,7 +3745,7 @@ panthor_job_create(struct panthor_file *pfile,
 
 	ret = drm_sched_job_init(&job->base,
 				 &job->group->queues[job->queue_idx]->entity,
-				 credits, job->group);
+				 credits, job->group, drm_client_id);
 	if (ret)
 		goto err_put_job;
 
diff --git a/drivers/gpu/drm/panthor/panthor_sched.h b/drivers/gpu/drm/panthor/panthor_sched.h
index 5ae6b4bde7c5..d80c040ebf13 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.h
+++ b/drivers/gpu/drm/panthor/panthor_sched.h
@@ -28,7 +28,8 @@ int panthor_group_get_state(struct panthor_file *pfile,
 struct drm_sched_job *
 panthor_job_create(struct panthor_file *pfile,
 		   u16 group_handle,
-		   const struct drm_panthor_queue_submit *qsubmit);
+		   const struct drm_panthor_queue_submit *qsubmit,
+		   u64 drm_client_id);
 struct drm_sched_job *panthor_job_get(struct drm_sched_job *job);
 struct panthor_vm *panthor_job_vm(struct drm_sched_job *sched_job);
 void panthor_job_put(struct drm_sched_job *job);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 0f35f009b9d3..41c3d596603c 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -204,7 +204,8 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
 EXPORT_SYMBOL(to_drm_sched_fence);
 
 struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
-					      void *owner)
+					      void *owner,
+					      u64 drm_client_id)
 {
 	struct drm_sched_fence *fence = NULL;
 
@@ -213,6 +214,7 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
 		return NULL;
 
 	fence->owner = owner;
+	fence->drm_client_id = drm_client_id;
 	spin_lock_init(&fence->lock);
 
 	return fence;
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 57da84908752..ad306d3d7282 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -776,6 +776,7 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * @credits: the number of credits this job contributes to the schedulers
  * credit limit
  * @owner: job owner for debugging
+ * @drm_client_id: drm_file.client_id of the owner
  *
  * Refer to drm_sched_entity_push_job() documentation
  * for locking considerations.
@@ -796,7 +797,8 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  */
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       u32 credits, void *owner)
+		       u32 credits, void *owner,
+		       uint64_t drm_client_id)
 {
 	if (!entity->rq) {
 		/* This will most likely be followed by missing frames
@@ -822,7 +824,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
 
 	job->entity = entity;
 	job->credits = credits;
-	job->s_fence = drm_sched_fence_alloc(entity, owner);
+	job->s_fence = drm_sched_fence_alloc(entity, owner, drm_client_id);
 	if (!job->s_fence)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 4ff5de46fb22..5171ffe9012d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -169,7 +169,7 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	job->file = file_priv;
 
 	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
-				 1, v3d_priv);
+				 1, v3d_priv, file_priv->client_id);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 1905ca590965..f4679cb9a56b 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -113,7 +113,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
 	kref_init(&job->refcount);
 	xe_exec_queue_get(job->q);
 
-	err = drm_sched_job_init(&job->drm, q->entity, 1, NULL);
+	err = drm_sched_job_init(&job->drm, q->entity, 1, NULL,
+				 q->xef->drm->client_id);
 	if (err)
 		goto err_free;
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 95e17504e46a..7982ca8d7ca5 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -311,6 +311,13 @@ struct drm_sched_fence {
          * @owner: job owner for debugging
          */
 	void				*owner;
+
+	/**
+	 * @drm_client_id:
+	 *
+	 * The client_id of the drm_file which owns the job.
+	 */
+	uint64_t			drm_client_id;
 };
 
 struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
@@ -563,7 +570,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       u32 credits, void *owner);
+		       u32 credits, void *owner,
+		       u64 drm_client_id);
 void drm_sched_job_arm(struct drm_sched_job *job);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 				 struct dma_fence *fence);
@@ -624,7 +632,7 @@ bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
 int drm_sched_entity_error(struct drm_sched_entity *entity);
 
 struct drm_sched_fence *drm_sched_fence_alloc(
-	struct drm_sched_entity *s_entity, void *owner);
+	struct drm_sched_entity *s_entity, void *owner, u64 drm_client_id);
 void drm_sched_fence_init(struct drm_sched_fence *fence,
 			  struct drm_sched_entity *entity);
 void drm_sched_fence_free(struct drm_sched_fence *fence);
-- 
2.47.1



* [PATCH v7 3/7] drm/sched: add device name to the drm_sched_process_job event
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
  2025-01-31 11:02 ` [PATCH v7 1/7] drm/debugfs: output client_id in drm_clients_info Pierre-Eric Pelloux-Prayer
  2025-01-31 11:03 ` [PATCH v7 2/7] drm/sched: store the drm client_id in drm_sched_fence Pierre-Eric Pelloux-Prayer
@ 2025-01-31 11:03 ` Pierre-Eric Pelloux-Prayer
  2025-01-31 11:03 ` [PATCH v7 4/7] drm/sched: cleanup gpu_scheduler trace events Pierre-Eric Pelloux-Prayer
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:03 UTC (permalink / raw)
  To: Luben Tuikov, Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter
  Cc: Pierre-Eric Pelloux-Prayer, dri-devel, linux-kernel

Since switching the scheduler from using kthreads to workqueues in
commit a6149f039369 ("drm/sched: Convert drm scheduler to use a work
queue rather than kthread"), userspace applications cannot determine
the device from the PID of the threads sending the trace events
anymore.

Each queue had its own kthread, which kept the same PID for its whole
lifetime. So, at least for amdgpu, it was possible to associate a PID
with the hardware queues of each GPU in the system. Then, when a
drm_run_job trace event was received by userspace, the source PID
made it possible to associate it back to the correct GPU.

With workqueues this is not possible anymore, so the event needs to
contain the dev_name() to identify the device.
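
With this change, a drm_run_job event renders for instance as
(hypothetical values):

  dev=0000:0a:00.0, entity=00000000abcd1234, id=145, fence=00000000deadbeef, ring=gfx_0.0.0, job count:1, hw job count:0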

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/scheduler/gpu_scheduler_trace.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index c75302ca3427..c4ec28540656 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -42,6 +42,7 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 			     __field(uint64_t, id)
 			     __field(u32, job_count)
 			     __field(int, hw_job_count)
+			     __string(dev, dev_name(sched_job->sched->dev))
 			     ),
 
 	    TP_fast_assign(
@@ -52,9 +53,10 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 			   __entry->job_count = spsc_queue_count(&entity->job_queue);
 			   __entry->hw_job_count = atomic_read(
 				   &sched_job->sched->credit_count);
+			   __assign_str(dev);
 			   ),
-	    TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
-		      __entry->entity, __entry->id,
+	    TP_printk("dev=%s, entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
+		      __get_str(dev), __entry->entity, __entry->id,
 		      __entry->fence, __get_str(name),
 		      __entry->job_count, __entry->hw_job_count)
 );
-- 
2.47.1



* [PATCH v7 4/7] drm/sched: cleanup gpu_scheduler trace events
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
                   ` (2 preceding siblings ...)
  2025-01-31 11:03 ` [PATCH v7 3/7] drm/sched: add device name to the drm_sched_process_job event Pierre-Eric Pelloux-Prayer
@ 2025-01-31 11:03 ` Pierre-Eric Pelloux-Prayer
  2025-02-04 15:22   ` Tvrtko Ursulin
  2025-01-31 11:03 ` [PATCH v7 5/7] drm/sched: trace dependencies for gpu jobs Pierre-Eric Pelloux-Prayer
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:03 UTC (permalink / raw)
  To: Luben Tuikov, Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Sumit Semwal, Christian König
  Cc: Pierre-Eric Pelloux-Prayer, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

A fence uniquely identifies a job, so this commit updates the places
where a kernel pointer was used as an identifier to print the fence
instead, formatted as:

   "fence=%llu:%llu"

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 41 +++++++++++--------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index c4ec28540656..3cdd8d8f8021 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -36,28 +36,29 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
 	    TP_ARGS(sched_job, entity),
 	    TP_STRUCT__entry(
-			     __field(struct drm_sched_entity *, entity)
-			     __field(struct dma_fence *, fence)
-			     __string(name, sched_job->sched->name)
 			     __field(uint64_t, id)
+			     __string(name, sched_job->sched->name)
 			     __field(u32, job_count)
 			     __field(int, hw_job_count)
 			     __string(dev, dev_name(sched_job->sched->dev))
+			     __field(uint64_t, fence_context)
+			     __field(uint64_t, fence_seqno)
 			     ),
 
 	    TP_fast_assign(
-			   __entry->entity = entity;
 			   __entry->id = sched_job->id;
-			   __entry->fence = &sched_job->s_fence->finished;
 			   __assign_str(name);
 			   __entry->job_count = spsc_queue_count(&entity->job_queue);
 			   __entry->hw_job_count = atomic_read(
 				   &sched_job->sched->credit_count);
 			   __assign_str(dev);
+			   __entry->fence_context = sched_job->s_fence->finished.context;
+			   __entry->fence_seqno = sched_job->s_fence->finished.seqno;
+
 			   ),
-	    TP_printk("dev=%s, entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
-		      __get_str(dev), __entry->entity, __entry->id,
-		      __entry->fence, __get_str(name),
+	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d",
+		      __get_str(dev), __entry->id,
+		      __entry->fence_context, __entry->fence_seqno, __get_str(name),
 		      __entry->job_count, __entry->hw_job_count)
 );
 
@@ -75,37 +76,41 @@ TRACE_EVENT(drm_sched_process_job,
 	    TP_PROTO(struct drm_sched_fence *fence),
 	    TP_ARGS(fence),
 	    TP_STRUCT__entry(
-		    __field(struct dma_fence *, fence)
+		    __field(uint64_t, fence_context)
+		    __field(uint64_t, fence_seqno)
 		    ),
 
 	    TP_fast_assign(
-		    __entry->fence = &fence->finished;
+		    __entry->fence_context = fence->finished.context;
+		    __entry->fence_seqno = fence->finished.seqno;
 		    ),
-	    TP_printk("fence=%p signaled", __entry->fence)
+	    TP_printk("fence=%llu:%llu signaled",
+		      __entry->fence_context, __entry->fence_seqno)
 );
 
 TRACE_EVENT(drm_sched_job_wait_dep,
 	    TP_PROTO(struct drm_sched_job *sched_job, struct dma_fence *fence),
 	    TP_ARGS(sched_job, fence),
 	    TP_STRUCT__entry(
-			     __string(name, sched_job->sched->name)
+			     __field(uint64_t, fence_context)
+			     __field(uint64_t, fence_seqno)
 			     __field(uint64_t, id)
 			     __field(struct dma_fence *, fence)
 			     __field(uint64_t, ctx)
-			     __field(unsigned, seqno)
+			     __field(uint64_t, seqno)
 			     ),
 
 	    TP_fast_assign(
-			   __assign_str(name);
+			   __entry->fence_context = sched_job->s_fence->finished.context;
+			   __entry->fence_seqno = sched_job->s_fence->finished.seqno;
 			   __entry->id = sched_job->id;
 			   __entry->fence = fence;
 			   __entry->ctx = fence->context;
 			   __entry->seqno = fence->seqno;
 			   ),
-	    TP_printk("job ring=%s, id=%llu, depends fence=%p, context=%llu, seq=%u",
-		      __get_str(name), __entry->id,
-		      __entry->fence, __entry->ctx,
-		      __entry->seqno)
+	    TP_printk("fence=%llu:%llu, id=%llu, dependencies:{fence=%llu:%llu}",
+		      __entry->fence_context, __entry->fence_seqno, __entry->id,
+		      __entry->ctx, __entry->seqno)
 );
 
 #endif
-- 
2.47.1



* [PATCH v7 5/7] drm/sched: trace dependencies for gpu jobs
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
                   ` (3 preceding siblings ...)
  2025-01-31 11:03 ` [PATCH v7 4/7] drm/sched: cleanup gpu_scheduler trace events Pierre-Eric Pelloux-Prayer
@ 2025-01-31 11:03 ` Pierre-Eric Pelloux-Prayer
  2025-02-04 15:31   ` Tvrtko Ursulin
  2025-01-31 11:03 ` [PATCH v7 6/7] drm/sched: add the drm_client_id to the drm_sched_run/exec_job events Pierre-Eric Pelloux-Prayer
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:03 UTC (permalink / raw)
  To: Luben Tuikov, Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Sumit Semwal, Christian König
  Cc: Pierre-Eric Pelloux-Prayer, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

Trace the fence dependencies similarly to how we print fences:

 ... , dependencies:{fence=606:38006}

This allows tools to analyze the dependencies between the jobs (previously
it was only possible for fences traced by drm_sched_job_wait_dep).

Since drm_sched_job and drm_run_job use the same base event class,
the caller of trace_drm_run_job has to pass a dep_count of 0 (which
is ignored because dependencies are only considered at submit time).
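
For example, if a job is traced with dependencies:{fence=606:38006}
and a previously traced job's own fence is 606:38006, a tool can infer
that the new job could not start before that earlier job finished.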

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 59 ++++++++++++++++---
 drivers/gpu/drm/scheduler/sched_entity.c      |  8 ++-
 drivers/gpu/drm/scheduler/sched_main.c        |  2 +-
 3 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 3cdd8d8f8021..ca19cd9a146a 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -26,15 +26,41 @@
 
 #include <linux/stringify.h>
 #include <linux/types.h>
+#include <linux/trace_seq.h>
 #include <linux/tracepoint.h>
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM gpu_scheduler
 #define TRACE_INCLUDE_FILE gpu_scheduler_trace
 
+#ifndef __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
+#define __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
+/* Similar to trace_print_array_seq but for fences. */
+static inline const char *__print_dma_fence_array(struct trace_seq *p, const void *buf, int count)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u64 *fences = (u64 *) buf;
+	const char *prefix = "";
+
+	trace_seq_putc(p, '{');
+	for (int i = 0; i < count; i++) {
+		u64 context = fences[2 * i], seqno = fences[2 * i + 1];
+
+		trace_seq_printf(p, "%sfence=%llu:%llu",
+				 prefix, context, seqno);
+		prefix = ",";
+	}
+	trace_seq_putc(p, '}');
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+#endif
+
 DECLARE_EVENT_CLASS(drm_sched_job,
-	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
-	    TP_ARGS(sched_job, entity),
+	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity,
+		     unsigned int dep_count),
+	    TP_ARGS(sched_job, entity, dep_count),
 	    TP_STRUCT__entry(
 			     __field(uint64_t, id)
 			     __string(name, sched_job->sched->name)
@@ -43,9 +69,14 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 			     __string(dev, dev_name(sched_job->sched->dev))
 			     __field(uint64_t, fence_context)
 			     __field(uint64_t, fence_seqno)
+			     __field(int, n_deps)
+			     __dynamic_array(u64, deps, dep_count * 2)
 			     ),
 
 	    TP_fast_assign(
+			   unsigned long idx;
+			   struct dma_fence *fence;
+			   u64 *dyn_arr;
 			   __entry->id = sched_job->id;
 			   __assign_str(name);
 			   __entry->job_count = spsc_queue_count(&entity->job_queue);
@@ -54,22 +85,32 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 			   __assign_str(dev);
 			   __entry->fence_context = sched_job->s_fence->finished.context;
 			   __entry->fence_seqno = sched_job->s_fence->finished.seqno;
-
+			   __entry->n_deps = dep_count;
+			   if (dep_count) {
+				dyn_arr = __get_dynamic_array(deps);
+				xa_for_each(&sched_job->dependencies, idx, fence) {
+					dyn_arr[2 * idx] = fence->context;
+					dyn_arr[2 * idx + 1] = fence->seqno;
+				}
+			   }
 			   ),
-	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d",
+	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d, dependencies:%s",
 		      __get_str(dev), __entry->id,
 		      __entry->fence_context, __entry->fence_seqno, __get_str(name),
-		      __entry->job_count, __entry->hw_job_count)
+		      __entry->job_count, __entry->hw_job_count,
+		      __print_dma_fence_array(p, __get_dynamic_array(deps), __entry->n_deps))
 );
 
 DEFINE_EVENT(drm_sched_job, drm_sched_job,
-	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
-	    TP_ARGS(sched_job, entity)
+	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity,
+		     unsigned int dep_count),
+	    TP_ARGS(sched_job, entity, dep_count)
 );
 
 DEFINE_EVENT(drm_sched_job, drm_run_job,
-	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
-	    TP_ARGS(sched_job, entity)
+	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity,
+	    	     unsigned int dep_count),
+	    TP_ARGS(sched_job, entity, 0)
 );
 
 TRACE_EVENT(drm_sched_process_job,
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 69bcf0e99d57..0ce3a82fe6fd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -592,7 +592,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	bool first;
 	ktime_t submit_ts;
 
-	trace_drm_sched_job(sched_job, entity);
+	if (trace_drm_sched_job_enabled()) {
+		unsigned long index, n = 0;
+		void *f;
+
+		xa_for_each(&sched_job->dependencies, index, f) { n++; }
+		trace_drm_sched_job(sched_job, entity, n);
+	}
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ad306d3d7282..c645f07ebe26 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1214,7 +1214,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
 	atomic_add(sched_job->credits, &sched->credit_count);
 	drm_sched_job_begin(sched_job);
 
-	trace_drm_run_job(sched_job, entity);
+	trace_drm_run_job(sched_job, entity, 0);
 	fence = sched->ops->run_job(sched_job);
 	complete_all(&entity->entity_idle);
 	drm_sched_fence_scheduled(s_fence, fence);
-- 
2.47.1



* [PATCH v7 6/7] drm/sched: add the drm_client_id to the drm_sched_run/exec_job events
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
                   ` (4 preceding siblings ...)
  2025-01-31 11:03 ` [PATCH v7 5/7] drm/sched: trace dependencies for gpu jobs Pierre-Eric Pelloux-Prayer
@ 2025-01-31 11:03 ` Pierre-Eric Pelloux-Prayer
  2025-01-31 11:03 ` [PATCH v7 7/7] drm/doc: document some tracepoints as uAPI Pierre-Eric Pelloux-Prayer
  2025-01-31 13:12 ` [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Christian König
  7 siblings, 0 replies; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:03 UTC (permalink / raw)
  To: Luben Tuikov, Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter
  Cc: Pierre-Eric Pelloux-Prayer, dri-devel, linux-kernel

For processes with multiple drm_file instances, the drm_client_id is
the only way to map jobs back to their unique owner.

It's even more useful if the drm client_name is set, because then a
tool can map jobs to the client name instead of only having access to
the process name.
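
For example (hypothetical values), a tool seeing client_id:42 in a
drm_sched_job event can look 42 up in the debugfs 'clients' output
(see patch 1) or in fdinfo to recover the client and process names.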

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/scheduler/gpu_scheduler_trace.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index ca19cd9a146a..5c740cb80037 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -71,6 +71,7 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 			     __field(uint64_t, fence_seqno)
 			     __field(int, n_deps)
 			     __dynamic_array(u64, deps, dep_count * 2)
+			     __field(u64, client_id)
 			     ),
 
 	    TP_fast_assign(
@@ -93,12 +94,14 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 					dyn_arr[2 * idx + 1] = fence->seqno;
 				}
 			   }
+			   __entry->client_id = sched_job->s_fence->drm_client_id;
 			   ),
-	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d, dependencies:%s",
+	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d, dependencies:%s, client_id:%lld",
 		      __get_str(dev), __entry->id,
 		      __entry->fence_context, __entry->fence_seqno, __get_str(name),
 		      __entry->job_count, __entry->hw_job_count,
-		      __print_dma_fence_array(p, __get_dynamic_array(deps), __entry->n_deps))
+		      __print_dma_fence_array(p, __get_dynamic_array(deps), __entry->n_deps),
+		      __entry->client_id)
 );
 
 DEFINE_EVENT(drm_sched_job, drm_sched_job,
-- 
2.47.1



* [PATCH v7 7/7] drm/doc: document some tracepoints as uAPI
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
                   ` (5 preceding siblings ...)
  2025-01-31 11:03 ` [PATCH v7 6/7] drm/sched: add the drm_client_id to the drm_sched_run/exec_job events Pierre-Eric Pelloux-Prayer
@ 2025-01-31 11:03 ` Pierre-Eric Pelloux-Prayer
  2025-01-31 13:12 ` [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Christian König
  7 siblings, 0 replies; 11+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-01-31 11:03 UTC (permalink / raw)
  To: David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Luben Tuikov, Matthew Brost,
	Danilo Krummrich, Philipp Stanner, Sumit Semwal,
	Christian König
  Cc: Pierre-Eric Pelloux-Prayer, Lucas Stach, Maíra Canal,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig

This commit adds a document section in drm-uapi.rst about tracepoints,
and marks the events in gpu_scheduler_trace.h as stable uAPI.

The goal is to explicitly state that tools can rely on the fields,
formats and semantics of these events.
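
For example, once stable, the fence field must keep its
"fence=%llu:%llu" (context:seqno) formatting; new fields may still be
appended to the events under the normal uAPI requirements.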

Acked-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 Documentation/gpu/drm-uapi.rst                | 19 ++++++++++++++++
 .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 22 +++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index b75cc9a70d1f..9603ac0f4c09 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -583,3 +583,22 @@ dma-buf interoperability
 
 Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
 information on how dma-buf is integrated and exposed within DRM.
+
+
+Trace events
+============
+
+See Documentation/trace/tracepoints.rst for information about using
+Linux Kernel Tracepoints.
+In the DRM subsystem, some events are considered stable uAPI to avoid
+breaking tools (e.g.: GPUVis, umr) relying on them. Stable means that fields
+cannot be removed, nor their formatting updated. Adding new fields is
+possible, under the normal uAPI requirements.
+
+Stable uAPI events
+------------------
+
+From ``drivers/gpu/drm/scheduler/gpu_scheduler_trace.h``
+
+.. kernel-doc::  drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+   :doc: uAPI trace events
\ No newline at end of file
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 5c740cb80037..85d547f3fadd 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -33,6 +33,28 @@
 #define TRACE_SYSTEM gpu_scheduler
 #define TRACE_INCLUDE_FILE gpu_scheduler_trace
 
+
+/**
+ * DOC: uAPI trace events
+ *
+ * ``drm_sched_job``, ``drm_run_job``, ``drm_sched_process_job``,
+ * and ``drm_sched_job_wait_dep`` are considered stable uAPI.
+ *
+ * Common trace events attributes:
+ *
+ * * ``id``    - this is &drm_sched_job->id. It uniquely identifies a job
+ *   inside a &struct drm_gpu_scheduler.
+ *
+ * * ``dev``   - the dev_name() of the device running the job.
+ *
+ * * ``ring``  - the hardware ring running the job. Together with ``dev`` it
+ *   uniquely identifies where the job is going to be executed.
+ *
+ * * ``fence`` - the &dma_fence.context and the &dma_fence.seqno of
+ *   &drm_sched_fence.finished
+ *
+ */
+
 #ifndef __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
 #define __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
 /* Similar to trace_print_array_seq but for fences. */
-- 
2.47.1



* Re: [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI
  2025-01-31 11:02 [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI Pierre-Eric Pelloux-Prayer
                   ` (6 preceding siblings ...)
  2025-01-31 11:03 ` [PATCH v7 7/7] drm/doc: document some tracepoints as uAPI Pierre-Eric Pelloux-Prayer
@ 2025-01-31 13:12 ` Christian König
  7 siblings, 0 replies; 11+ messages in thread
From: Christian König @ 2025-01-31 13:12 UTC (permalink / raw)
  To: Pierre-Eric Pelloux-Prayer
  Cc: amd-gfx, dri-devel, etnaviv, freedreno, intel-xe, lima,
	linaro-mm-sig, linux-arm-msm, linux-doc, linux-kernel,
	linux-media, nouveau

I can't see patch #1 in my inbox for some reason, but I already know 
what it does from your repository.

Feel free to add Reviewed-by: Christian König <christian.koenig@amd.com> 
to the entire series.

Regards,
Christian.

Am 31.01.25 um 12:02 schrieb Pierre-Eric Pelloux-Prayer:
> Hi,
>
> The initial goal of this series was to improve the drm and amdgpu
> trace events to be able to expose more of the inner workings of
> the scheduler and drivers to developers via tools.
>
> Then, the series evolved to become focused only on gpu_scheduler.
> The changes around vblank events will be part of a different
> series, as well as the amdgpu ones.
>
> Moreover, Sima suggested making some trace events stable uAPI,
> so tools can rely on them long term.
>
> The first patches extend and clean up the gpu scheduler events.
>
> The last one adds a documentation entry in drm-uapi.rst.
>
> Changes since v6:
> * Addressed comments from Philipp, Tvrtko, Steven
> * The commit storing drm_client_id in sched_fence adds an extra
> field which looks like a duplicate of the owner field. Currently
> amdgpu uses the owner field with some magic values, so we have to
> keep 2 separate fields for now, but ultimately one could be removed.
> Similarly, storing the drm_client_id in the sched_entity is not
> really possible, as there's nothing preventing a driver from using a
> sched_entity with multiple drm_file instances.
>
>
> Useful links:
> - userspace tool using the updated events:
> https://gitlab.freedesktop.org/tomstdenis/umr/-/merge_requests/37
> - v6:
> https://lists.freedesktop.org/archives/dri-devel/2024-November/477644.html
>
> Pierre-Eric Pelloux-Prayer (7):
>    drm/debugfs: output client_id in drm_clients_info
>    drm/sched: store the drm client_id in drm_sched_fence
>    drm/sched: add device name to the drm_sched_process_job event
>    drm/sched: cleanup gpu_scheduler trace events
>    drm/sched: trace dependencies for gpu jobs
>    drm/sched: add the drm_client_id to the drm_sched_run/exec_job events
>    drm/doc: document some tracepoints as uAPI
>
>   Documentation/gpu/drm-uapi.rst                |  19 +++
>   drivers/accel/amdxdna/aie2_ctx.c              |   3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |   3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   8 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |   3 +-
>   drivers/gpu/drm/drm_debugfs.c                 |  10 +-
>   drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |   2 +-
>   drivers/gpu/drm/imagination/pvr_job.c         |   2 +-
>   drivers/gpu/drm/imagination/pvr_queue.c       |   5 +-
>   drivers/gpu/drm/imagination/pvr_queue.h       |   2 +-
>   drivers/gpu/drm/lima/lima_gem.c               |   2 +-
>   drivers/gpu/drm/lima/lima_sched.c             |   6 +-
>   drivers/gpu/drm/lima/lima_sched.h             |   3 +-
>   drivers/gpu/drm/msm/msm_gem_submit.c          |   8 +-
>   drivers/gpu/drm/nouveau/nouveau_sched.c       |   3 +-
>   drivers/gpu/drm/panfrost/panfrost_drv.c       |   2 +-
>   drivers/gpu/drm/panthor/panthor_drv.c         |   3 +-
>   drivers/gpu/drm/panthor/panthor_mmu.c         |   2 +-
>   drivers/gpu/drm/panthor/panthor_sched.c       |   5 +-
>   drivers/gpu/drm/panthor/panthor_sched.h       |   3 +-
>   .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 123 ++++++++++++++----
>   drivers/gpu/drm/scheduler/sched_entity.c      |   8 +-
>   drivers/gpu/drm/scheduler/sched_fence.c       |   4 +-
>   drivers/gpu/drm/scheduler/sched_main.c        |   8 +-
>   drivers/gpu/drm/v3d/v3d_submit.c              |   2 +-
>   drivers/gpu/drm/xe/xe_sched_job.c             |   3 +-
>   include/drm/gpu_scheduler.h                   |  12 +-
>   28 files changed, 192 insertions(+), 64 deletions(-)
>



* Re: [PATCH v7 4/7] drm/sched: cleanup gpu_scheduler trace events
  2025-01-31 11:03 ` [PATCH v7 4/7] drm/sched: cleanup gpu_scheduler trace events Pierre-Eric Pelloux-Prayer
@ 2025-02-04 15:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2025-02-04 15:22 UTC (permalink / raw)
  To: Pierre-Eric Pelloux-Prayer, Luben Tuikov, Matthew Brost,
	Danilo Krummrich, Philipp Stanner, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig


On 31/01/2025 11:03, Pierre-Eric Pelloux-Prayer wrote:
> A fence uniquely identifies a job, so this commit updates the places
> where a kernel pointer was used as an identifier to print the fence
> instead, formatted as:
> 
>     "fence=%llu:%llu"
> 
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
>   .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 41 +++++++++++--------
>   1 file changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
> index c4ec28540656..3cdd8d8f8021 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
> @@ -36,28 +36,29 @@ DECLARE_EVENT_CLASS(drm_sched_job,
>   	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
>   	    TP_ARGS(sched_job, entity),
>   	    TP_STRUCT__entry(
> -			     __field(struct drm_sched_entity *, entity)
> -			     __field(struct dma_fence *, fence)
> -			     __string(name, sched_job->sched->name)
>   			     __field(uint64_t, id)
> +			     __string(name, sched_job->sched->name)
>   			     __field(u32, job_count)
>   			     __field(int, hw_job_count)
>   			     __string(dev, dev_name(sched_job->sched->dev))
> +			     __field(uint64_t, fence_context)
> +			     __field(uint64_t, fence_seqno)
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __entry->entity = entity;
>   			   __entry->id = sched_job->id;
> -			   __entry->fence = &sched_job->s_fence->finished;
>   			   __assign_str(name);
>   			   __entry->job_count = spsc_queue_count(&entity->job_queue);
>   			   __entry->hw_job_count = atomic_read(
>   				   &sched_job->sched->credit_count);
>   			   __assign_str(dev);
> +			   __entry->fence_context = sched_job->s_fence->finished.context;
> +			   __entry->fence_seqno = sched_job->s_fence->finished.seqno;
> +
>   			   ),
> -	    TP_printk("dev=%s, entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
> -		      __get_str(dev), __entry->entity, __entry->id,
> -		      __entry->fence, __get_str(name),
> +	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d",
> +		      __get_str(dev), __entry->id,
> +		      __entry->fence_context, __entry->fence_seqno, __get_str(name),
>   		      __entry->job_count, __entry->hw_job_count)
>   );
>   
> @@ -75,37 +76,41 @@ TRACE_EVENT(drm_sched_process_job,
>   	    TP_PROTO(struct drm_sched_fence *fence),
>   	    TP_ARGS(fence),
>   	    TP_STRUCT__entry(
> -		    __field(struct dma_fence *, fence)
> +		    __field(uint64_t, fence_context)
> +		    __field(uint64_t, fence_seqno)
>   		    ),
>   
>   	    TP_fast_assign(
> -		    __entry->fence = &fence->finished;
> +		    __entry->fence_context = fence->finished.context;
> +		    __entry->fence_seqno = fence->finished.seqno;
>   		    ),
> -	    TP_printk("fence=%p signaled", __entry->fence)
> +	    TP_printk("fence=%llu:%llu signaled",
> +		      __entry->fence_context, __entry->fence_seqno)

Any chance to rename this tracepoint while changing things around? For
me "process" is not intuitive as to which stage it refers to, so maybe
a set of tracepoints like:

drm_sched_job_(wait_)dependenc(y|ies) - more on this in the next patch
drm_sched_job_queue
drm_sched_job_run
drm_sched_job_done

So the naming is standardised.
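
For illustration, the queue/run pair could be defined along these lines
(just a sketch of the proposed naming on top of the existing event
class, not code from the series):

DEFINE_EVENT(drm_sched_job, drm_sched_job_queue,
	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
	    TP_ARGS(sched_job, entity)
);

DEFINE_EVENT(drm_sched_job, drm_sched_job_run,
	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
	    TP_ARGS(sched_job, entity)
);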

>   );
>   
>   TRACE_EVENT(drm_sched_job_wait_dep,
>   	    TP_PROTO(struct drm_sched_job *sched_job, struct dma_fence *fence),
>   	    TP_ARGS(sched_job, fence),
>   	    TP_STRUCT__entry(
> -			     __string(name, sched_job->sched->name)
> +			     __field(uint64_t, fence_context)
> +			     __field(uint64_t, fence_seqno)
>   			     __field(uint64_t, id)
>   			     __field(struct dma_fence *, fence)
>   			     __field(uint64_t, ctx)
> -			     __field(unsigned, seqno)
> +			     __field(uint64_t, seqno)
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __assign_str(name);
> +			   __entry->fence_context = sched_job->s_fence->finished.context;
> +			   __entry->fence_seqno = sched_job->s_fence->finished.seqno;
>   			   __entry->id = sched_job->id;
>   			   __entry->fence = fence;
>   			   __entry->ctx = fence->context;
>   			   __entry->seqno = fence->seqno;
>   			   ),
> -	    TP_printk("job ring=%s, id=%llu, depends fence=%p, context=%llu, seq=%u",
> -		      __get_str(name), __entry->id,
> -		      __entry->fence, __entry->ctx,
> -		      __entry->seqno)
> +	    TP_printk("fence=%llu:%llu, id=%llu, dependencies:{fence=%llu:%llu}",

Will dependencies ever become a list here? Just wondering if plural and 
curlies bring anything of value, or if it would be more readable as 
"fence=%llu:%llu, id=%llu, dependency=%llu:%llu".

Regards,

Tvrtko

> +		      __entry->fence_context, __entry->fence_seqno, __entry->id,
> +		      __entry->ctx, __entry->seqno)
>   );
>   
>   #endif


* Re: [PATCH v7 5/7] drm/sched: trace dependencies for gpu jobs
  2025-01-31 11:03 ` [PATCH v7 5/7] drm/sched: trace dependencies for gpu jobs Pierre-Eric Pelloux-Prayer
@ 2025-02-04 15:31   ` Tvrtko Ursulin
  0 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2025-02-04 15:31 UTC (permalink / raw)
  To: Pierre-Eric Pelloux-Prayer, Luben Tuikov, Matthew Brost,
	Danilo Krummrich, Philipp Stanner, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig


On 31/01/2025 11:03, Pierre-Eric Pelloux-Prayer wrote:
> Trace the fence dependencies similarly to how we print fences:
> 
>   ... , dependencies:{fence=606:38006}
> 
> This allows tools to analyze the dependencies between the jobs (previously
> it was only possible for fences traced by drm_sched_job_wait_dep).
> 
> Since drm_sched_job and drm_run_job use the same base event class,
> the caller of trace_drm_run_job has to pass a dep_count of 0 (which
> is ignored because dependencies are only considered at submit time).
> 
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
>   .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 59 ++++++++++++++++---
>   drivers/gpu/drm/scheduler/sched_entity.c      |  8 ++-
>   drivers/gpu/drm/scheduler/sched_main.c        |  2 +-
>   3 files changed, 58 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
> index 3cdd8d8f8021..ca19cd9a146a 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
> @@ -26,15 +26,41 @@
>   
>   #include <linux/stringify.h>
>   #include <linux/types.h>
> +#include <linux/trace_seq.h>
>   #include <linux/tracepoint.h>
>   
>   #undef TRACE_SYSTEM
>   #define TRACE_SYSTEM gpu_scheduler
>   #define TRACE_INCLUDE_FILE gpu_scheduler_trace
>   
> +#ifndef __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
> +#define __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
> +/* Similar to trace_print_array_seq but for fences. */
> +static inline const char *__print_dma_fence_array(struct trace_seq *p, const void *buf, int count)
> +{
> +	const char *ret = trace_seq_buffer_ptr(p);
> +	u64 *fences = (u64 *) buf;
> +	const char *prefix = "";
> +
> +	trace_seq_putc(p, '{');
> +	for (int i = 0; i < count; i++) {
> +		u64 context = fences[2 * i], seqno = fences[2 * i + 1];
> +
> +		trace_seq_printf(p, "%sfence=%llu:%llu",
> +				 prefix, context, seqno);
> +		prefix = ",";
> +	}
> +	trace_seq_putc(p, '}');
> +	trace_seq_putc(p, 0);
> +
> +	return ret;
> +}
> +#endif
> +
>   DECLARE_EVENT_CLASS(drm_sched_job,
> -	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
> -	    TP_ARGS(sched_job, entity),
> +	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity,
> +		     unsigned int dep_count),
> +	    TP_ARGS(sched_job, entity, dep_count),
>   	    TP_STRUCT__entry(
>   			     __field(uint64_t, id)
>   			     __string(name, sched_job->sched->name)
> @@ -43,9 +69,14 @@ DECLARE_EVENT_CLASS(drm_sched_job,
>   			     __string(dev, dev_name(sched_job->sched->dev))
>   			     __field(uint64_t, fence_context)
>   			     __field(uint64_t, fence_seqno)
> +			     __field(int, n_deps)
> +			     __dynamic_array(u64, deps, dep_count * 2)
>   			     ),
>   
>   	    TP_fast_assign(
> +			   unsigned long idx;
> +			   struct dma_fence *fence;
> +			   u64 *dyn_arr;
>   			   __entry->id = sched_job->id;
>   			   __assign_str(name);
>   			   __entry->job_count = spsc_queue_count(&entity->job_queue);
> @@ -54,22 +85,32 @@ DECLARE_EVENT_CLASS(drm_sched_job,
>   			   __assign_str(dev);
>   			   __entry->fence_context = sched_job->s_fence->finished.context;
>   			   __entry->fence_seqno = sched_job->s_fence->finished.seqno;
> -
> +			   __entry->n_deps = dep_count;
> +			   if (dep_count) {
> +				dyn_arr = __get_dynamic_array(deps);
> +				xa_for_each(&sched_job->dependencies, idx, fence) {
> +					dyn_arr[2 * idx] = fence->context;
> +					dyn_arr[2 * idx + 1] = fence->seqno;
> +				}
> +			   }
>   			   ),
> -	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d",
> +	    TP_printk("dev=%s, id=%llu, fence=%llu:%llu, ring=%s, job count:%u, hw job count:%d, dependencies:%s",
>   		      __get_str(dev), __entry->id,
>   		      __entry->fence_context, __entry->fence_seqno, __get_str(name),
> -		      __entry->job_count, __entry->hw_job_count)
> +		      __entry->job_count, __entry->hw_job_count,
> +		      __print_dma_fence_array(p, __get_dynamic_array(deps), __entry->n_deps))
>   );
>   
>   DEFINE_EVENT(drm_sched_job, drm_sched_job,
> -	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
> -	    TP_ARGS(sched_job, entity)
> +	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity,
> +		     unsigned int dep_count),
> +	    TP_ARGS(sched_job, entity, dep_count)
>   );
>   
>   DEFINE_EVENT(drm_sched_job, drm_run_job,
> -	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
> -	    TP_ARGS(sched_job, entity)
> +	    TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity,
> +	    	     unsigned int dep_count),
> +	    TP_ARGS(sched_job, entity, 0)
>   );
>   
>   TRACE_EVENT(drm_sched_process_job,
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 69bcf0e99d57..0ce3a82fe6fd 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -592,7 +592,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   	bool first;
>   	ktime_t submit_ts;
>   
> -	trace_drm_sched_job(sched_job, entity);
> +	if (trace_drm_sched_job_enabled()) {
> +		unsigned long index, n = 0;
> +		void *f;
> +
> +		xa_for_each(&sched_job->dependencies, index, f) { n++; }
> +		trace_drm_sched_job(sched_job, entity, n);
> +	}
I wonder if it would be good enough to simply trace dependencies one by 
one in drm_sched_job_add_dependency and drop all the array magic/support 
code?

In which case that would be the drm_sched_job_dependency tracepoint I
hinted at in my reply to 4/7.

I know you are at v7, but since we are heading down the stable ABI
route I think it is worth discussing this option.

We could even leave the separate "wait dep" vs "add dep" and have a set 
like:

drm_sched_job_add_dependency
drm_sched_job_queue
drm_sched_job_wait_dependency
drm_sched_job_run
drm_sched_job_done

By tracing them one by one we remove the complications from the kernel
side and let userspace parse multiple lines. That shouldn't be a
problem there.
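
Rough sketch of what I have in mind (untested, the event and field
names are just an example):

TRACE_EVENT(drm_sched_job_add_dependency,
	    TP_PROTO(struct drm_sched_job *sched_job, struct dma_fence *fence),
	    TP_ARGS(sched_job, fence),
	    TP_STRUCT__entry(
			     /* Finished fence identifying the job itself. */
			     __field(u64, fence_context)
			     __field(u64, fence_seqno)
			     /* The dependency being added. */
			     __field(u64, dep_context)
			     __field(u64, dep_seqno)
			     ),
	    TP_fast_assign(
			   __entry->fence_context = sched_job->s_fence->finished.context;
			   __entry->fence_seqno = sched_job->s_fence->finished.seqno;
			   __entry->dep_context = fence->context;
			   __entry->dep_seqno = fence->seqno;
			   ),
	    TP_printk("fence=%llu:%llu, dependency=%llu:%llu",
		      __entry->fence_context, __entry->fence_seqno,
		      __entry->dep_context, __entry->dep_seqno)
);

Emitted once per fence from drm_sched_job_add_dependency(), so
userspace only has to collect one line per dependency.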

Regards,

Tvrtko

>   	atomic_inc(entity->rq->sched->score);
>   	WRITE_ONCE(entity->last_user, current->group_leader);
>   
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index ad306d3d7282..c645f07ebe26 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1214,7 +1214,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   	atomic_add(sched_job->credits, &sched->credit_count);
>   	drm_sched_job_begin(sched_job);
>   
> -	trace_drm_run_job(sched_job, entity);
> +	trace_drm_run_job(sched_job, entity, 0);
>   	fence = sched->ops->run_job(sched_job);
>   	complete_all(&entity->entity_idle);
>   	drm_sched_fence_scheduled(s_fence, fence);

^ permalink raw reply	[flat|nested] 11+ messages in thread
