All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] A bit of struct drm_sched_job cleanup
@ 2025-01-20 16:52 Tvrtko Ursulin
  2025-01-20 16:52 ` [PATCH 1/4] drm/sched: Add job popping API Tvrtko Ursulin
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2025-01-20 16:52 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

At one point I thought I wanted to add a member to struct drm_sched_job. As I
noticed there is a hole in the struct, I went to re-order some members to get
rid of it (the hole), at which point I was greeted by a subtle bug cause by the
frequent pattern of:

 job = to_drm_sched_job(spsc_queue_peek|pop(...))

Because to_drm_sched_job is a container of, it implies a lot of the scheduler
(and amdgpu) assumes job->queue_node is the first struct element. Dare to change
that, code will compile but weird things will happen at runtime.

So I thought lets clean that up and remove that hardcoded sneaky assumption and
that is what this series does.

1) Adds a helper so amdgpu can touch the scheduler internals a little bit less.
2) Removes the assumption job->queue_node must be the first element.
3) And finally removes the hole from struct drm_sched_job, fixing one instance
   of type confusion in passing too.

Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>

Tvrtko Ursulin (4):
  drm/sched: Add job popping API
  drm/amdgpu: Use the new low level job popping helper
  drm/sched: Remove to_drm_sched_job internal helper
  drm/sched: Make the type of drm_sched_job->last_dependency consistent

 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  5 +-
 drivers/gpu/drm/scheduler/sched_entity.c | 20 ++++----
 drivers/gpu/drm/scheduler/sched_main.c   | 10 ++--
 include/drm/gpu_scheduler.h              | 59 ++++++++++++++++--------
 4 files changed, 57 insertions(+), 37 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] drm/sched: Add job popping API
  2025-01-20 16:52 [PATCH 0/4] A bit of struct drm_sched_job cleanup Tvrtko Ursulin
@ 2025-01-20 16:52 ` Tvrtko Ursulin
  2025-01-20 17:13   ` Danilo Krummrich
  2025-01-20 16:52 ` [PATCH 2/4] drm/amdgpu: Use the new low level job popping helper Tvrtko Ursulin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Tvrtko Ursulin @ 2025-01-20 16:52 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

Idea is to add a helper for popping jobs from the entity with the goal
of making amdgpu use it in the next patch. That way amdgpu will decouple
itself a tiny bit more from scheduler implementation details.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c |  2 +-
 include/drm/gpu_scheduler.h              | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 69bcf0e99d57..7c0d266a89ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -255,7 +255,7 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
 	/* The entity is guaranteed to not be used by the scheduler */
 	prev = rcu_dereference_check(entity->last_scheduled, true);
 	dma_fence_get(prev);
-	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
+	while ((job = __drm_sched_entity_pop_job(entity))) {
 		struct drm_sched_fence *s_fence = job->s_fence;
 
 		dma_fence_get(&s_fence->finished);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index a0ff08123f07..092242f2464f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -611,6 +611,27 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
 bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
 int drm_sched_entity_error(struct drm_sched_entity *entity);
 
+/**
+ * __drm_sched_entity_pop_job - Low level helper for popping queued jobs
+ *
+ * @entity: scheduler entity
+ *
+ * Low level helper for popping queued jobs. Drivers should not use it.
+ *
+ * Returns the job dequeued or NULL.
+ */
+static inline struct drm_sched_job *
+__drm_sched_entity_pop_job(struct drm_sched_entity *entity)
+{
+	struct spsc_node *node;
+
+	node = spsc_queue_pop(&entity->job_queue);
+	if (!node)
+		return NULL;
+
+	return container_of(node, struct drm_sched_job, queue_node);
+}
+
 struct drm_sched_fence *drm_sched_fence_alloc(
 	struct drm_sched_entity *s_entity, void *owner);
 void drm_sched_fence_init(struct drm_sched_fence *fence,
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] drm/amdgpu: Use the new low level job popping helper
  2025-01-20 16:52 [PATCH 0/4] A bit of struct drm_sched_job cleanup Tvrtko Ursulin
  2025-01-20 16:52 ` [PATCH 1/4] drm/sched: Add job popping API Tvrtko Ursulin
@ 2025-01-20 16:52 ` Tvrtko Ursulin
  2025-01-20 16:52 ` [PATCH 3/4] drm/sched: Remove to_drm_sched_job internal helper Tvrtko Ursulin
  2025-01-20 16:52 ` [PATCH 4/4] drm/sched: Make the type of drm_sched_job->last_dependency consistent Tvrtko Ursulin
  3 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2025-01-20 16:52 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

Amdgpu is the only driver which peeks into scheduler internals in this way
so lets make it use the previously added helper. This allows removing the
local copy of the to_drm_sched_job macro.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..cbbdf33f10ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,9 +411,6 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
 	return fence;
 }
 
-#define to_drm_sched_job(sched_job)		\
-		container_of((sched_job), struct drm_sched_job, queue_node)
-
 void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
 {
 	struct drm_sched_job *s_job;
@@ -425,7 +422,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
 		struct drm_sched_rq *rq = sched->sched_rq[i];
 		spin_lock(&rq->lock);
 		list_for_each_entry(s_entity, &rq->entities, list) {
-			while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
+			while ((s_job = __drm_sched_entity_pop_job(s_entity))) {
 				struct drm_sched_fence *s_fence = s_job->s_fence;
 
 				dma_fence_signal(&s_fence->scheduled);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] drm/sched: Remove to_drm_sched_job internal helper
  2025-01-20 16:52 [PATCH 0/4] A bit of struct drm_sched_job cleanup Tvrtko Ursulin
  2025-01-20 16:52 ` [PATCH 1/4] drm/sched: Add job popping API Tvrtko Ursulin
  2025-01-20 16:52 ` [PATCH 2/4] drm/amdgpu: Use the new low level job popping helper Tvrtko Ursulin
@ 2025-01-20 16:52 ` Tvrtko Ursulin
  2025-01-20 17:17   ` Danilo Krummrich
  2025-01-20 16:52 ` [PATCH 4/4] drm/sched: Make the type of drm_sched_job->last_dependency consistent Tvrtko Ursulin
  3 siblings, 1 reply; 10+ messages in thread
From: Tvrtko Ursulin @ 2025-01-20 16:52 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

The code assumes queue node is the first element in struct
drm_sched_job. Since this is not documented it can be very fragile so lets
just remove the internal helper and explicitly check for "nothing
dequeued", before converting the node to a sched job.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 18 +++++++++---------
 drivers/gpu/drm/scheduler/sched_main.c   | 10 +++++-----
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 7c0d266a89ef..8992bb432ec6 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -30,9 +30,6 @@
 
 #include "gpu_scheduler_trace.h"
 
-#define to_drm_sched_job(sched_job)		\
-		container_of((sched_job), struct drm_sched_job, queue_node)
-
 /**
  * drm_sched_entity_init - Init a context entity used by scheduler when
  * submit to HW ring.
@@ -476,11 +473,14 @@ drm_sched_job_dependency(struct drm_sched_job *job,
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *sched_job;
+	struct spsc_node *node;
 
-	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
-	if (!sched_job)
+	node = spsc_queue_peek(&entity->job_queue);
+	if (!node)
 		return NULL;
 
+	sched_job = container_of(node, typeof(*sched_job), queue_node);
+
 	while ((entity->dependency =
 			drm_sched_job_dependency(sched_job, entity))) {
 		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
@@ -511,10 +511,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	 * the timestamp of the next job, if any.
 	 */
 	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
-		struct drm_sched_job *next;
-
-		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
-		if (next) {
+		node = spsc_queue_peek(&entity->job_queue);
+		if (node) {
+			struct drm_sched_job *next =
+				container_of(node, typeof(*next), queue_node);
 			struct drm_sched_rq *rq;
 
 			spin_lock(&entity->lock);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a48be16ab84f..66eee6372253 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -87,9 +87,6 @@ static struct lockdep_map drm_sched_lockdep_map = {
 };
 #endif
 
-#define to_drm_sched_job(sched_job)		\
-		container_of((sched_job), struct drm_sched_job, queue_node)
-
 int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
 
 /**
@@ -122,11 +119,14 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
 				struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *s_job;
+	struct spsc_node *node;
 
-	s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
-	if (!s_job)
+	node = spsc_queue_peek(&entity->job_queue);
+	if (!node)
 		return false;
 
+	s_job = container_of(node, typeof(*s_job), queue_node);
+
 	/* If a job exceeds the credit limit, truncate it to the credit limit
 	 * itself to guarantee forward progress.
 	 */
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] drm/sched: Make the type of drm_sched_job->last_dependency consistent
  2025-01-20 16:52 [PATCH 0/4] A bit of struct drm_sched_job cleanup Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2025-01-20 16:52 ` [PATCH 3/4] drm/sched: Remove to_drm_sched_job internal helper Tvrtko Ursulin
@ 2025-01-20 16:52 ` Tvrtko Ursulin
  3 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2025-01-20 16:52 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

Dependency tracking via xarray uses xa_limit_32b so there is not need for
the struct member to be unsigned long.

At the same time re-order some struct members and take u32 credits outside
of the pointer sandwich and avoid a hole.

Pahole report before:
        /* size: 160, cachelines: 3, members: 14 */
        /* sum members: 156, holes: 1, sum holes: 4 */
        /* last cacheline: 32 bytes */

And after:
        /* size: 152, cachelines: 3, members: 14 */
        /* last cacheline: 24 bytes */

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 include/drm/gpu_scheduler.h | 38 +++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 092242f2464f..9feb7ce6eff0 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -338,8 +338,14 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * to schedule the job.
  */
 struct drm_sched_job {
-	struct spsc_node		queue_node;
-	struct list_head		list;
+	u64				id;
+
+	/**
+	 * @submit_ts:
+	 *
+	 * When the job was pushed into the entity queue.
+	 */
+	ktime_t                         submit_ts;
 
 	/**
 	 * @sched:
@@ -349,24 +355,30 @@ struct drm_sched_job {
 	 * has finished.
 	 */
 	struct drm_gpu_scheduler	*sched;
+
 	struct drm_sched_fence		*s_fence;
+	struct drm_sched_entity         *entity;
 
+	enum drm_sched_priority		s_priority;
 	u32				credits;
+	/** @last_dependency: tracks @dependencies as they signal */
+	unsigned int			last_dependency;
+	atomic_t			karma;
+
+	struct spsc_node		queue_node;
+	struct list_head		list;
 
 	/*
 	 * work is used only after finish_cb has been used and will not be
 	 * accessed anymore.
 	 */
 	union {
-		struct dma_fence_cb		finish_cb;
-		struct work_struct		work;
+		struct dma_fence_cb	finish_cb;
+		struct work_struct	work;
 	};
 
-	uint64_t			id;
-	atomic_t			karma;
-	enum drm_sched_priority		s_priority;
-	struct drm_sched_entity         *entity;
 	struct dma_fence_cb		cb;
+
 	/**
 	 * @dependencies:
 	 *
@@ -375,16 +387,6 @@ struct drm_sched_job {
 	 * drm_sched_job_add_implicit_dependencies().
 	 */
 	struct xarray			dependencies;
-
-	/** @last_dependency: tracks @dependencies as they signal */
-	unsigned long			last_dependency;
-
-	/**
-	 * @submit_ts:
-	 *
-	 * When the job was pushed into the entity queue.
-	 */
-	ktime_t                         submit_ts;
 };
 
 static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] drm/sched: Add job popping API
  2025-01-20 16:52 ` [PATCH 1/4] drm/sched: Add job popping API Tvrtko Ursulin
@ 2025-01-20 17:13   ` Danilo Krummrich
  2025-01-20 18:49     ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Danilo Krummrich @ 2025-01-20 17:13 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: amd-gfx, dri-devel, kernel-dev, Christian König,
	Matthew Brost, Philipp Stanner

On Mon, Jan 20, 2025 at 04:52:37PM +0000, Tvrtko Ursulin wrote:
> Idea is to add a helper for popping jobs from the entity with the goal
> of making amdgpu use it in the next patch. That way amdgpu will decouple
> itself a tiny bit more from scheduler implementation details.

I object to adding this function if it's *only* used for something that looks
like an abuse of the API by amdgpu. Let's not make that more convenient.

> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c |  2 +-
>  include/drm/gpu_scheduler.h              | 21 +++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 69bcf0e99d57..7c0d266a89ef 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -255,7 +255,7 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>  	/* The entity is guaranteed to not be used by the scheduler */
>  	prev = rcu_dereference_check(entity->last_scheduled, true);
>  	dma_fence_get(prev);
> -	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
> +	while ((job = __drm_sched_entity_pop_job(entity))) {
>  		struct drm_sched_fence *s_fence = job->s_fence;
>  
>  		dma_fence_get(&s_fence->finished);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index a0ff08123f07..092242f2464f 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -611,6 +611,27 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>  bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>  int drm_sched_entity_error(struct drm_sched_entity *entity);
>  
> +/**
> + * __drm_sched_entity_pop_job - Low level helper for popping queued jobs
> + *
> + * @entity: scheduler entity
> + *
> + * Low level helper for popping queued jobs. Drivers should not use it.
> + *
> + * Returns the job dequeued or NULL.
> + */
> +static inline struct drm_sched_job *
> +__drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> +{
> +	struct spsc_node *node;
> +
> +	node = spsc_queue_pop(&entity->job_queue);
> +	if (!node)
> +		return NULL;
> +
> +	return container_of(node, struct drm_sched_job, queue_node);
> +}
> +
>  struct drm_sched_fence *drm_sched_fence_alloc(
>  	struct drm_sched_entity *s_entity, void *owner);
>  void drm_sched_fence_init(struct drm_sched_fence *fence,
> -- 
> 2.47.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] drm/sched: Remove to_drm_sched_job internal helper
  2025-01-20 16:52 ` [PATCH 3/4] drm/sched: Remove to_drm_sched_job internal helper Tvrtko Ursulin
@ 2025-01-20 17:17   ` Danilo Krummrich
  2025-01-21  9:45     ` Tvrtko Ursulin
  0 siblings, 1 reply; 10+ messages in thread
From: Danilo Krummrich @ 2025-01-20 17:17 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: amd-gfx, dri-devel, kernel-dev, Christian König,
	Matthew Brost, Philipp Stanner

On Mon, Jan 20, 2025 at 04:52:39PM +0000, Tvrtko Ursulin wrote:
> The code assumes queue node is the first element in struct
> drm_sched_job.

I'd add that this assumption lies in doing the NULL check after the
container_of(). Without saying that, it might not be that obvious.

> Since this is not documented it can be very fragile so lets
> just remove the internal helper and explicitly check for "nothing
> dequeued", before converting the node to a sched job.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 18 +++++++++---------
>  drivers/gpu/drm/scheduler/sched_main.c   | 10 +++++-----
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 7c0d266a89ef..8992bb432ec6 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -30,9 +30,6 @@
>  
>  #include "gpu_scheduler_trace.h"
>  
> -#define to_drm_sched_job(sched_job)		\
> -		container_of((sched_job), struct drm_sched_job, queue_node)
> -
>  /**
>   * drm_sched_entity_init - Init a context entity used by scheduler when
>   * submit to HW ring.
> @@ -476,11 +473,14 @@ drm_sched_job_dependency(struct drm_sched_job *job,
>  struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>  {
>  	struct drm_sched_job *sched_job;
> +	struct spsc_node *node;
>  
> -	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> -	if (!sched_job)
> +	node = spsc_queue_peek(&entity->job_queue);
> +	if (!node)
>  		return NULL;
>  
> +	sched_job = container_of(node, typeof(*sched_job), queue_node);

So, here you have this pattern for a valid used case and even twice. I think you
should add drm_sched_entity_peek_job() instead and document what the
preconditions are to be allowed to peek given it's an spsc queue.

> +
>  	while ((entity->dependency =
>  			drm_sched_job_dependency(sched_job, entity))) {
>  		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
> @@ -511,10 +511,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>  	 * the timestamp of the next job, if any.
>  	 */
>  	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
> -		struct drm_sched_job *next;
> -
> -		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> -		if (next) {
> +		node = spsc_queue_peek(&entity->job_queue);
> +		if (node) {
> +			struct drm_sched_job *next =
> +				container_of(node, typeof(*next), queue_node);
>  			struct drm_sched_rq *rq;
>  
>  			spin_lock(&entity->lock);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index a48be16ab84f..66eee6372253 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -87,9 +87,6 @@ static struct lockdep_map drm_sched_lockdep_map = {
>  };
>  #endif
>  
> -#define to_drm_sched_job(sched_job)		\
> -		container_of((sched_job), struct drm_sched_job, queue_node)
> -
>  int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
>  
>  /**
> @@ -122,11 +119,14 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>  				struct drm_sched_entity *entity)
>  {
>  	struct drm_sched_job *s_job;
> +	struct spsc_node *node;
>  
> -	s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> -	if (!s_job)
> +	node = spsc_queue_peek(&entity->job_queue);
> +	if (!node)
>  		return false;
>  
> +	s_job = container_of(node, typeof(*s_job), queue_node);
> +
>  	/* If a job exceeds the credit limit, truncate it to the credit limit
>  	 * itself to guarantee forward progress.
>  	 */
> -- 
> 2.47.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] drm/sched: Add job popping API
  2025-01-20 17:13   ` Danilo Krummrich
@ 2025-01-20 18:49     ` Christian König
  2025-01-21  9:38       ` Tvrtko Ursulin
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2025-01-20 18:49 UTC (permalink / raw)
  To: Danilo Krummrich, Tvrtko Ursulin
  Cc: amd-gfx, dri-devel, kernel-dev, Matthew Brost, Philipp Stanner

Am 20.01.25 um 18:13 schrieb Danilo Krummrich:
> On Mon, Jan 20, 2025 at 04:52:37PM +0000, Tvrtko Ursulin wrote:
>> Idea is to add a helper for popping jobs from the entity with the goal
>> of making amdgpu use it in the next patch. That way amdgpu will decouple
>> itself a tiny bit more from scheduler implementation details.
> I object to adding this function if it's *only* used for something that looks
> like an abuse of the API by amdgpu. Let's not make that more convenient.

Completely agree. Since when do we have that?

I don't remember agreeing to anything which messes so badly with 
scheduler internals.

Regards,
Christian.

>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>>   drivers/gpu/drm/scheduler/sched_entity.c |  2 +-
>>   include/drm/gpu_scheduler.h              | 21 +++++++++++++++++++++
>>   2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 69bcf0e99d57..7c0d266a89ef 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -255,7 +255,7 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>   	/* The entity is guaranteed to not be used by the scheduler */
>>   	prev = rcu_dereference_check(entity->last_scheduled, true);
>>   	dma_fence_get(prev);
>> -	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
>> +	while ((job = __drm_sched_entity_pop_job(entity))) {
>>   		struct drm_sched_fence *s_fence = job->s_fence;
>>   
>>   		dma_fence_get(&s_fence->finished);
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index a0ff08123f07..092242f2464f 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -611,6 +611,27 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>   bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>   int drm_sched_entity_error(struct drm_sched_entity *entity);
>>   
>> +/**
>> + * __drm_sched_entity_pop_job - Low level helper for popping queued jobs
>> + *
>> + * @entity: scheduler entity
>> + *
>> + * Low level helper for popping queued jobs. Drivers should not use it.
>> + *
>> + * Returns the job dequeued or NULL.
>> + */
>> +static inline struct drm_sched_job *
>> +__drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>> +{
>> +	struct spsc_node *node;
>> +
>> +	node = spsc_queue_pop(&entity->job_queue);
>> +	if (!node)
>> +		return NULL;
>> +
>> +	return container_of(node, struct drm_sched_job, queue_node);
>> +}
>> +
>>   struct drm_sched_fence *drm_sched_fence_alloc(
>>   	struct drm_sched_entity *s_entity, void *owner);
>>   void drm_sched_fence_init(struct drm_sched_fence *fence,
>> -- 
>> 2.47.1
>>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] drm/sched: Add job popping API
  2025-01-20 18:49     ` Christian König
@ 2025-01-21  9:38       ` Tvrtko Ursulin
  0 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2025-01-21  9:38 UTC (permalink / raw)
  To: Christian König, Danilo Krummrich
  Cc: amd-gfx, dri-devel, kernel-dev, Matthew Brost, Philipp Stanner


On 20/01/2025 18:49, Christian König wrote:
> Am 20.01.25 um 18:13 schrieb Danilo Krummrich:
>> On Mon, Jan 20, 2025 at 04:52:37PM +0000, Tvrtko Ursulin wrote:
>>> Idea is to add a helper for popping jobs from the entity with the goal
>>> of making amdgpu use it in the next patch. That way amdgpu will decouple
>>> itself a tiny bit more from scheduler implementation details.
>> I object to adding this function if it's *only* used for something 
>> that looks
>> like an abuse of the API by amdgpu. Let's not make that more convenient.
> 
> Completely agree. Since when do we have that?
> 
> I don't remember agreeing to anything which messes so badly with 
> scheduler internals.

Since 7c6e68c777f1 ("drm/amdgpu: Avoid HW GPU reset for RAS."). Where it 
looks to be implementing a permanently wedged state - drops everything 
submitted so far and claims to be blocking any future submission.

Remove that part instead and let the unsubmitted jobs be cleaned up when 
userspace clients exit? Would need specific hardware to test so it would 
need to be done on the AMD side.

Regards,

Tvrtko

>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Philipp Stanner <phasta@kernel.org>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_entity.c |  2 +-
>>>   include/drm/gpu_scheduler.h              | 21 +++++++++++++++++++++
>>>   2 files changed, 22 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 69bcf0e99d57..7c0d266a89ef 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -255,7 +255,7 @@ static void drm_sched_entity_kill(struct 
>>> drm_sched_entity *entity)
>>>       /* The entity is guaranteed to not be used by the scheduler */
>>>       prev = rcu_dereference_check(entity->last_scheduled, true);
>>>       dma_fence_get(prev);
>>> -    while ((job = 
>>> to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
>>> +    while ((job = __drm_sched_entity_pop_job(entity))) {
>>>           struct drm_sched_fence *s_fence = job->s_fence;
>>>           dma_fence_get(&s_fence->finished);
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index a0ff08123f07..092242f2464f 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -611,6 +611,27 @@ void drm_sched_entity_set_priority(struct 
>>> drm_sched_entity *entity,
>>>   bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>   int drm_sched_entity_error(struct drm_sched_entity *entity);
>>> +/**
>>> + * __drm_sched_entity_pop_job - Low level helper for popping queued 
>>> jobs
>>> + *
>>> + * @entity: scheduler entity
>>> + *
>>> + * Low level helper for popping queued jobs. Drivers should not use it.
>>> + *
>>> + * Returns the job dequeued or NULL.
>>> + */
>>> +static inline struct drm_sched_job *
>>> +__drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>>> +{
>>> +    struct spsc_node *node;
>>> +
>>> +    node = spsc_queue_pop(&entity->job_queue);
>>> +    if (!node)
>>> +        return NULL;
>>> +
>>> +    return container_of(node, struct drm_sched_job, queue_node);
>>> +}
>>> +
>>>   struct drm_sched_fence *drm_sched_fence_alloc(
>>>       struct drm_sched_entity *s_entity, void *owner);
>>>   void drm_sched_fence_init(struct drm_sched_fence *fence,
>>> -- 
>>> 2.47.1
>>>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] drm/sched: Remove to_drm_sched_job internal helper
  2025-01-20 17:17   ` Danilo Krummrich
@ 2025-01-21  9:45     ` Tvrtko Ursulin
  0 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2025-01-21  9:45 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: amd-gfx, dri-devel, kernel-dev, Christian König,
	Matthew Brost, Philipp Stanner


On 20/01/2025 17:17, Danilo Krummrich wrote:
> On Mon, Jan 20, 2025 at 04:52:39PM +0000, Tvrtko Ursulin wrote:
>> The code assumes queue node is the first element in struct
>> drm_sched_job.
> 
> I'd add that this assumption lies in doing the NULL check after the
> container_of(). Without saying that, it might not be that obvious.

Good point, I only put the full explanation in the cover letter.

>> Since this is not documented it can be very fragile so lets
>> just remove the internal helper and explicitly check for "nothing
>> dequeued", before converting the node to a sched job.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>>   drivers/gpu/drm/scheduler/sched_entity.c | 18 +++++++++---------
>>   drivers/gpu/drm/scheduler/sched_main.c   | 10 +++++-----
>>   2 files changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 7c0d266a89ef..8992bb432ec6 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -30,9 +30,6 @@
>>   
>>   #include "gpu_scheduler_trace.h"
>>   
>> -#define to_drm_sched_job(sched_job)		\
>> -		container_of((sched_job), struct drm_sched_job, queue_node)
>> -
>>   /**
>>    * drm_sched_entity_init - Init a context entity used by scheduler when
>>    * submit to HW ring.
>> @@ -476,11 +473,14 @@ drm_sched_job_dependency(struct drm_sched_job *job,
>>   struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>>   {
>>   	struct drm_sched_job *sched_job;
>> +	struct spsc_node *node;
>>   
>> -	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
>> -	if (!sched_job)
>> +	node = spsc_queue_peek(&entity->job_queue);
>> +	if (!node)
>>   		return NULL;
>>   
>> +	sched_job = container_of(node, typeof(*sched_job), queue_node);
> 
> So, here you have this pattern for a valid used case and even twice. I think you
> should add drm_sched_entity_peek_job() instead and document what the
> preconditions are to be allowed to peek given it's an spsc queue.

I thought about it but would have to put the declaration into 
gpu_scheduler.h so I thought it would be a bit better like this. Since 
gpu_scheduler.h in the ideal world should contain as little as possible 
things which individual drivers should not use. And I see from your 
comment on 1/4 that you think the same (just that one was in effect 
documenting the current state of things).

What if I add a new header file local to driver/gpu/drm/scheduler for 
internal API?

Regards,

Tvrtko

>> +
>>   	while ((entity->dependency =
>>   			drm_sched_job_dependency(sched_job, entity))) {
>>   		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
>> @@ -511,10 +511,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>>   	 * the timestamp of the next job, if any.
>>   	 */
>>   	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
>> -		struct drm_sched_job *next;
>> -
>> -		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
>> -		if (next) {
>> +		node = spsc_queue_peek(&entity->job_queue);
>> +		if (node) {
>> +			struct drm_sched_job *next =
>> +				container_of(node, typeof(*next), queue_node);
>>   			struct drm_sched_rq *rq;
>>   
>>   			spin_lock(&entity->lock);
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index a48be16ab84f..66eee6372253 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -87,9 +87,6 @@ static struct lockdep_map drm_sched_lockdep_map = {
>>   };
>>   #endif
>>   
>> -#define to_drm_sched_job(sched_job)		\
>> -		container_of((sched_job), struct drm_sched_job, queue_node)
>> -
>>   int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
>>   
>>   /**
>> @@ -122,11 +119,14 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>>   				struct drm_sched_entity *entity)
>>   {
>>   	struct drm_sched_job *s_job;
>> +	struct spsc_node *node;
>>   
>> -	s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
>> -	if (!s_job)
>> +	node = spsc_queue_peek(&entity->job_queue);
>> +	if (!node)
>>   		return false;
>>   
>> +	s_job = container_of(node, typeof(*s_job), queue_node);
>> +
>>   	/* If a job exceeds the credit limit, truncate it to the credit limit
>>   	 * itself to guarantee forward progress.
>>   	 */
>> -- 
>> 2.47.1
>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-01-21 13:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-20 16:52 [PATCH 0/4] A bit of struct drm_sched_job cleanup Tvrtko Ursulin
2025-01-20 16:52 ` [PATCH 1/4] drm/sched: Add job popping API Tvrtko Ursulin
2025-01-20 17:13   ` Danilo Krummrich
2025-01-20 18:49     ` Christian König
2025-01-21  9:38       ` Tvrtko Ursulin
2025-01-20 16:52 ` [PATCH 2/4] drm/amdgpu: Use the new low level job popping helper Tvrtko Ursulin
2025-01-20 16:52 ` [PATCH 3/4] drm/sched: Remove to_drm_sched_job internal helper Tvrtko Ursulin
2025-01-20 17:17   ` Danilo Krummrich
2025-01-21  9:45     ` Tvrtko Ursulin
2025-01-20 16:52 ` [PATCH 4/4] drm/sched: Make the type of drm_sched_job->last_dependency consistent Tvrtko Ursulin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.