* Try to fix amdgpu's error handling
@ 2025-03-18 12:03 Christian König
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Christian König @ 2025-03-18 12:03 UTC (permalink / raw)
To: phasta, tvrtko.ursulin, dri-devel, dakr, amd-gfx
Hi guys,
as partially discussed on the list already, amdgpu has a bug in its gang
submission code.
The basic problem is that to add the correct dependency to the gang leader
we need to arm the other gang members first, but arming is a point of no
return and adding the dependencies afterwards can still fail with ENOMEM.
Try to fix that by allowing drivers to preallocate dependency slots. Not
sure if that is a good approach, but off hand I don't see much of an
alternative.
Please comment,
Christian.
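To make the intended flow concrete, here is a minimal sketch (not part of
the series) of how a driver would combine the proposed helper with the
existing scheduler API; the drm_sched_* and dma_fence_* calls are the ones
used by the patches below, while the submit function and its parameters
are purely illustrative:

#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical gang submit: one leader plus (count - 1) other members. */
static int example_gang_submit(struct drm_sched_job *leader,
                               struct drm_sched_job **jobs,
                               unsigned int count)
{
        unsigned int i;
        int r;

        /* Allocations are still allowed here, so failing is harmless. */
        r = drm_sched_job_prealloc_dependency_slots(leader, count - 1);
        if (r)
                return r;

        /* Point of no return: arm all gang members. */
        for (i = 0; i < count; i++)
                drm_sched_job_arm(jobs[i]);

        /*
         * Only the preallocated slots are used from here on, so adding the
         * dependencies should no longer be able to fail with -ENOMEM.
         */
        for (i = 0; i < count; i++) {
                struct dma_fence *fence;

                if (jobs[i] == leader)
                        continue;

                fence = dma_fence_get(&jobs[i]->s_fence->scheduled);
                WARN_ON(drm_sched_job_add_dependency(leader, fence));
        }

        return 0;
}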
* [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:03 Try to fix amdgpu's error handling Christian König
@ 2025-03-18 12:03 ` Christian König
2025-03-18 12:39 ` Danilo Krummrich
` (2 more replies)
2025-03-18 12:03 ` [PATCH 2/2] drm/amdgpu: fix gang submission error handling Christian König
2025-03-18 12:36 ` Try to fix amdgpu's " Danilo Krummrich
2 siblings, 3 replies; 13+ messages in thread
From: Christian König @ 2025-03-18 12:03 UTC (permalink / raw)
To: phasta, tvrtko.ursulin, dri-devel, dakr, amd-gfx
The problem is that drivers sometimes need to add dependencies without
allocating any memory.
Add a function which preallocates slots by inserting signaled stub fences
into the dependency array.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 41 ++++++++++++++++++++++++--
include/drm/gpu_scheduler.h | 2 ++
2 files changed, 40 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 4d4219fbe49d..2085eea89513 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -852,6 +852,38 @@ void drm_sched_job_arm(struct drm_sched_job *job)
}
EXPORT_SYMBOL(drm_sched_job_arm);
+/**
+ * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding dependencies
+ * @job: scheduler job where dependencies will be added
+ * @num_deps: number of dependencies to preallocate slots for
+ *
+ * Preallocate memory so that no ENOMEM can come later while adding
+ * dependencies.
+ *
+ * Return:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
+ unsigned int num_deps)
+{
+ struct dma_fence *fence;
+ u32 id = 0;
+ int ret;
+
+ while (num_deps--) {
+ fence = dma_fence_get_stub();
+ ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
+ GFP_KERNEL);
+ if (ret != 0) {
+ dma_fence_put(fence);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
+
/**
* drm_sched_job_add_dependency - adds the fence as a job dependency
* @job: scheduler job to add the dependencies to
@@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
* engines involved, rather than the number of BOs.
*/
xa_for_each(&job->dependencies, index, entry) {
- if (entry->context != fence->context)
+ bool signaled = dma_fence_is_signaled(entry);
+
+ if (!signaled && entry->context != fence->context)
continue;
- if (dma_fence_is_later(fence, entry)) {
+ if (signaled || dma_fence_is_later(fence, entry)) {
dma_fence_put(entry);
xa_store(&job->dependencies, index, fence, GFP_KERNEL);
} else {
@@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
return 0;
}
- ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
+ ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
+ GFP_KERNEL);
if (ret != 0)
dma_fence_put(fence);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 1a7e377d4cbb..916e820b27ff 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
u32 credits, void *owner);
void drm_sched_job_arm(struct drm_sched_job *job);
void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
+int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
+ unsigned int num_deps);
int drm_sched_job_add_dependency(struct drm_sched_job *job,
struct dma_fence *fence);
int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
--
2.34.1
* [PATCH 2/2] drm/amdgpu: fix gang submission error handling
2025-03-18 12:03 Try to fix amdgpu's error handling Christian König
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
@ 2025-03-18 12:03 ` Christian König
2025-03-18 12:55 ` Tvrtko Ursulin
2025-03-18 12:36 ` Try to fix amdgpu's " Danilo Krummrich
2 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2025-03-18 12:03 UTC (permalink / raw)
To: phasta, tvrtko.ursulin, dri-devel, dakr, amd-gfx
In the unlikely case that we run into an ENOMEM while fixing up the gang
submission dependencies, we can't clean up any more since the gang
members are already armed.
Fix this by using pre-allocated dependency slots and re-ordering the
code.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 58 +++++++++++++++-----------
1 file changed, 34 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5cc5f59e3018..770005c8e41f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1285,36 +1285,21 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
uint64_t seq;
int r;
- for (i = 0; i < p->gang_size; ++i)
- drm_sched_job_arm(&p->jobs[i]->base);
-
- for (i = 0; i < p->gang_size; ++i) {
- struct dma_fence *fence;
-
- if (p->jobs[i] == leader)
- continue;
-
- fence = &p->jobs[i]->base.s_fence->scheduled;
- dma_fence_get(fence);
- r = drm_sched_job_add_dependency(&leader->base, fence);
- if (r) {
- dma_fence_put(fence);
- return r;
- }
- }
-
- if (p->gang_size > 1) {
- for (i = 0; i < p->gang_size; ++i)
- amdgpu_job_set_gang_leader(p->jobs[i], leader);
- }
+ /* Preallocate the memory for the gang dependencies */
+ r = drm_sched_job_prealloc_dependency_slots(&leader->base,
+ p->gang_size - 1);
+ if (r)
+ return r;
- /* No memory allocation is allowed while holding the notifier lock.
+ /*
+ * No memory allocation is allowed while holding the notifier lock.
* The lock is held until amdgpu_cs_submit is finished and fence is
* added to BOs.
*/
mutex_lock(&p->adev->notifier_lock);
- /* If userptr are invalidated after amdgpu_cs_parser_bos(), return
+ /*
+ * If userptr are invalidated after amdgpu_cs_parser_bos(), return
* -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
*/
r = 0;
@@ -1329,6 +1314,31 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
return r;
}
+ for (i = 0; i < p->gang_size; ++i)
+ drm_sched_job_arm(&p->jobs[i]->base);
+
+ for (i = 0; i < p->gang_size; ++i) {
+ struct dma_fence *fence;
+
+ if (p->jobs[i] == leader)
+ continue;
+
+ fence = &p->jobs[i]->base.s_fence->scheduled;
+ dma_fence_get(fence);
+ r = drm_sched_job_add_dependency(&leader->base, fence);
+ /*
+ * We can't abort here with an error any more, but we should
+ * also never run into an error since the slots for the
+ * dependency fences are preallocated.
+ */
+ WARN_ON(r);
+ }
+
+ if (p->gang_size > 1) {
+ for (i = 0; i < p->gang_size; ++i)
+ amdgpu_job_set_gang_leader(p->jobs[i], leader);
+ }
+
p->fence = dma_fence_get(&leader->base.s_fence->finished);
drm_exec_for_each_locked_object(&p->exec, index, gobj) {
--
2.34.1
* Re: Try to fix amdgpu's error handling
2025-03-18 12:03 Try to fix amdgpu's error handling Christian König
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
2025-03-18 12:03 ` [PATCH 2/2] drm/amdgpu: fix gang submission error handling Christian König
@ 2025-03-18 12:36 ` Danilo Krummrich
2 siblings, 0 replies; 13+ messages in thread
From: Danilo Krummrich @ 2025-03-18 12:36 UTC (permalink / raw)
To: Christian König; +Cc: phasta, tvrtko.ursulin, dri-devel, amd-gfx
On Tue, Mar 18, 2025 at 01:03:11PM +0100, Christian König wrote:
> Hi guys,
>
> as partially discussed on the list already, amdgpu has a bug in its gang
> submission code.
>
> The basic problem is that to add the correct dependency to the gang leader
> we need to arm the other gang members first, but arming is a point of no
> return and adding the dependencies afterwards can still fail with ENOMEM.
>
> Try to fix that by allowing drivers to preallocate dependency slots. Not
> sure if that is a good approach, but off hand I don't see much of an
> alternative.
I think that's reasonable; in GPUVM we have a similar problem, where we have to
preallocate in order to avoid allocations under a mutex used in the fence
signalling critical path.
Unfortunately, this even prevented us from using the maple tree, since it can't
preallocate for multiple entries ahead of time.
From my side,
Acked-by: Danilo Krummrich <dakr@kernel.org>
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
@ 2025-03-18 12:39 ` Danilo Krummrich
2025-03-19 11:14 ` Christian König
2025-03-18 12:39 ` Philipp Stanner
2025-03-18 13:03 ` Tvrtko Ursulin
2 siblings, 1 reply; 13+ messages in thread
From: Danilo Krummrich @ 2025-03-18 12:39 UTC (permalink / raw)
To: Christian König; +Cc: phasta, tvrtko.ursulin, dri-devel, amd-gfx
On Tue, Mar 18, 2025 at 01:03:12PM +0100, Christian König wrote:
> /**
> * drm_sched_job_add_dependency - adds the fence as a job dependency
> * @job: scheduler job to add the dependencies to
> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> * engines involved, rather than the number of BOs.
> */
> xa_for_each(&job->dependencies, index, entry) {
> - if (entry->context != fence->context)
> + bool signaled = dma_fence_is_signaled(entry);
> +
> + if (!signaled && entry->context != fence->context)
> continue;
>
> - if (dma_fence_is_later(fence, entry)) {
> + if (signaled || dma_fence_is_later(fence, entry)) {
> dma_fence_put(entry);
> xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> } else {
> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> return 0;
> }
>
> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
> if (ret != 0)
> dma_fence_put(fence);
Those changes seem unrelated, don't they?
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
2025-03-18 12:39 ` Danilo Krummrich
@ 2025-03-18 12:39 ` Philipp Stanner
2025-03-19 11:23 ` Christian König
2025-03-18 13:03 ` Tvrtko Ursulin
2 siblings, 1 reply; 13+ messages in thread
From: Philipp Stanner @ 2025-03-18 12:39 UTC (permalink / raw)
To: Christian König, tvrtko.ursulin, dri-devel, dakr, amd-gfx
On Tue, 2025-03-18 at 13:03 +0100, Christian König wrote:
> The problem is that drivers sometimes need to add dependencies
> without
> allocating any memory.
>
> Add a function which preallocates slots by inserting signaled stub
> fences
> into the dependency array.
I think I get what you're doing here, but it would certainly be helpful
to provide some more justification in the commit message a la: "Some
drivers have to [...]. With drm_sched_job_add_dependency() that's not
possible because [...]"
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 41
> ++++++++++++++++++++++++--
> include/drm/gpu_scheduler.h | 2 ++
> 2 files changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 4d4219fbe49d..2085eea89513 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -852,6 +852,38 @@ void drm_sched_job_arm(struct drm_sched_job
> *job)
> }
> EXPORT_SYMBOL(drm_sched_job_arm);
>
> +/**
> + * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding
> dependencies
> + * @job: scheduler job where dependencies will be added
> + * @num_deps: number of dependencies to preallocate slots for
> + *
> + * Preallocate memory so that no ENOMEM can come later while adding
> + * dependencies.
Similarly, this should have a sentence that clarifies when it is
needed.
> + *
> + * Return:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
> *job,
> + unsigned int num_deps)
> +{
> + struct dma_fence *fence;
> + u32 id = 0;
> + int ret;
> +
> + while (num_deps--) {
> + fence = dma_fence_get_stub();
> + ret = xa_alloc(&job->dependencies, &id, fence,
> xa_limit_32b,
> + GFP_KERNEL);
So this would fill the xarr with already signaled fences which then
later will be replaced with unsignaled fences?
Help me out here: would it also work to add NULL instead of that stub-
fence?
> + if (ret != 0) {
btw. style should be consistent with the while() above, where you omit
the '> 0'.
I personally prefer having the comparison operators, but implicitly
checking for 0 is well established in the kernel, so I guess both are
OK.
> + dma_fence_put(fence);
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
> +
> /**
> * drm_sched_job_add_dependency - adds the fence as a job dependency
> * @job: scheduler job to add the dependencies to
> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct
> drm_sched_job *job,
> * engines involved, rather than the number of BOs.
> */
> xa_for_each(&job->dependencies, index, entry) {
> - if (entry->context != fence->context)
> + bool signaled = dma_fence_is_signaled(entry);
> +
> + if (!signaled && entry->context != fence->context)
> continue;
>
> - if (dma_fence_is_later(fence, entry)) {
> + if (signaled || dma_fence_is_later(fence, entry)) {
> dma_fence_put(entry);
> xa_store(&job->dependencies, index, fence,
> GFP_KERNEL);
> } else {
> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct
> drm_sched_job *job,
> return 0;
> }
>
> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> GFP_KERNEL);
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
Would leave that unchanged for git-blame.
Thx
P.
> if (ret != 0)
> dma_fence_put(fence);
>
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index 1a7e377d4cbb..916e820b27ff 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
> u32 credits, void *owner);
> void drm_sched_job_arm(struct drm_sched_job *job);
> void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
> *job,
> + unsigned int num_deps);
> int drm_sched_job_add_dependency(struct drm_sched_job *job,
> struct dma_fence *fence);
> int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
* Re: [PATCH 2/2] drm/amdgpu: fix gang submission error handling
2025-03-18 12:03 ` [PATCH 2/2] drm/amdgpu: fix gang submission error handling Christian König
@ 2025-03-18 12:55 ` Tvrtko Ursulin
0 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-03-18 12:55 UTC (permalink / raw)
To: Christian König, phasta, dri-devel, dakr, amd-gfx
On 18/03/2025 12:03, Christian König wrote:
> In the unlikely case that we run into an ENOMEM while fixing up the gang
> submission dependencies, we can't clean up any more since the gang
> members are already armed.
>
> Fix this by using pre-allocated dependency slots and re-ordering the
> code.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 58 +++++++++++++++-----------
> 1 file changed, 34 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 5cc5f59e3018..770005c8e41f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1285,36 +1285,21 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> uint64_t seq;
> int r;
>
> - for (i = 0; i < p->gang_size; ++i)
> - drm_sched_job_arm(&p->jobs[i]->base);
> -
> - for (i = 0; i < p->gang_size; ++i) {
> - struct dma_fence *fence;
> -
> - if (p->jobs[i] == leader)
> - continue;
> -
> - fence = &p->jobs[i]->base.s_fence->scheduled;
> - dma_fence_get(fence);
> - r = drm_sched_job_add_dependency(&leader->base, fence);
> - if (r) {
> - dma_fence_put(fence);
Ouch, was this broken too - one put too many, since
drm_sched_job_add_dependency() consumes the reference in the failing case?
If it looks like that to you too, see if you think it is worth splitting
this into two patches, or at minimum mentioning it in the commit message.
Otherwise LGTM.
Regards,
Tvrtko
> - return r;
> - }
> - }
> -
> - if (p->gang_size > 1) {
> - for (i = 0; i < p->gang_size; ++i)
> - amdgpu_job_set_gang_leader(p->jobs[i], leader);
> - }
> + /* Preallocate the memory for the gang dependencies */
> + r = drm_sched_job_prealloc_dependency_slots(&leader->base,
> + p->gang_size - 1);
> + if (r)
> + return r;
>
> - /* No memory allocation is allowed while holding the notifier lock.
> + /*
> + * No memory allocation is allowed while holding the notifier lock.
> * The lock is held until amdgpu_cs_submit is finished and fence is
> * added to BOs.
> */
> mutex_lock(&p->adev->notifier_lock);
>
> - /* If userptr are invalidated after amdgpu_cs_parser_bos(), return
> + /*
> + * If userptr are invalidated after amdgpu_cs_parser_bos(), return
> * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
> */
> r = 0;
> @@ -1329,6 +1314,31 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> return r;
> }
>
> + for (i = 0; i < p->gang_size; ++i)
> + drm_sched_job_arm(&p->jobs[i]->base);
> +
> + for (i = 0; i < p->gang_size; ++i) {
> + struct dma_fence *fence;
> +
> + if (p->jobs[i] == leader)
> + continue;
> +
> + fence = &p->jobs[i]->base.s_fence->scheduled;
> + dma_fence_get(fence);
> + r = drm_sched_job_add_dependency(&leader->base, fence);
> + /*
> + * We can't abort here with an error any more, but we should
> + * also never run into an error since the slots for the
> + * dependency fences are preallocated.
> + */
> + WARN_ON(r);
> + }
> +
> + if (p->gang_size > 1) {
> + for (i = 0; i < p->gang_size; ++i)
> + amdgpu_job_set_gang_leader(p->jobs[i], leader);
> + }
> +
> p->fence = dma_fence_get(&leader->base.s_fence->finished);
> drm_exec_for_each_locked_object(&p->exec, index, gobj) {
>
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
2025-03-18 12:39 ` Danilo Krummrich
2025-03-18 12:39 ` Philipp Stanner
@ 2025-03-18 13:03 ` Tvrtko Ursulin
2 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-03-18 13:03 UTC (permalink / raw)
To: Christian König, phasta, dri-devel, dakr, amd-gfx
On 18/03/2025 12:03, Christian König wrote:
> The problem is that drivers sometimes need to add dependencies without
> allocating any memory.
>
> Add a function which preallocates slots by inserting signaled stub fences
> into the dependency array.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 41 ++++++++++++++++++++++++--
> include/drm/gpu_scheduler.h | 2 ++
> 2 files changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 4d4219fbe49d..2085eea89513 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -852,6 +852,38 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> }
> EXPORT_SYMBOL(drm_sched_job_arm);
>
> +/**
> + * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding dependencies
> + * @job: scheduler job where dependencies will be added
> + * @num_deps: number of dependencies to preallocate slots for
> + *
> + * Preallocate memory so that no ENOMEM can come later while adding
> + * dependencies.
> + *
> + * Return:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
> + unsigned int num_deps)
> +{
> + struct dma_fence *fence;
> + u32 id = 0;
> + int ret;
> +
> + while (num_deps--) {
> + fence = dma_fence_get_stub();
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
> + if (ret != 0) {
> + dma_fence_put(fence);
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
> +
> /**
> * drm_sched_job_add_dependency - adds the fence as a job dependency
> * @job: scheduler job to add the dependencies to
> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> * engines involved, rather than the number of BOs.
> */
> xa_for_each(&job->dependencies, index, entry) {
> - if (entry->context != fence->context)
> + bool signaled = dma_fence_is_signaled(entry);
I so dislike dma_fence_is_signaled() due to it leaking the fence signalling
annotations to unsuspecting callers. I hope it doesn't mark someone's
lock/mutex as part of the critical signalling path.
Regards,
Tvrtko
> +
> + if (!signaled && entry->context != fence->context)
> continue;
>
> - if (dma_fence_is_later(fence, entry)) {
> + if (signaled || dma_fence_is_later(fence, entry)) {
> dma_fence_put(entry);
> xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> } else {
> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> return 0;
> }
>
> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
> if (ret != 0)
> dma_fence_put(fence);
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 1a7e377d4cbb..916e820b27ff 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
> u32 credits, void *owner);
> void drm_sched_job_arm(struct drm_sched_job *job);
> void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
> + unsigned int num_deps);
> int drm_sched_job_add_dependency(struct drm_sched_job *job,
> struct dma_fence *fence);
> int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:39 ` Danilo Krummrich
@ 2025-03-19 11:14 ` Christian König
0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2025-03-19 11:14 UTC (permalink / raw)
To: Danilo Krummrich; +Cc: phasta, tvrtko.ursulin, dri-devel, amd-gfx
Am 18.03.25 um 13:39 schrieb Danilo Krummrich:
> On Tue, Mar 18, 2025 at 01:03:12PM +0100, Christian König wrote:
>> /**
>> * drm_sched_job_add_dependency - adds the fence as a job dependency
>> * @job: scheduler job to add the dependencies to
>> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>> * engines involved, rather than the number of BOs.
>> */
>> xa_for_each(&job->dependencies, index, entry) {
>> - if (entry->context != fence->context)
>> + bool signaled = dma_fence_is_signaled(entry);
>> +
>> + if (!signaled && entry->context != fence->context)
>> continue;
>>
>> - if (dma_fence_is_later(fence, entry)) {
>> + if (signaled || dma_fence_is_later(fence, entry)) {
>> dma_fence_put(entry);
>> xa_store(&job->dependencies, index, fence, GFP_KERNEL);
>> } else {
>> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>> return 0;
>> }
>>
>> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
>> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
>> + GFP_KERNEL);
>> if (ret != 0)
>> dma_fence_put(fence);
> Those changes seem unrelated, don't they?
Ah, yes, that was just a leftover from a previous attempt to fix this.
Thanks,
Christian.
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:39 ` Philipp Stanner
@ 2025-03-19 11:23 ` Christian König
2025-03-20 11:49 ` Tvrtko Ursulin
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2025-03-19 11:23 UTC (permalink / raw)
To: phasta, tvrtko.ursulin, dri-devel, dakr, amd-gfx
>> + *
>> + * Return:
>> + * 0 on success, or an error on failing to expand the array.
>> + */
>> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
>> *job,
>> + unsigned int num_deps)
>> +{
>> + struct dma_fence *fence;
>> + u32 id = 0;
>> + int ret;
>> +
>> + while (num_deps--) {
>> + fence = dma_fence_get_stub();
>> + ret = xa_alloc(&job->dependencies, &id, fence,
>> xa_limit_32b,
>> + GFP_KERNEL);
> So this would fill the xarr with already signaled fences which then
> later will be replaced with unsignaled fences?
Yes, exactly that's the idea.
> Help me out here: would it also work to add NULL instead of that stub-
> fence?
Good question, idk. That's an implementation detail of the xarray.
Tvrtko also correctly pointed out that it is most likely a bad idea to use dma_fence_is_signaled() in the critical code path.
I will try to dig through the xarray behavior and update the patch if possible.
Thanks,
Christian.
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-19 11:23 ` Christian König
@ 2025-03-20 11:49 ` Tvrtko Ursulin
2025-03-21 8:20 ` Philipp Stanner
0 siblings, 1 reply; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-03-20 11:49 UTC (permalink / raw)
To: Christian König, phasta, dri-devel, dakr, amd-gfx
On 19/03/2025 11:23, Christian König wrote:
>>> + *
>>> + * Return:
>>> + * 0 on success, or an error on failing to expand the array.
>>> + */
>>> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
>>> *job,
>>> + unsigned int num_deps)
>>> +{
>>> + struct dma_fence *fence;
>>> + u32 id = 0;
>>> + int ret;
>>> +
>>> + while (num_deps--) {
>>> + fence = dma_fence_get_stub();
>>> + ret = xa_alloc(&job->dependencies, &id, fence,
>>> xa_limit_32b,
>>> + GFP_KERNEL);
>> So this would fill the xarr with already signaled fences which then
>> later will be replaced with unsignaled fences?
>
> Yes, exactly that's the idea.
>
>> Help me out here: would it also work to add NULL instead of that stub-
>> fence?
>
> Good question, idk. That's an implementation detail of the xarray.
>
> Tvrtko also correctly pointed out that it is most likely a bad idea to
> use dma_fence_is_signaled() in the critical code path.
>
> I will try to dig through the xarray behavior up and update the patch if
> possible.
I think NULL on its own is not possible, but the two low bits are
available for pointer tagging, or designating pointers vs integers,
which looks like it could work. Something like storing
xa_tag_pointer(NULL, 1) to reserved slots and at lookup time they would
be detected with "xa_pointer_tag(fence) & 1".
Regards,
Tvrtko
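For illustration, a rough sketch of that tagged-pointer variant;
xa_tag_pointer() and xa_pointer_tag() are existing XArray helpers, while
the EXAMPLE_* name and the two functions are made up for this example:

#include <linux/xarray.h>

/* Hypothetical tag value marking a reserved dependency slot. */
#define EXAMPLE_DEP_SLOT_RESERVED	1UL

/* Reserve one slot by storing a tagged NULL instead of a stub fence. */
static int example_reserve_slot(struct xarray *deps)
{
        u32 id;

        return xa_alloc(deps, &id,
                        xa_tag_pointer(NULL, EXAMPLE_DEP_SLOT_RESERVED),
                        xa_limit_32b, GFP_KERNEL);
}

/* At lookup time the tag tells reserved slots apart from real fences. */
static bool example_slot_is_reserved(void *entry)
{
        return xa_pointer_tag(entry) & EXAMPLE_DEP_SLOT_RESERVED;
}

Every walk of job->dependencies that dereferences its entries would then
need such a check, which is presumably part of why the XA_ZERO_ENTRY route
discussed further down in the thread looks attractive.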
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-20 11:49 ` Tvrtko Ursulin
@ 2025-03-21 8:20 ` Philipp Stanner
2025-03-21 13:25 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Philipp Stanner @ 2025-03-21 8:20 UTC (permalink / raw)
To: Tvrtko Ursulin, Christian König, phasta, dri-devel, dakr,
amd-gfx
On Thu, 2025-03-20 at 11:49 +0000, Tvrtko Ursulin wrote:
>
> On 19/03/2025 11:23, Christian König wrote:
> > > > + *
> > > > + * Return:
> > > > + * 0 on success, or an error on failing to expand the array.
> > > > + */
> > > > +int drm_sched_job_prealloc_dependency_slots(struct
> > > > drm_sched_job
> > > > *job,
> > > > + unsigned int
> > > > num_deps)
> > > > +{
> > > > + struct dma_fence *fence;
> > > > + u32 id = 0;
> > > > + int ret;
> > > > +
> > > > + while (num_deps--) {
> > > > + fence = dma_fence_get_stub();
> > > > + ret = xa_alloc(&job->dependencies, &id, fence,
> > > > xa_limit_32b,
> > > > + GFP_KERNEL);
> > > So this would fill the xarr with already signaled fences which
> > > then
> > > later will be replaced with unsignaled fences?
> >
> > Yes, exactly that's the idea.
> >
> > > Help me out here: would it also work to add NULL instead of that
> > > stub-
> > > fence?
> >
> > Good question, idk. That's an implementation detail of the xarray.
> >
> > Tvrtko also correctly pointed out that it is most likely a bad idea
> > to
> > use dma_fence_is_signaled() in the critical code path.
> >
> > I will try to dig through the xarray behavior up and update the
> > patch if
> > possible.
>
> I think NULL on its own is not possible, but the two low bits are
> available for pointer tagging, or designating pointers vs integers,
> which looks like it could work. Something like storing
> xa_tag_pointer(NULL, 1) to reserved slots and at lookup time they
> would
> be detected with "xa_pointer_tag(fence) & 1".
Almost!
They would be detected with a super-readable
#define DRM_SCHED_XARR_TAG_RESERVED_ENTRY 1
or maybe …UNUSED_ENTRY?
^_^
P.
>
> Regards,
>
> Tvrtko
>
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-21 8:20 ` Philipp Stanner
@ 2025-03-21 13:25 ` Christian König
0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2025-03-21 13:25 UTC (permalink / raw)
To: phasta, Tvrtko Ursulin, dri-devel, dakr, amd-gfx
Am 21.03.25 um 09:20 schrieb Philipp Stanner:
> On Thu, 2025-03-20 at 11:49 +0000, Tvrtko Ursulin wrote:
>> On 19/03/2025 11:23, Christian König wrote:
>>>>> + *
>>>>> + * Return:
>>>>> + * 0 on success, or an error on failing to expand the array.
>>>>> + */
>>>>> +int drm_sched_job_prealloc_dependency_slots(struct
>>>>> drm_sched_job
>>>>> *job,
>>>>> + unsigned int
>>>>> num_deps)
>>>>> +{
>>>>> + struct dma_fence *fence;
>>>>> + u32 id = 0;
>>>>> + int ret;
>>>>> +
>>>>> + while (num_deps--) {
>>>>> + fence = dma_fence_get_stub();
>>>>> + ret = xa_alloc(&job->dependencies, &id, fence,
>>>>> xa_limit_32b,
>>>>> + GFP_KERNEL);
>>>> So this would fill the xarr with already signaled fences which
>>>> then
>>>> later will be replaced with unsignaled fences?
>>> Yes, exactly that's the idea.
>>>
>>>> Help me out here: would it also work to add NULL instead of that
>>>> stub-
>>>> fence?
>>> Good question, idk. That's an implementation detail of the xarray.
>>>
>>> Tvrtko also correctly pointed out that it is most likely a bad idea
>>> to
>>> use dma_fence_is_signaled() in the critical code path.
>>>
>>> I will try to dig through the xarray behavior up and update the
>>> patch if
>>> possible.
>> I think NULL on its own is not possible, but the two low bits are
>> available for pointer tagging, or designating pointers vs integers,
>> which looks like it could work. Something like storing
>> xa_tag_pointer(NULL, 1) to reserved slots and at lookup time they
>> would
>> be detected with "xa_pointer_tag(fence) & 1".
> Almost!
>
> they would be detected with a super-readable
>
> #define DRM_SCHED_XARR_TAG_RESERVED_ENTRY 1
>
> or maybe …UNUSED_ENTRY?
NULL doesn't work because xa_for_each() skips NULL entries, but it looks like somebody else already stumbled over the same problem we have here.
So there is already a solution: use XA_ZERO_ENTRY! That special value can be stored either with xa_alloc() or through xa_reserve().
It's just that the xarray documentation doesn't explicitly point that out, so I had to dig around in the code a bit to figure out how everything works.
Regards,
Christian.
>
> ^_^
>
> P.
>
>
>> Regards,
>>
>> Tvrtko
>>
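As a standalone illustration of the XA_ZERO_ENTRY behaviour described above
(assuming an xarray initialized with XA_FLAGS_ALLOC, as job->dependencies
is), a reserved slot could be used roughly like this; the function and
parameter names are illustrative only:

#include <linux/xarray.h>

/* Reserve a slot now, fill it later without risking -ENOMEM. */
static int example_reserve_and_fill(struct xarray *xa, void *payload)
{
        u32 id;
        int ret;

        /* xa_alloc() with a NULL entry reserves the index (XA_ZERO_ENTRY). */
        ret = xa_alloc(xa, &id, NULL, xa_limit_32b, GFP_KERNEL);
        if (ret)
                return ret;

        /* Readers such as xa_load() and xa_for_each() treat it as empty. */
        WARN_ON(xa_load(xa, id) != NULL);

        /*
         * The slot's memory already exists, so filling it later does not
         * need to allocate anything.
         */
        xa_store(xa, id, payload, GFP_KERNEL);

        return 0;
}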
Thread overview: 13+ messages
2025-03-18 12:03 Try to fix amdgpu's error handling Christian König
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
2025-03-18 12:39 ` Danilo Krummrich
2025-03-19 11:14 ` Christian König
2025-03-18 12:39 ` Philipp Stanner
2025-03-19 11:23 ` Christian König
2025-03-20 11:49 ` Tvrtko Ursulin
2025-03-21 8:20 ` Philipp Stanner
2025-03-21 13:25 ` Christian König
2025-03-18 13:03 ` Tvrtko Ursulin
2025-03-18 12:03 ` [PATCH 2/2] drm/amdgpu: fix gang submission error handling Christian König
2025-03-18 12:55 ` Tvrtko Ursulin
2025-03-18 12:36 ` Try to fix amdgpu's " Danilo Krummrich