dri-devel.lists.freedesktop.org archive mirror
* Fixing AMDGPUs gang submit error handling
@ 2025-05-15 15:00 Christian König
  0 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2025-05-15 15:00 UTC (permalink / raw)
  To: dri-devel, phasta, dakr, amd-gfx

Hi guys,

third revision of this patch set. I've re-worked the interface
completely this time since my previous assumptions about how the
xarray's reservation functions work weren't correct at all.

I also added a test case to make sure I've got it right this time.

Please review and comment,
Christian.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Fixing AMDGPUs gang submit error handling
@ 2025-05-22 13:41 Christian König
  0 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2025-05-22 13:41 UTC (permalink / raw)
  To: dri-devel, phasta, dakr, amd-gfx

Hi guys,

fourth revision of those patches.

Tvrtko gave me another idea for how to avoid returning the index of the
reserved slot to the caller. That simplifies the handling quite a bit
and makes the code more resilient to errors.

Please take another look,
Christian.




* Fixing AMDGPUs gang submit error handling
@ 2025-05-23 12:56 Christian König
  2025-05-23 12:56 ` [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency Christian König
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Christian König @ 2025-05-23 12:56 UTC (permalink / raw)
  To: tursulin, phasta, dakr, amd-gfx, dri-devel

Hi guys,

fifth try at those patches. I think I finally managed to understand
how the xarray works.

There are a high-level and a lower-level API, and we can actually save
tons of CPU cycles when we switch to the lower-level API for adding
the fences to the xarray.

Looks like this is working now, but I won't mind some additional
testing or ideas for more test cases.

Regards,
Christian.




* [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-23 12:56 Fixing AMDGPUs gang submit error handling Christian König
@ 2025-05-23 12:56 ` Christian König
  2025-05-23 13:49   ` Tvrtko Ursulin
                     ` (2 more replies)
  2025-05-23 12:56 ` [PATCH 2/4] drm/sched: add drm_sched_job_prealloc_dependency_slots Christian König
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 25+ messages in thread
From: Christian König @ 2025-05-23 12:56 UTC (permalink / raw)
  To: tursulin, phasta, dakr, amd-gfx, dri-devel

It turned out that we can actually massively optimize here.

The previous code was horribly inefficient since it constantly released
and re-acquired the lock of the xarray and started each iteration from the
base of the array to avoid concurrent modification, which in our case
doesn't exist.

In addition, the xas_find() and xas_store() functions are explicitly
designed so that you can efficiently check entries and, if you don't
find a match, store a new one at the end or replace an existing one.

So use xas_for_each()/xas_store() instead of xa_for_each()/xa_alloc().
It's a bit more code, but should be much faster in the end.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f7118497e47a..cf200b1b643e 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 				 struct dma_fence *fence)
 {
+	XA_STATE(xas, &job->dependencies, 0);
 	struct dma_fence *entry;
-	unsigned long index;
-	u32 id = 0;
-	int ret;
 
 	if (!fence)
 		return 0;
@@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
 	 * This lets the size of the array of deps scale with the number of
 	 * engines involved, rather than the number of BOs.
 	 */
-	xa_for_each(&job->dependencies, index, entry) {
+	xa_lock(&job->dependencies);
+	xas_for_each(&xas, entry, ULONG_MAX) {
 		if (entry->context != fence->context)
 			continue;
 
 		if (dma_fence_is_later(fence, entry)) {
 			dma_fence_put(entry);
-			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
+			xas_store(&xas, fence);
 		} else {
 			dma_fence_put(fence);
 		}
-		return 0;
+		xa_unlock(&job->dependencies);
+		return xas_error(&xas);
 	}
 
-	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
-	if (ret != 0)
+retry:
+	entry = xas_store(&xas, fence);
+	xa_unlock(&job->dependencies);
+
+	/* There shouldn't be any concurrent add, so no need to loop again */
+	if (xas_nomem(&xas, GFP_KERNEL)) {
+		xa_lock(&job->dependencies);
+		goto retry;
+	}
+
+	if (xas_error(&xas))
 		dma_fence_put(fence);
+	else
+		WARN_ON(entry);
 
-	return ret;
+	return xas_error(&xas);
 }
 EXPORT_SYMBOL(drm_sched_job_add_dependency);
 
-- 
2.34.1



* [PATCH 2/4] drm/sched: add drm_sched_job_prealloc_dependency_slots
  2025-05-23 12:56 Fixing AMDGPUs gang submit error handling Christian König
  2025-05-23 12:56 ` [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency Christian König
@ 2025-05-23 12:56 ` Christian König
  2025-05-23 12:56 ` [PATCH 3/4] drm/sched: Add a test for prealloced fence slots Christian König
  2025-05-23 12:56 ` [PATCH 4/4] drm/amdgpu: fix gang submission error handling Christian König
  3 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2025-05-23 12:56 UTC (permalink / raw)
  To: tursulin, phasta, dakr, amd-gfx, dri-devel

Sometimes drivers need to be able to submit multiple jobs which depend on
each other to different schedulers at the same time, but calls to
drm_sched_job_add_dependency() must not fail any more after the first job
has been initialized.

This function preallocates memory for dependency slots so that no ENOMEM
can occur later while adding dependencies.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 42 +++++++++++++++++++++++++-
 include/drm/gpu_scheduler.h            |  2 ++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index cf200b1b643e..37082f52f43f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -858,6 +858,43 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 }
 EXPORT_SYMBOL(drm_sched_job_arm);
 
+/**
+ * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding dependencies
+ * @job: scheduler job where dependencies will be added
+ * @num_slots: number of slots to reserve
+ *
+ * Sometimes drivers need to be able to submit multiple jobs which depend on
+ * each other to different schedulers at the same time, but calls to
+ * drm_sched_job_add_dependency() must not fail any more after the first job
+ * has been initialized.
+ *
+ * This function preallocates memory for dependency slots so that no ENOMEM
+ * can occur later while adding dependencies.
+ *
+ * Return:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
+					    unsigned int num_slots)
+{
+	int ret;
+	u32 id;
+
+	/*
+	 * This works because NULL entries are not returned by xa_for_each. So
+	 * drm_sched_job_add_dependency() will ignore those while checking for
+	 * duplicates, but can then use the entry when storing the new fence.
+	 */
+	while (num_slots--) {
+		ret = xa_alloc(&job->dependencies, &id, NULL, xa_limit_32b,
+			       GFP_KERNEL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
+
 /**
  * drm_sched_job_add_dependency - adds the fence as a job dependency
  * @job: scheduler job to add the dependencies to
@@ -883,6 +920,9 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
 	 */
 	xa_lock(&job->dependencies);
 	xas_for_each(&xas, entry, ULONG_MAX) {
+		if (xa_is_zero(entry))
+			break;
+
 		if (entry->context != fence->context)
 			continue;
 
@@ -909,7 +949,7 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
 	if (xas_error(&xas))
 		dma_fence_put(fence);
 	else
-		WARN_ON(entry);
+		WARN_ON(entry && !xa_is_zero(entry));
 
 	return xas_error(&xas);
 }
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d860db087ea5..a560a00c6275 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		       u32 credits, void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
 void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
+int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
+					    unsigned int num_slots);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 				 struct dma_fence *fence);
 int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
-- 
2.34.1



* [PATCH 3/4] drm/sched: Add a test for prealloced fence slots
  2025-05-23 12:56 Fixing AMDGPUs gang submit error handling Christian König
  2025-05-23 12:56 ` [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency Christian König
  2025-05-23 12:56 ` [PATCH 2/4] drm/sched: add drm_sched_job_prealloc_dependency_slots Christian König
@ 2025-05-23 12:56 ` Christian König
  2025-05-23 12:56 ` [PATCH 4/4] drm/amdgpu: fix gang submission error handling Christian König
  3 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2025-05-23 12:56 UTC (permalink / raw)
  To: tursulin, phasta, dakr, amd-gfx, dri-devel

Just to exercise the functionality.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/tests/tests_basic.c | 56 ++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
index 7230057e0594..00dcee298100 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
@@ -469,8 +469,62 @@ static struct kunit_suite drm_sched_credits = {
 	.test_cases = drm_sched_credits_tests,
 };
 
+static void drm_sched_test_prealloc(struct kunit *test)
+{
+	struct dma_fence *stub = dma_fence_get_stub();
+	struct drm_mock_sched_entity *entity;
+	struct drm_mock_scheduler *sched;
+	struct drm_mock_sched_job *job;
+	bool done;
+	int ret;
+
+	/*
+	 * Check if preallocation of dependency slots works
+	 */
+
+	sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
+
+	entity = drm_mock_sched_entity_new(test,
+					   DRM_SCHED_PRIORITY_NORMAL,
+					   sched);
+
+	job = drm_mock_sched_job_new(test, entity);
+
+	ret = drm_sched_job_add_dependency(&job->base, dma_fence_get(stub));
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = drm_sched_job_prealloc_dependency_slots(&job->base, 2);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = drm_sched_job_add_dependency(&job->base, dma_fence_get(stub));
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = drm_sched_job_add_dependency(&job->base, dma_fence_get(stub));
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	drm_mock_sched_job_submit(job);
+
+	done = drm_mock_sched_job_wait_scheduled(job, HZ);
+	KUNIT_ASSERT_TRUE(test, done);
+
+	drm_mock_sched_entity_free(entity);
+	drm_mock_sched_fini(sched);
+	dma_fence_put(stub);
+}
+
+static struct kunit_case drm_sched_prealloc_tests[] = {
+	KUNIT_CASE(drm_sched_test_prealloc),
+	{}
+};
+
+static struct kunit_suite drm_sched_prealloc = {
+	.name = "drm_sched_basic_prealloc_tests",
+	.test_cases = drm_sched_prealloc_tests,
+};
+
 kunit_test_suites(&drm_sched_basic,
 		  &drm_sched_timeout,
 		  &drm_sched_priority,
 		  &drm_sched_modify_sched,
-		  &drm_sched_credits);
+		  &drm_sched_credits,
+		  &drm_sched_prealloc);
-- 
2.34.1



* [PATCH 4/4] drm/amdgpu: fix gang submission error handling
  2025-05-23 12:56 Fixing AMDGPUs gang submit error handling Christian König
                   ` (2 preceding siblings ...)
  2025-05-23 12:56 ` [PATCH 3/4] drm/sched: Add a test for prealloced fence slots Christian König
@ 2025-05-23 12:56 ` Christian König
  3 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2025-05-23 12:56 UTC (permalink / raw)
  To: tursulin, phasta, dakr, amd-gfx, dri-devel

For the unlikely case that we run into an ENOMEM while fixing up the gang
submission dependencies, we can't clean up any more since the gang
members are already armed.

Fix this by using pre-allocated dependency slots and re-ordering the
code. Also fix a double unref, since the fence reference is also dropped
on error.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 53 ++++++++++++++------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 82df06a72ee0..4728de07315b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1289,36 +1289,21 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	uint64_t seq;
 	int r;
 
-	for (i = 0; i < p->gang_size; ++i)
-		drm_sched_job_arm(&p->jobs[i]->base);
-
-	for (i = 0; i < p->gang_size; ++i) {
-		struct dma_fence *fence;
-
-		if (p->jobs[i] == leader)
-			continue;
-
-		fence = &p->jobs[i]->base.s_fence->scheduled;
-		dma_fence_get(fence);
-		r = drm_sched_job_add_dependency(&leader->base, fence);
-		if (r) {
-			dma_fence_put(fence);
-			return r;
-		}
-	}
-
-	if (p->gang_size > 1) {
-		for (i = 0; i < p->gang_size; ++i)
-			amdgpu_job_set_gang_leader(p->jobs[i], leader);
-	}
+	/* Preallocate the memory for the gang dependencies */
+	r = drm_sched_job_prealloc_dependency_slots(&leader->base,
+						    p->gang_size - 1);
+	if (r)
+		return r;
 
-	/* No memory allocation is allowed while holding the notifier lock.
+	/*
+	 * No memory allocation is allowed while holding the notifier lock.
 	 * The lock is held until amdgpu_cs_submit is finished and fence is
 	 * added to BOs.
 	 */
 	mutex_lock(&p->adev->notifier_lock);
 
-	/* If userptr are invalidated after amdgpu_cs_parser_bos(), return
+	/*
+	 * If userptr are invalidated after amdgpu_cs_parser_bos(), return
 	 * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
 	 */
 	r = 0;
@@ -1333,6 +1318,26 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 		return r;
 	}
 
+	for (i = 0; i < p->gang_size; ++i)
+		drm_sched_job_arm(&p->jobs[i]->base);
+
+	for (i = 0; i < p->gang_size; ++i) {
+		struct dma_fence *fence;
+
+		if (p->jobs[i] == leader)
+			continue;
+
+		fence = dma_fence_get(&p->jobs[i]->base.s_fence->scheduled);
+		r = drm_sched_job_add_dependency(&leader->base, fence);
+		/* We have preallocated a slot, so this should never fail */
+		WARN_ON(r);
+	}
+
+	if (p->gang_size > 1) {
+		for (i = 0; i < p->gang_size; ++i)
+			amdgpu_job_set_gang_leader(p->jobs[i], leader);
+	}
+
 	p->fence = dma_fence_get(&leader->base.s_fence->finished);
 	drm_exec_for_each_locked_object(&p->exec, index, gobj) {
 
-- 
2.34.1



* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-23 12:56 ` [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency Christian König
@ 2025-05-23 13:49   ` Tvrtko Ursulin
  2025-05-23 14:11   ` Danilo Krummrich
  2025-05-26  7:28   ` Philipp Stanner
  2 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2025-05-23 13:49 UTC (permalink / raw)
  To: Christian König, phasta, dakr, amd-gfx, dri-devel


On 23/05/2025 13:56, Christian König wrote:
> It turned out that we can actually massively optimize here.
> 
> The previous code was horribly inefficient since it constantly released
> and re-acquired the lock of the xarray and started each iteration from the
> base of the array to avoid concurrent modification, which in our case
> doesn't exist.
> 
> In addition, the xas_find() and xas_store() functions are explicitly
> designed so that you can efficiently check entries and, if you don't
> find a match, store a new one at the end or replace an existing one.
> 
> So use xas_for_each()/xas_store() instead of xa_for_each()/xa_alloc().
> It's a bit more code, but should be much faster in the end.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
>   1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index f7118497e47a..cf200b1b643e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>   int drm_sched_job_add_dependency(struct drm_sched_job *job,
>   				 struct dma_fence *fence)
>   {
> +	XA_STATE(xas, &job->dependencies, 0);
>   	struct dma_fence *entry;
> -	unsigned long index;
> -	u32 id = 0;
> -	int ret;
>   
>   	if (!fence)
>   		return 0;
> @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>   	 * This lets the size of the array of deps scale with the number of
>   	 * engines involved, rather than the number of BOs.
>   	 */
> -	xa_for_each(&job->dependencies, index, entry) {
> +	xa_lock(&job->dependencies);
> +	xas_for_each(&xas, entry, ULONG_MAX) {
>   		if (entry->context != fence->context)
>   			continue;
>   
>   		if (dma_fence_is_later(fence, entry)) {
>   			dma_fence_put(entry);
> -			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> +			xas_store(&xas, fence);
>   		} else {
>   			dma_fence_put(fence);
>   		}
> -		return 0;
> +		xa_unlock(&job->dependencies);
> +		return xas_error(&xas);
>   	}
>   
> -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> -	if (ret != 0)
> +retry:
> +	entry = xas_store(&xas, fence);
> +	xa_unlock(&job->dependencies);
> +
> +	/* There shouldn't be any concurrent add, so no need to loop again */
> +	if (xas_nomem(&xas, GFP_KERNEL)) {
> +		xa_lock(&job->dependencies);
> +		goto retry;
> +	}
> +
> +	if (xas_error(&xas))
>   		dma_fence_put(fence);
> +	else
> +		WARN_ON(entry);

Looks good, I cannot spot a high level problem with this approach.

Maybe only the tail end of this function could be improved with something
like this:

...
if (xas_nomem(&xas, GFP_KERNEL)) {
	xa_lock(&job->dependencies);
	goto retry;
}

err = xas_error(&xas);
if (WARN_ON(!err && entry))
	dma_fence_put(entry);
else if (err)
	dma_fence_put(fence);

return err;

Thoughts?


>   
> -	return ret;
> +	return xas_error(&xas);
>   }
>   EXPORT_SYMBOL(drm_sched_job_add_dependency);
>   



* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-23 12:56 ` [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency Christian König
  2025-05-23 13:49   ` Tvrtko Ursulin
@ 2025-05-23 14:11   ` Danilo Krummrich
  2025-05-23 14:16     ` Danilo Krummrich
  2025-05-24 11:17     ` Danilo Krummrich
  2025-05-26  7:28   ` Philipp Stanner
  2 siblings, 2 replies; 25+ messages in thread
From: Danilo Krummrich @ 2025-05-23 14:11 UTC (permalink / raw)
  To: Christian König; +Cc: tursulin, phasta, amd-gfx, dri-devel

On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> It turned out that we can actually massively optimize here.
> 
> The previous code was horribly inefficient since it constantly released
> and re-acquired the lock of the xarray and started each iteration from the
> base of the array to avoid concurrent modification, which in our case
> doesn't exist.
> 
> In addition, the xas_find() and xas_store() functions are explicitly
> designed so that you can efficiently check entries and, if you don't
> find a match, store a new one at the end or replace an existing one.
> 
> So use xas_for_each()/xas_store() instead of xa_for_each()/xa_alloc().
> It's a bit more code, but should be much faster in the end.

This commit message neither explains the motivation for the commit nor what
it does. Instead it describes what belongs in the changelog between versions.

Speaking of versioning of the patch series, AFAIK there were previous versions,
but this series was sent as a whole new series -- why?

Please resend with a proper commit message, version and changelog. Thanks!

> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index f7118497e47a..cf200b1b643e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>  int drm_sched_job_add_dependency(struct drm_sched_job *job,
>  				 struct dma_fence *fence)
>  {
> +	XA_STATE(xas, &job->dependencies, 0);
>  	struct dma_fence *entry;
> -	unsigned long index;
> -	u32 id = 0;
> -	int ret;
>  
>  	if (!fence)
>  		return 0;
> @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>  	 * This lets the size of the array of deps scale with the number of
>  	 * engines involved, rather than the number of BOs.
>  	 */
> -	xa_for_each(&job->dependencies, index, entry) {
> +	xa_lock(&job->dependencies);
> +	xas_for_each(&xas, entry, ULONG_MAX) {
>  		if (entry->context != fence->context)
>  			continue;
>  
>  		if (dma_fence_is_later(fence, entry)) {
>  			dma_fence_put(entry);
> -			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> +			xas_store(&xas, fence);
>  		} else {
>  			dma_fence_put(fence);
>  		}
> -		return 0;
> +		xa_unlock(&job->dependencies);
> +		return xas_error(&xas);
>  	}
>  
> -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> -	if (ret != 0)
> +retry:
> +	entry = xas_store(&xas, fence);
> +	xa_unlock(&job->dependencies);
> +
> +	/* There shouldn't be any concurrent add, so no need to loop again */

Concurrency shouldn't matter, xas_nomem() stores the pre-allocated memory in the
XA_STATE not the xarray. Hence, I think we should remove the comment.

> +	if (xas_nomem(&xas, GFP_KERNEL)) {
> +		xa_lock(&job->dependencies);
> +		goto retry;

Please don't use a goto here; if we failed to allocate memory, this would be
an endless loop until we eventually succeed. It would be equivalent to:

	while (!ptr) {
		ptr = kmalloc();
	}

Instead just take the lock and call xas_store() again.

> +	}
> +
> +	if (xas_error(&xas))
>  		dma_fence_put(fence);
> +	else
> +		WARN_ON(entry);

Please don't call WARN_ON() here, this isn't fatal, we only need to return the
error code.


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-23 14:11   ` Danilo Krummrich
@ 2025-05-23 14:16     ` Danilo Krummrich
  2025-05-26  9:25       ` Christian König
  2025-05-24 11:17     ` Danilo Krummrich
  1 sibling, 1 reply; 25+ messages in thread
From: Danilo Krummrich @ 2025-05-23 14:16 UTC (permalink / raw)
  To: Christian König; +Cc: tursulin, phasta, amd-gfx, dri-devel

On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> > It turned out that we can actually massively optimize here.
> > 
> > The previous code was horribly inefficient since it constantly released
> > and re-acquired the lock of the xarray and started each iteration from the
> > base of the array to avoid concurrent modification, which in our case
> > doesn't exist.
> > 
> > In addition, the xas_find() and xas_store() functions are explicitly
> > designed so that you can efficiently check entries and, if you don't
> > find a match, store a new one at the end or replace an existing one.
> > 
> > So use xas_for_each()/xas_store() instead of xa_for_each()/xa_alloc().
> > It's a bit more code, but should be much faster in the end.
> 
> This commit message neither explains the motivation for the commit nor what
> it does. Instead it describes what belongs in the changelog between versions.

Sorry, this is wrong. I got confused, the commit message is perfectly fine. :)

The rest still applies though.

> Speaking of versioning of the patch series, AFAIK there were previous versions,
> but this series was sent as a whole new series -- why?
> 
> Please resend with a proper commit message, version and changelog. Thanks!
> 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index f7118497e47a..cf200b1b643e 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> >  int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  				 struct dma_fence *fence)
> >  {
> > +	XA_STATE(xas, &job->dependencies, 0);
> >  	struct dma_fence *entry;
> > -	unsigned long index;
> > -	u32 id = 0;
> > -	int ret;
> >  
> >  	if (!fence)
> >  		return 0;
> > @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  	 * This lets the size of the array of deps scale with the number of
> >  	 * engines involved, rather than the number of BOs.
> >  	 */
> > -	xa_for_each(&job->dependencies, index, entry) {
> > +	xa_lock(&job->dependencies);
> > +	xas_for_each(&xas, entry, ULONG_MAX) {
> >  		if (entry->context != fence->context)
> >  			continue;
> >  
> >  		if (dma_fence_is_later(fence, entry)) {
> >  			dma_fence_put(entry);
> > -			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > +			xas_store(&xas, fence);
> >  		} else {
> >  			dma_fence_put(fence);
> >  		}
> > -		return 0;
> > +		xa_unlock(&job->dependencies);
> > +		return xas_error(&xas);
> >  	}
> >  
> > -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> > -	if (ret != 0)
> > +retry:
> > +	entry = xas_store(&xas, fence);
> > +	xa_unlock(&job->dependencies);
> > +
> > +	/* There shouldn't be any concurrent add, so no need to loop again */
> 
> Concurrency shouldn't matter, xas_nomem() stores the pre-allocated memory in the
> XA_STATE not the xarray. Hence, I think we should remove the comment.
> 
> > +	if (xas_nomem(&xas, GFP_KERNEL)) {
> > +		xa_lock(&job->dependencies);
> > +		goto retry;
> 
> Please don't use a goto here; if we failed to allocate memory, this would be
> an endless loop until we eventually succeed. It would be equivalent to:
> 
> 	while (!ptr) {
> 		ptr = kmalloc();
> 	}
> 
> Instead just take the lock and call xas_store() again.
> 
> > +	}
> > +
> > +	if (xas_error(&xas))
> >  		dma_fence_put(fence);
> > +	else
> > +		WARN_ON(entry);
> 
> Please don't call WARN_ON() here, this isn't fatal, we only need to return the
> error code.


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-23 14:11   ` Danilo Krummrich
  2025-05-23 14:16     ` Danilo Krummrich
@ 2025-05-24 11:17     ` Danilo Krummrich
  2025-05-26 10:59       ` Christian König
  1 sibling, 1 reply; 25+ messages in thread
From: Danilo Krummrich @ 2025-05-24 11:17 UTC (permalink / raw)
  To: Christian König; +Cc: tursulin, phasta, amd-gfx, dri-devel

On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> > +	if (xas_nomem(&xas, GFP_KERNEL)) {
> > +		xa_lock(&job->dependencies);
> > +		goto retry;
> 
> Please don't use a goto here; if we failed to allocate memory, this would be
> an endless loop until we eventually succeed.

I think I got confused by xas_nomem() returning true meaning that memory was
needed and was successfully allocated (which can be a bit counterintuitive).

So, your code here should be correct. However, I'd still remove the goto and
just call xas_store() again. There's no reason to make this a loop and backwards
goto is better avoided anyways. :)


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-23 12:56 ` [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency Christian König
  2025-05-23 13:49   ` Tvrtko Ursulin
  2025-05-23 14:11   ` Danilo Krummrich
@ 2025-05-26  7:28   ` Philipp Stanner
  2 siblings, 0 replies; 25+ messages in thread
From: Philipp Stanner @ 2025-05-26  7:28 UTC (permalink / raw)
  To: Christian König, tursulin, dakr, amd-gfx, dri-devel

On Fri, 2025-05-23 at 14:56 +0200, Christian König wrote:
> It turned out that we can actually massively optimize here.
> 
> The previous code was horribly inefficient since it constantly released
> and re-acquired the lock of the xarray and started each iteration from the
> base of the array to avoid concurrent modification, which in our case
> doesn't exist.
> 
> In addition, the xas_find() and xas_store() functions are explicitly
> designed so that you can efficiently check entries and, if you don't
> find a match, store a new one at the end or replace an existing one.
> 
> So use xas_for_each()/xas_store() instead of xa_for_each()/xa_alloc().
> It's a bit more code, but should be much faster in the end.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++------
> --
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index f7118497e47a..cf200b1b643e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>  int drm_sched_job_add_dependency(struct drm_sched_job *job,
>  				 struct dma_fence *fence)
>  {
> +	XA_STATE(xas, &job->dependencies, 0);
>  	struct dma_fence *entry;
> -	unsigned long index;
> -	u32 id = 0;
> -	int ret;
>  
>  	if (!fence)
>  		return 0;
> @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct
> drm_sched_job *job,
>  	 * This lets the size of the array of deps scale with the
> number of
>  	 * engines involved, rather than the number of BOs.
>  	 */
> -	xa_for_each(&job->dependencies, index, entry) {
> +	xa_lock(&job->dependencies);
> +	xas_for_each(&xas, entry, ULONG_MAX) {
>  		if (entry->context != fence->context)
>  			continue;
>  
>  		if (dma_fence_is_later(fence, entry)) {
>  			dma_fence_put(entry);
> -			xa_store(&job->dependencies, index, fence,
> GFP_KERNEL);
> +			xas_store(&xas, fence);
>  		} else {
>  			dma_fence_put(fence);
>  		}
> -		return 0;
> +		xa_unlock(&job->dependencies);
> +		return xas_error(&xas);
>  	}
>  
> -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> GFP_KERNEL);
> -	if (ret != 0)
> +retry:
> +	entry = xas_store(&xas, fence);
> +	xa_unlock(&job->dependencies);
> +
> +	/* There shouldn't be any concurrent add, so no need to loop
> again */

Should we maybe note in the function documentation that this must not
be called concurrently?

Looks to me as if the current version were already broken if someone
does that. So maybe it's also OK to just leave it as is.


P.


> +	if (xas_nomem(&xas, GFP_KERNEL)) {
> +		xa_lock(&job->dependencies);
> +		goto retry;
> +	}
> +
> +	if (xas_error(&xas))
>  		dma_fence_put(fence);
> +	else
> +		WARN_ON(entry);
>  
> -	return ret;
> +	return xas_error(&xas);
>  }
>  EXPORT_SYMBOL(drm_sched_job_add_dependency);
>  


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-23 14:16     ` Danilo Krummrich
@ 2025-05-26  9:25       ` Christian König
  2025-05-26  9:34         ` Philipp Stanner
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2025-05-26  9:25 UTC (permalink / raw)
  To: Danilo Krummrich; +Cc: tursulin, phasta, amd-gfx, dri-devel

On 5/23/25 16:16, Danilo Krummrich wrote:
> On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
>> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
>>> It turned out that we can actually massively optimize here.
>>>
>>> The previous code was horrible inefficient since it constantly released
>>> and re-acquired the lock of the xarray and started each iteration from the
>>> base of the array to avoid concurrent modification which in our case
>>> doesn't exist.
>>>
>>> Additional to that the xas_find() and xas_store() functions are explicitly
>>> made in a way so that you can efficiently check entries and if you don't
>>> find a match store a new one at the end or replace existing ones.
>>>
>>> So use xas_for_each()/xa_store() instead of xa_for_each()/xa_alloc().
>>> It's a bit more code, but should be much faster in the end.
>>
>> This commit message does neither explain the motivation of the commit nor what it
>> does. It describes what instead belongs into the changelog between versions.
> 
> Sorry, this is wrong. I got confused, the commit message is perfectly fine. :)
> 
> The rest still applies though.
> 
>> Speaking of versioning of the patch series, AFAIK there were previous versions,
>> but this series was sent as a whole new series -- why?
>>
>> Please resend with a proper commit message, version and changelog. Thanks!


Well, Philipp asked to remove the changelog. I'm happy to bring it back, but yeah...

Regards,
Christian.

>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>  drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
>>>  1 file changed, 20 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index f7118497e47a..cf200b1b643e 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>>>  int drm_sched_job_add_dependency(struct drm_sched_job *job,
>>>  				 struct dma_fence *fence)
>>>  {
>>> +	XA_STATE(xas, &job->dependencies, 0);
>>>  	struct dma_fence *entry;
>>> -	unsigned long index;
>>> -	u32 id = 0;
>>> -	int ret;
>>>  
>>>  	if (!fence)
>>>  		return 0;
>>> @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>>>  	 * This lets the size of the array of deps scale with the number of
>>>  	 * engines involved, rather than the number of BOs.
>>>  	 */
>>> -	xa_for_each(&job->dependencies, index, entry) {
>>> +	xa_lock(&job->dependencies);
>>> +	xas_for_each(&xas, entry, ULONG_MAX) {
>>>  		if (entry->context != fence->context)
>>>  			continue;
>>>  
>>>  		if (dma_fence_is_later(fence, entry)) {
>>>  			dma_fence_put(entry);
>>> -			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
>>> +			xas_store(&xas, fence);
>>>  		} else {
>>>  			dma_fence_put(fence);
>>>  		}
>>> -		return 0;
>>> +		xa_unlock(&job->dependencies);
>>> +		return xas_error(&xas);
>>>  	}
>>>  
>>> -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
>>> -	if (ret != 0)
>>> +retry:
>>> +	entry = xas_store(&xas, fence);
>>> +	xa_unlock(&job->dependencies);
>>> +
>>> +	/* There shouldn't be any concurrent add, so no need to loop again */
>>
>> Concurrency shouldn't matter, xas_nomem() stores the pre-allocated memory in the
>> XA_STATE not the xarray. Hence, I think we should remove the comment.
>>
>>> +	if (xas_nomem(&xas, GFP_KERNEL)) {
>>> +		xa_lock(&job->dependencies);
>>> +		goto retry;
>>
>> Please don't use a goto here, if we would have failed to allocate memory here,
>> this would be an endless loop until we succeed eventually. It would be equal to:
>>
>> 	while (!ptr) {
>> 		ptr = kmalloc();
>> 	}
>>
>> Instead just take the lock and call xas_store() again.
>>
>>> +	}
>>> +
>>> +	if (xas_error(&xas))
>>>  		dma_fence_put(fence);
>>> +	else
>>> +		WARN_ON(entry);
>>
>> Please don't call WARN_ON() here, this isn't fatal, we only need to return the
>> error code.



* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-26  9:25       ` Christian König
@ 2025-05-26  9:34         ` Philipp Stanner
  2025-05-26 11:16           ` Christian König
  0 siblings, 1 reply; 25+ messages in thread
From: Philipp Stanner @ 2025-05-26  9:34 UTC (permalink / raw)
  To: Christian König, Danilo Krummrich; +Cc: tursulin, amd-gfx, dri-devel

On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> On 5/23/25 16:16, Danilo Krummrich wrote:
> > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> > > > It turned out that we can actually massively optimize here.
> > > > 
> > > > The previous code was horrible inefficient since it constantly
> > > > released
> > > > and re-acquired the lock of the xarray and started each
> > > > iteration from the
> > > > base of the array to avoid concurrent modification which in our
> > > > case
> > > > doesn't exist.
> > > > 
> > > > Additional to that the xas_find() and xas_store() functions are
> > > > explicitly
> > > > made in a way so that you can efficiently check entries and if
> > > > you don't
> > > > find a match store a new one at the end or replace existing
> > > > ones.
> > > > 
> > > > So use xas_for_each()/xa_store() instead of
> > > > xa_for_each()/xa_alloc().
> > > > It's a bit more code, but should be much faster in the end.
> > > 
> > > This commit message does neither explain the motivation of the
> > > commit nor what it
> > > does. It describes what instead belongs into the changelog
> > > between versions.
> > 
> > Sorry, this is wrong. I got confused, the commit message is
> > perfectly fine. :)
> > 
> > The rest still applies though.
> > 
> > > Speaking of versioning of the patch series, AFAIK there were
> > > previous versions,
> > > but this series was sent as a whole new series -- why?
> > > 
> > > Please resend with a proper commit message, version and
> > > changelog. Thanks!
> 
> 
> Well Philip asked to remove the changelog. I'm happy to bring it
> back, but yeah...

No no no no :D

Philipp asked for the changelog to be removed *from the git commit
message*; because it doesn't belong / isn't useful there.

If there's a cover letter, the changelog should be in the cover letter.
If there's no cover letter, it should be between the --- separators:


Signed-off-by: Gordon Freeman <freeman@blackmesa.org>
Reviewed-by: Alyx Vance <alyx@vance.edu>
---
Changes in v2:
  - Provide more docu for crowbar-alloc-function.
  - Use NULL pointers for reserved xarray entries
---
<DIFF>


P.


> 
> Regards,
> Christian.
> 
> > > 
> > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > ---
> > > >  drivers/gpu/drm/scheduler/sched_main.c | 29
> > > > ++++++++++++++++++--------
> > > >  1 file changed, 20 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > index f7118497e47a..cf200b1b643e 100644
> > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> > > >  int drm_sched_job_add_dependency(struct drm_sched_job *job,
> > > >  				 struct dma_fence *fence)
> > > >  {
> > > > +	XA_STATE(xas, &job->dependencies, 0);
> > > >  	struct dma_fence *entry;
> > > > -	unsigned long index;
> > > > -	u32 id = 0;
> > > > -	int ret;
> > > >  
> > > >  	if (!fence)
> > > >  		return 0;
> > > > @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct
> > > > drm_sched_job *job,
> > > >  	 * This lets the size of the array of deps scale with
> > > > the number of
> > > >  	 * engines involved, rather than the number of BOs.
> > > >  	 */
> > > > -	xa_for_each(&job->dependencies, index, entry) {
> > > > +	xa_lock(&job->dependencies);
> > > > +	xas_for_each(&xas, entry, ULONG_MAX) {
> > > >  		if (entry->context != fence->context)
> > > >  			continue;
> > > >  
> > > >  		if (dma_fence_is_later(fence, entry)) {
> > > >  			dma_fence_put(entry);
> > > > -			xa_store(&job->dependencies, index,
> > > > fence, GFP_KERNEL);
> > > > +			xas_store(&xas, fence);
> > > >  		} else {
> > > >  			dma_fence_put(fence);
> > > >  		}
> > > > -		return 0;
> > > > +		xa_unlock(&job->dependencies);
> > > > +		return xas_error(&xas);
> > > >  	}
> > > >  
> > > > -	ret = xa_alloc(&job->dependencies, &id, fence,
> > > > xa_limit_32b, GFP_KERNEL);
> > > > -	if (ret != 0)
> > > > +retry:
> > > > +	entry = xas_store(&xas, fence);
> > > > +	xa_unlock(&job->dependencies);
> > > > +
> > > > +	/* There shouldn't be any concurrent add, so no need
> > > > to loop again */
> > > 
> > > Concurrency shouldn't matter, xas_nomem() stores the pre-
> > > allocated memory in the
> > > XA_STATE not the xarray. Hence, I think we should remove the
> > > comment.
> > > 
> > > > +	if (xas_nomem(&xas, GFP_KERNEL)) {
> > > > +		xa_lock(&job->dependencies);
> > > > +		goto retry;
> > > 
> > > Please don't use a goto here, if we would have failed to allocate
> > > memory here,
> > > this would be an endless loop until we succeed eventually. It
> > > would be equal to:
> > > 
> > > 	while (!ptr) {
> > > 		ptr = kmalloc();
> > > 	}
> > > 
> > > Instead just take the lock and call xas_store() again.
> > > 
> > > > +	}
> > > > +
> > > > +	if (xas_error(&xas))
> > > >  		dma_fence_put(fence);
> > > > +	else
> > > > +		WARN_ON(entry);
> > > 
> > > Please don't call WARN_ON() here, this isn't fatal, we only need
> > > to return the
> > > error code.
> 



* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-24 11:17     ` Danilo Krummrich
@ 2025-05-26 10:59       ` Christian König
  2025-05-26 11:14         ` Danilo Krummrich
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2025-05-26 10:59 UTC (permalink / raw)
  To: Danilo Krummrich; +Cc: tursulin, phasta, amd-gfx, dri-devel

On 5/24/25 13:17, Danilo Krummrich wrote:
> On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
>> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
>>> +	if (xas_nomem(&xas, GFP_KERNEL)) {
>>> +		xa_lock(&job->dependencies);
>>> +		goto retry;
>>
>> Please don't use a goto here, if we would have failed to allocate memory here,
>> this would be an endless loop until we succeed eventually.
> 
> I think I got confused by xas_nomem() returning true meaning that memory was
> needed and was successfully allocated (which can be a bit counterintuitive).


Yeah, I had to take a look at the samples/tests to understand that as well.

> 
> So, your code here should be correct. However, I'd still remove the goto and
> just call xas_store() again. There's no reason to make this a loop and backwards
> goto is better avoided anyways. :)


I was considering that as well, but then abandoned the idea. The xarray sample code and test cases, as well as the real users I looked at, all use either a loop or a goto.

I'm not 100% sure why; my suspicion is that the loop is needed when there can be concurrent add/remove operations on the xarray, but I think we should stick with the common approach.
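
The loop/goto pattern referenced here can be mimicked in user space. In the sketch below, fake_store() and fake_nomem() are illustrative stand-ins for xas_store() and xas_nomem(), not the real kernel API: the first store attempt fails for lack of memory, the allocation then happens (outside the lock in the real code), and the store is retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_state {
	void *prealloc;	/* memory reserved for this caller only */
	int error;	/* sticky error code, like xas_error() */
};

/* Pretend the store needs a pre-allocated node to succeed. */
static void fake_store(struct fake_state *st)
{
	if (!st->prealloc) {
		st->error = -ENOMEM;
		return;
	}
	free(st->prealloc);	/* the real code would link the node in */
	st->prealloc = NULL;
	st->error = 0;
}

/*
 * True means "memory was needed and is now available, please retry",
 * the counterintuitive convention mentioned in the thread.
 */
static bool fake_nomem(struct fake_state *st)
{
	if (st->error != -ENOMEM)
		return false;
	st->prealloc = malloc(16);
	return st->prealloc != NULL;
}

static int store_with_retry(struct fake_state *st)
{
retry:
	fake_store(st);
	if (fake_nomem(st))
		goto retry;
	return st->error;
}
```

On allocation failure fake_nomem() returns false with -ENOMEM still recorded in the state, so store_with_retry() terminates instead of spinning, which matches the reading of the kernel pattern discussed earlier in the thread.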

Regards,
Christian. 


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-26 10:59       ` Christian König
@ 2025-05-26 11:14         ` Danilo Krummrich
  2025-05-26 11:36           ` Christian König
  0 siblings, 1 reply; 25+ messages in thread
From: Danilo Krummrich @ 2025-05-26 11:14 UTC (permalink / raw)
  To: Christian König
  Cc: tursulin, phasta, amd-gfx, dri-devel, Matthew Wilcox

(Cc: Matthew)

Let's get this clarified to not work with assumptions. :)

On Mon, May 26, 2025 at 12:59:41PM +0200, Christian König wrote:
> On 5/24/25 13:17, Danilo Krummrich wrote:
> > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> > So, your code here should be correct. However, I'd still remove the goto and
> > just call xas_store() again. There's no reason to make this a loop and backwards
> > goto is better avoided anyways. :)
> 
> 
> I was considering that as well, but than abandoned this idea. The xarray() sample code and test cases as well as the use cases where I took a look either use a loop or a goto.
> 
> I'm not 100% sure why, my suspicion is that you need the loop when there can be concurrent add/remove operations on the xarray, but I think we should stick with the common approach.

I don't think that concurrency is relevant here. xas_nomem() stores the
allocated memory in the XA_STATE structure, which is on the stack.

I know that for maple tree a pre-allocation is only valid for the exact state
of the tree it was allocated for.

But I think xarray does not have this limitation, i.e. concurrent modification
of the xarray does not invalidate an xas_nomem() allocation in terms of being
sufficient for a subsequent store.

@Matthew: Can you clarify this please?
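
The semantics described above, with the reservation living in per-caller state on the stack rather than in the shared structure, can be mimicked in user space. All names in this sketch (shared_array, cursor_*) are illustrative stand-ins, not the real xarray API:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct shared_array {
	void *slots[8];
};

struct cursor {
	struct shared_array *xa;
	void *prealloc;		/* reserved for this cursor only */
};

/* Reserve memory for a later store; plays the role of xas_nomem(). */
static int cursor_prealloc(struct cursor *cur)
{
	cur->prealloc = calloc(1, 32);
	return cur->prealloc ? 0 : -ENOMEM;
}

/* The store succeeds only if this cursor holds a reservation. */
static int cursor_store(struct cursor *cur, unsigned int idx, void *entry)
{
	if (!cur->prealloc)
		return -ENOMEM;
	cur->xa->slots[idx] = entry;
	free(cur->prealloc);	/* reservation consumed by the store */
	cur->prealloc = NULL;
	return 0;
}
```

Because the reservation is reachable only through the cursor, another caller storing into the shared container cannot consume it; whether the real xarray additionally needs a bigger reservation after concurrent modification is exactly the open question here.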


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-26  9:34         ` Philipp Stanner
@ 2025-05-26 11:16           ` Christian König
  2025-05-26 11:27             ` Philipp Stanner
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2025-05-26 11:16 UTC (permalink / raw)
  To: phasta, Danilo Krummrich; +Cc: tursulin, amd-gfx, dri-devel

On 5/26/25 11:34, Philipp Stanner wrote:
> On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
>> On 5/23/25 16:16, Danilo Krummrich wrote:
>>> On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
>>>> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
>>>>> It turned out that we can actually massively optimize here.
>>>>>
>>>>> The previous code was horrible inefficient since it constantly
>>>>> released
>>>>> and re-acquired the lock of the xarray and started each
>>>>> iteration from the
>>>>> base of the array to avoid concurrent modification which in our
>>>>> case
>>>>> doesn't exist.
>>>>>
>>>>> Additional to that the xas_find() and xas_store() functions are
>>>>> explicitly
>>>>> made in a way so that you can efficiently check entries and if
>>>>> you don't
>>>>> find a match store a new one at the end or replace existing
>>>>> ones.
>>>>>
>>>>> So use xas_for_each()/xa_store() instead of
>>>>> xa_for_each()/xa_alloc().
>>>>> It's a bit more code, but should be much faster in the end.
>>>>
>>>> This commit message does neither explain the motivation of the
>>>> commit nor what it
>>>> does. It describes what instead belongs into the changelog
>>>> between versions.
>>>
>>> Sorry, this is wrong. I got confused, the commit message is
>>> perfectly fine. :)
>>>
>>> The rest still applies though.
>>>
>>>> Speaking of versioning of the patch series, AFAIK there were
>>>> previous versions,
>>>> but this series was sent as a whole new series -- why?
>>>>
>>>> Please resend with a proper commit message, version and
>>>> changelog. Thanks!
>>
>>
>> Well Philip asked to remove the changelog. I'm happy to bring it
>> back, but yeah...
> 
> No no no no :D
> 
> Philipp asked for the changelog to be removed *from the git commit
> message*; because it doesn't belong / isn't useful there.
> 
> If there's a cover letter, the changelog should be in the cover letter.
> If there's no cover letter, it should be between the --- separators:

I can live with that; just clearly state what you want.

For DRM the ask is often to keep the changelog in the commit message or remove it entirely.

Regards,
Christian.

> 
> 
> Signed-off-by: Gordon Freeman <freeman@blackmesa.org>
> Reviewed-by: Alyx Vance <alyx@vance.edu>
> ---
> Changes in v2:
>   - Provide more docu for crowbar-alloc-function.
>   - Use NULL pointers for reserved xarray entries
> ---
> <DIFF>
> 
> 
> P.
> 
> 
>>
>> Regards,
>> Christian.
>>
>>>>
>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> ---
>>>>>  drivers/gpu/drm/scheduler/sched_main.c | 29
>>>>> ++++++++++++++++++--------
>>>>>  1 file changed, 20 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index f7118497e47a..cf200b1b643e 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>>>>>  int drm_sched_job_add_dependency(struct drm_sched_job *job,
>>>>>  				 struct dma_fence *fence)
>>>>>  {
>>>>> +	XA_STATE(xas, &job->dependencies, 0);
>>>>>  	struct dma_fence *entry;
>>>>> -	unsigned long index;
>>>>> -	u32 id = 0;
>>>>> -	int ret;
>>>>>  
>>>>>  	if (!fence)
>>>>>  		return 0;
>>>>> @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct
>>>>> drm_sched_job *job,
>>>>>  	 * This lets the size of the array of deps scale with
>>>>> the number of
>>>>>  	 * engines involved, rather than the number of BOs.
>>>>>  	 */
>>>>> -	xa_for_each(&job->dependencies, index, entry) {
>>>>> +	xa_lock(&job->dependencies);
>>>>> +	xas_for_each(&xas, entry, ULONG_MAX) {
>>>>>  		if (entry->context != fence->context)
>>>>>  			continue;
>>>>>  
>>>>>  		if (dma_fence_is_later(fence, entry)) {
>>>>>  			dma_fence_put(entry);
>>>>> -			xa_store(&job->dependencies, index,
>>>>> fence, GFP_KERNEL);
>>>>> +			xas_store(&xas, fence);
>>>>>  		} else {
>>>>>  			dma_fence_put(fence);
>>>>>  		}
>>>>> -		return 0;
>>>>> +		xa_unlock(&job->dependencies);
>>>>> +		return xas_error(&xas);
>>>>>  	}
>>>>>  
>>>>> -	ret = xa_alloc(&job->dependencies, &id, fence,
>>>>> xa_limit_32b, GFP_KERNEL);
>>>>> -	if (ret != 0)
>>>>> +retry:
>>>>> +	entry = xas_store(&xas, fence);
>>>>> +	xa_unlock(&job->dependencies);
>>>>> +
>>>>> +	/* There shouldn't be any concurrent add, so no need
>>>>> to loop again */
>>>>
>>>> Concurrency shouldn't matter, xas_nomem() stores the pre-
>>>> allocated memory in the
>>>> XA_STATE not the xarray. Hence, I think we should remove the
>>>> comment.
>>>>
>>>>> +	if (xas_nomem(&xas, GFP_KERNEL)) {
>>>>> +		xa_lock(&job->dependencies);
>>>>> +		goto retry;
>>>>
>>>> Please don't use a goto here, if we would have failed to allocate
>>>> memory here,
>>>> this would be an endless loop until we succeed eventually. It
>>>> would be equal to:
>>>>
>>>> 	while (!ptr) {
>>>> 		ptr = kmalloc();
>>>> 	}
>>>>
>>>> Instead just take the lock and call xas_store() again.
>>>>
>>>>> +	}
>>>>> +
>>>>> +	if (xas_error(&xas))
>>>>>  		dma_fence_put(fence);
>>>>> +	else
>>>>> +		WARN_ON(entry);
>>>>
>>>> Please don't call WARN_ON() here, this isn't fatal, we only need
>>>> to return the
>>>> error code.
>>
> 



* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-26 11:16           ` Christian König
@ 2025-05-26 11:27             ` Philipp Stanner
  2025-05-28 12:30               ` Simona Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Philipp Stanner @ 2025-05-26 11:27 UTC (permalink / raw)
  To: Christian König, phasta, Danilo Krummrich
  Cc: tursulin, amd-gfx, dri-devel

On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote:
> On 5/26/25 11:34, Philipp Stanner wrote:
> > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> > > On 5/23/25 16:16, Danilo Krummrich wrote:
> > > > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich
> > > > wrote:
> > > > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König
> > > > > wrote:
> > > > > > It turned out that we can actually massively optimize here.
> > > > > > 
> > > > > > The previous code was horrible inefficient since it
> > > > > > constantly
> > > > > > released
> > > > > > and re-acquired the lock of the xarray and started each
> > > > > > iteration from the
> > > > > > base of the array to avoid concurrent modification which in
> > > > > > our
> > > > > > case
> > > > > > doesn't exist.
> > > > > > 
> > > > > > Additional to that the xas_find() and xas_store() functions
> > > > > > are
> > > > > > explicitly
> > > > > > made in a way so that you can efficiently check entries and
> > > > > > if
> > > > > > you don't
> > > > > > find a match store a new one at the end or replace existing
> > > > > > ones.
> > > > > > 
> > > > > > So use xas_for_each()/xa_store() instead of
> > > > > > xa_for_each()/xa_alloc().
> > > > > > It's a bit more code, but should be much faster in the end.
> > > > > 
> > > > > This commit message does neither explain the motivation of
> > > > > the
> > > > > commit nor what it
> > > > > does. It describes what instead belongs into the changelog
> > > > > between versions.
> > > > 
> > > > Sorry, this is wrong. I got confused, the commit message is
> > > > perfectly fine. :)
> > > > 
> > > > The rest still applies though.
> > > > 
> > > > > Speaking of versioning of the patch series, AFAIK there were
> > > > > previous versions,
> > > > > but this series was sent as a whole new series -- why?
> > > > > 
> > > > > Please resend with a proper commit message, version and
> > > > > changelog. Thanks!
> > > 
> > > 
> > > Well Philip asked to remove the changelog. I'm happy to bring it
> > > back, but yeah...
> > 
> > No no no no :D
> > 
> > Philipp asked for the changelog to be removed *from the git commit
> > message*; because it doesn't belong / isn't useful there.
> > 
> > If there's a cover letter, the changelog should be in the cover
> > letter.
> > If there's no cover letter, it should be between the ---
> > separators:
> 
> I can live with that, just clearly state what you want.

Sure thing:

 * Patches and patch series should contain their version identifier
   within the square brackets [PATCH v3]. git format-patch -v3 does
   that automatically.
 * Changelog should be as described above
 * Ideally, cover letters always contain the full changelog, v2, v3 and
   so on, so that new readers get a sense of the evolution of the
   series.

> 
> For DRM the ask is often to keep the changelog in the commit message
> or remove it entirely.

Yup, I've seen that a few times. I think we, the DRM community, should
stop that. It's just not useful and makes the commit messages larger,
both for the human reader while scrolling and for the hard drive in
terms of storage.


Thx
P.


> 
> Regards,
> Christian.
> 
> > 
> > 
> > Signed-off-by: Gordon Freeman <freeman@blackmesa.org>
> > Reviewed-by: Alyx Vance <alyx@vance.edu>
> > ---
> > Changes in v2:
> >   - Provide more docu for crowbar-alloc-function.
> >   - Use NULL pointers for reserved xarray entries
> > ---
> > <DIFF>
> > 
> > 
> > P.
> > 
> > 
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > > 
> > > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/scheduler/sched_main.c | 29
> > > > > > ++++++++++++++++++--------
> > > > > >  1 file changed, 20 insertions(+), 9 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > index f7118497e47a..cf200b1b643e 100644
> > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> > > > > >  int drm_sched_job_add_dependency(struct drm_sched_job
> > > > > > *job,
> > > > > >   struct dma_fence *fence)
> > > > > >  {
> > > > > > + XA_STATE(xas, &job->dependencies, 0);
> > > > > >   struct dma_fence *entry;
> > > > > > - unsigned long index;
> > > > > > - u32 id = 0;
> > > > > > - int ret;
> > > > > >  
> > > > > >   if (!fence)
> > > > > >   return 0;
> > > > > > @@ -883,24 +881,37 @@ int
> > > > > > drm_sched_job_add_dependency(struct
> > > > > > drm_sched_job *job,
> > > > > >   * This lets the size of the array of deps scale with
> > > > > > the number of
> > > > > >   * engines involved, rather than the number of BOs.
> > > > > >   */
> > > > > > - xa_for_each(&job->dependencies, index, entry) {
> > > > > > + xa_lock(&job->dependencies);
> > > > > > + xas_for_each(&xas, entry, ULONG_MAX) {
> > > > > >   if (entry->context != fence->context)
> > > > > >   continue;
> > > > > >  
> > > > > >   if (dma_fence_is_later(fence, entry)) {
> > > > > >   dma_fence_put(entry);
> > > > > > - xa_store(&job->dependencies, index,
> > > > > > fence, GFP_KERNEL);
> > > > > > + xas_store(&xas, fence);
> > > > > >   } else {
> > > > > >   dma_fence_put(fence);
> > > > > >   }
> > > > > > - return 0;
> > > > > > + xa_unlock(&job->dependencies);
> > > > > > + return xas_error(&xas);
> > > > > >   }
> > > > > >  
> > > > > > - ret = xa_alloc(&job->dependencies, &id, fence,
> > > > > > xa_limit_32b, GFP_KERNEL);
> > > > > > - if (ret != 0)
> > > > > > +retry:
> > > > > > + entry = xas_store(&xas, fence);
> > > > > > + xa_unlock(&job->dependencies);
> > > > > > +
> > > > > > + /* There shouldn't be any concurrent add, so no need
> > > > > > to loop again */
> > > > > 
> > > > > Concurrency shouldn't matter, xas_nomem() stores the pre-
> > > > > allocated memory in the
> > > > > XA_STATE not the xarray. Hence, I think we should remove the
> > > > > comment.
> > > > > 
> > > > > > + if (xas_nomem(&xas, GFP_KERNEL)) {
> > > > > > + xa_lock(&job->dependencies);
> > > > > > + goto retry;
> > > > > 
> > > > > Please don't use a goto here, if we would have failed to
> > > > > allocate
> > > > > memory here,
> > > > > this would be an endless loop until we succeed eventually. It
> > > > > would be equal to:
> > > > > 
> > > > > while (!ptr) {
> > > > > ptr = kmalloc();
> > > > > }
> > > > > 
> > > > > Instead just take the lock and call xas_store() again.
> > > > > 
> > > > > > + }
> > > > > > +
> > > > > > + if (xas_error(&xas))
> > > > > >   dma_fence_put(fence);
> > > > > > + else
> > > > > > + WARN_ON(entry);
> > > > > 
> > > > > Please don't call WARN_ON() here, this isn't fatal, we only
> > > > > need
> > > > > to return the
> > > > > error code.
> > > 
> > 
> 



* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-26 11:14         ` Danilo Krummrich
@ 2025-05-26 11:36           ` Christian König
  0 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2025-05-26 11:36 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Wilcox; +Cc: tursulin, phasta, amd-gfx, dri-devel

On 5/26/25 13:14, Danilo Krummrich wrote:
> (Cc: Matthew)
> 
> Let's get this clarified to not work with assumptions. :)
> 
> On Mon, May 26, 2025 at 12:59:41PM +0200, Christian König wrote:
>> On 5/24/25 13:17, Danilo Krummrich wrote:
>>> On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
>>> So, your code here should be correct. However, I'd still remove the goto and
>>> just call xas_store() again. There's no reason to make this a loop and backwards
>>> goto is better avoided anyways. :)
>>
>>
>> I was considering that as well, but than abandoned this idea. The xarray() sample code and test cases as well as the use cases where I took a look either use a loop or a goto.
>>
>> I'm not 100% sure why, my suspicion is that you need the loop when there can be concurrent add/remove operations on the xarray, but I think we should stick with the common approach.
> 
> I don't think that concurrency is relevant here. xas_nomem() stores the
> allocated memory in the XA_STATE structure, which is on the stack.

Yeah, but it could be that with a concurrent remove this memory isn't sufficient and you need even more.

> I know that for maple tree a pre-allocation is only valid for the exact state
> of the tree it was allocated for.
> 
> But I think xarray does not have this limitation, i.e. concurrent modification
> of the xarray does not invalidate an xas_nomem() allocation in terms of being
> sufficient for a subsequent store.
> 
> @Matthew: Can you clarify this please?

While at it: Is it possible to run the xarray without locking?

We have exactly one thread handling the job at all times, so no concurrent accesses whatsoever.

Acquiring and dropping the lock is just additional overhead, and without it the memory could also be allocated with plain GFP_KERNEL.

Regards,
Christian.


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-26 11:27             ` Philipp Stanner
@ 2025-05-28 12:30               ` Simona Vetter
  2025-05-28 13:24                 ` Christian König
  2025-05-28 13:29                 ` Alex Deucher
  0 siblings, 2 replies; 25+ messages in thread
From: Simona Vetter @ 2025-05-28 12:30 UTC (permalink / raw)
  To: phasta; +Cc: Christian König, Danilo Krummrich, tursulin, amd-gfx,
	dri-devel

On Mon, May 26, 2025 at 01:27:28PM +0200, Philipp Stanner wrote:
> On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote:
> > On 5/26/25 11:34, Philipp Stanner wrote:
> > > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> > > > On 5/23/25 16:16, Danilo Krummrich wrote:
> > > > > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich
> > > > > wrote:
> > > > > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König
> > > > > > wrote:
> > > > > > > It turned out that we can actually massively optimize here.
> > > > > > > 
> > > > > > > The previous code was horrible inefficient since it
> > > > > > > constantly
> > > > > > > released
> > > > > > > and re-acquired the lock of the xarray and started each
> > > > > > > iteration from the
> > > > > > > base of the array to avoid concurrent modification which in
> > > > > > > our
> > > > > > > case
> > > > > > > doesn't exist.
> > > > > > > 
> > > > > > > Additional to that the xas_find() and xas_store() functions
> > > > > > > are
> > > > > > > explicitly
> > > > > > > made in a way so that you can efficiently check entries and
> > > > > > > if
> > > > > > > you don't
> > > > > > > find a match store a new one at the end or replace existing
> > > > > > > ones.
> > > > > > > 
> > > > > > > So use xas_for_each()/xa_store() instead of
> > > > > > > xa_for_each()/xa_alloc().
> > > > > > > It's a bit more code, but should be much faster in the end.
> > > > > > 
> > > > > > This commit message does neither explain the motivation of
> > > > > > the
> > > > > > commit nor what it
> > > > > > does. It describes what instead belongs into the changelog
> > > > > > between versions.
> > > > > 
> > > > > Sorry, this is wrong. I got confused, the commit message is
> > > > > perfectly fine. :)
> > > > > 
> > > > > The rest still applies though.
> > > > > 
> > > > > > Speaking of versioning of the patch series, AFAIK there were
> > > > > > previous versions,
> > > > > > but this series was sent as a whole new series -- why?
> > > > > > 
> > > > > > Please resend with a proper commit message, version and
> > > > > > changelog. Thanks!
> > > > 
> > > > 
> > > > Well Philip asked to remove the changelog. I'm happy to bring it
> > > > back, but yeah...
> > > 
> > > No no no no :D
> > > 
> > > Philipp asked for the changelog to be removed *from the git commit
> > > message*; because it doesn't belong / isn't useful there.
> > > 
> > > If there's a cover letter, the changelog should be in the cover
> > > letter.
> > > If there's no cover letter, it should be between the ---
> > > separators:
> > 
> > I can live with that, just clearly state what you want.
> 
> Sure thing:
> 
>  * Patches and patch series's should contain their version identifier
>    within the square brackets [PATCH v3]. git format-patch -v3 does
>    that automatically.
>  * Changelog should be as described above
>  * Ideally, cover letters always contain the full changelog, v2, v3 and
>    so on, so that new readers get a sense of the evolution of the
>    series.
> 
> > 
> > For DRM the ask is often to keep the changelog in the commit message
> > or remove it entirely.
> 
> Yup, I've seen that a few times. I think we, the DRM community, should
> stop that. It's just not useful and makes the commit messages larger,
> both for the human reader while scrolling, as for the hard drive
> regarding storage size

I do occasionally find it useful as a record of different approaches
considered, which sometimes people fail to adequately cover in their
commit messages. Also useful indicator of how cursed a patch is :-)

But as long as anything relevant does end up in the commit message and
people don't just delete stuff I don't care how it's done at all. It's
just that the cost of deleting something that should have been there can
be really nasty sometimes, and storage is cheap.
-Sima

> 
> 
> Thx
> P.
> 
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > 
> > > Signed-off-by: Gordon Freeman <freeman@blackmesa.org>
> > > Reviewed-by: Alyx Vance <alyx@vance.edu>
> > > ---
> > > Changes in v2:
> > >   - Provide more docu for crowbar-alloc-function.
> > >   - Use NULL pointers for reserved xarray entries
> > > ---
> > > <DIFF>
> > > 
> > > 
> > > P.
> > > 
> > > 
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > > > 
> > > > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/scheduler/sched_main.c | 29
> > > > > > > ++++++++++++++++++--------
> > > > > > >  1 file changed, 20 insertions(+), 9 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > index f7118497e47a..cf200b1b643e 100644
> > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> > > > > > >  int drm_sched_job_add_dependency(struct drm_sched_job
> > > > > > > *job,
> > > > > > >   struct dma_fence *fence)
> > > > > > >  {
> > > > > > > + XA_STATE(xas, &job->dependencies, 0);
> > > > > > >   struct dma_fence *entry;
> > > > > > > - unsigned long index;
> > > > > > > - u32 id = 0;
> > > > > > > - int ret;
> > > > > > >  
> > > > > > >   if (!fence)
> > > > > > >   return 0;
> > > > > > > @@ -883,24 +881,37 @@ int
> > > > > > > drm_sched_job_add_dependency(struct
> > > > > > > drm_sched_job *job,
> > > > > > >   * This lets the size of the array of deps scale with
> > > > > > > the number of
> > > > > > >   * engines involved, rather than the number of BOs.
> > > > > > >   */
> > > > > > > - xa_for_each(&job->dependencies, index, entry) {
> > > > > > > + xa_lock(&job->dependencies);
> > > > > > > + xas_for_each(&xas, entry, ULONG_MAX) {
> > > > > > >   if (entry->context != fence->context)
> > > > > > >   continue;
> > > > > > >  
> > > > > > >   if (dma_fence_is_later(fence, entry)) {
> > > > > > >   dma_fence_put(entry);
> > > > > > > - xa_store(&job->dependencies, index,
> > > > > > > fence, GFP_KERNEL);
> > > > > > > + xas_store(&xas, fence);
> > > > > > >   } else {
> > > > > > >   dma_fence_put(fence);
> > > > > > >   }
> > > > > > > - return 0;
> > > > > > > + xa_unlock(&job->dependencies);
> > > > > > > + return xas_error(&xas);
> > > > > > >   }
> > > > > > >  
> > > > > > > - ret = xa_alloc(&job->dependencies, &id, fence,
> > > > > > > xa_limit_32b, GFP_KERNEL);
> > > > > > > - if (ret != 0)
> > > > > > > +retry:
> > > > > > > + entry = xas_store(&xas, fence);
> > > > > > > + xa_unlock(&job->dependencies);
> > > > > > > +
> > > > > > > + /* There shouldn't be any concurrent add, so no need
> > > > > > > to loop again */
> > > > > > 
> > > > > > Concurrency shouldn't matter, xas_nomem() stores the pre-
> > > > > > allocated memory in the
> > > > > > XA_STATE not the xarray. Hence, I think we should remove the
> > > > > > comment.
> > > > > > 
> > > > > > > + if (xas_nomem(&xas, GFP_KERNEL)) {
> > > > > > > + xa_lock(&job->dependencies);
> > > > > > > + goto retry;
> > > > > > 
> > > > > > Please don't use a goto here, if we would have failed to
> > > > > > allocate
> > > > > > memory here,
> > > > > > this would be an endless loop until we succeed eventually. It
> > > > > > would be equal to:
> > > > > > 
> > > > > > while (!ptr) {
> > > > > > ptr = kmalloc();
> > > > > > }
> > > > > > 
> > > > > > Instead just take the lock and call xas_store() again.
> > > > > > 
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (xas_error(&xas))
> > > > > > >   dma_fence_put(fence);
> > > > > > > + else
> > > > > > > + WARN_ON(entry);
> > > > > > 
> > > > > > Please don't call WARN_ON() here, this isn't fatal, we only
> > > > > > need
> > > > > > to return the
> > > > > > error code.
> > > > 
> > > 
> > 
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-28 12:30               ` Simona Vetter
@ 2025-05-28 13:24                 ` Christian König
  2025-05-28 13:29                 ` Alex Deucher
  1 sibling, 0 replies; 25+ messages in thread
From: Christian König @ 2025-05-28 13:24 UTC (permalink / raw)
  To: Simona Vetter, phasta; +Cc: Danilo Krummrich, tursulin, amd-gfx, dri-devel

On 5/28/25 14:30, Simona Vetter wrote:
>> Yup, I've seen that a few times. I think we, the DRM community, should
>> stop that. It's just not useful and makes the commit messages larger,
>> both for the human reader while scrolling, as for the hard drive
>> regarding storage size
> 
> I do occasionally find it useful as a record of different approaches
> considered, which sometimes people fail to adequately cover in their
> commit messages. Also useful indicator of how cursed a patch is :-)

Yeah that's certainly true :)

> But as long as anything relevant does end up in the commit message and
> people don't just delete stuff I don't care how it's done at all. It's
> just that the cost of deleting something that should have been there can
> be really nasty sometimes, and storage is cheap.

The problem is that some links to mailing list archives become 404 at some point.

But git, on the other hand, will keep its history (at least as far as I know).

Then again, LKML is the one archive that tends to keep things forever: https://lore.kernel.org/lkml/16GLAZ-1rFguGC@fwd07.sul.t-online.com/ :D

Christian.

> -Sima


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-28 12:30               ` Simona Vetter
  2025-05-28 13:24                 ` Christian König
@ 2025-05-28 13:29                 ` Alex Deucher
  2025-05-28 14:38                   ` Danilo Krummrich
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Deucher @ 2025-05-28 13:29 UTC (permalink / raw)
  To: Simona Vetter
  Cc: phasta, Christian König, Danilo Krummrich, tursulin, amd-gfx,
	dri-devel

On Wed, May 28, 2025 at 8:45 AM Simona Vetter <simona.vetter@ffwll.ch> wrote:
>
> On Mon, May 26, 2025 at 01:27:28PM +0200, Philipp Stanner wrote:
> > On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote:
> > > On 5/26/25 11:34, Philipp Stanner wrote:
> > > > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> > > > > On 5/23/25 16:16, Danilo Krummrich wrote:
> > > > > > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich
> > > > > > wrote:
> > > > > > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König
> > > > > > > wrote:
> > > > > > > > It turned out that we can actually massively optimize here.
> > > > > > > >
> > > > > > > > The previous code was horrible inefficient since it
> > > > > > > > constantly
> > > > > > > > released
> > > > > > > > and re-acquired the lock of the xarray and started each
> > > > > > > > iteration from the
> > > > > > > > base of the array to avoid concurrent modification which in
> > > > > > > > our
> > > > > > > > case
> > > > > > > > doesn't exist.
> > > > > > > >
> > > > > > > > Additional to that the xas_find() and xas_store() functions
> > > > > > > > are
> > > > > > > > explicitly
> > > > > > > > made in a way so that you can efficiently check entries and
> > > > > > > > if
> > > > > > > > you don't
> > > > > > > > find a match store a new one at the end or replace existing
> > > > > > > > ones.
> > > > > > > >
> > > > > > > > So use xas_for_each()/xa_store() instead of
> > > > > > > > xa_for_each()/xa_alloc().
> > > > > > > > It's a bit more code, but should be much faster in the end.
> > > > > > >
> > > > > > > This commit message does neither explain the motivation of
> > > > > > > the
> > > > > > > commit nor what it
> > > > > > > does. It describes what instead belongs into the changelog
> > > > > > > between versions.
> > > > > >
> > > > > > Sorry, this is wrong. I got confused, the commit message is
> > > > > > perfectly fine. :)
> > > > > >
> > > > > > The rest still applies though.
> > > > > >
> > > > > > > Speaking of versioning of the patch series, AFAIK there were
> > > > > > > previous versions,
> > > > > > > but this series was sent as a whole new series -- why?
> > > > > > >
> > > > > > > Please resend with a proper commit message, version and
> > > > > > > changelog. Thanks!
> > > > >
> > > > >
> > > > > Well Philip asked to remove the changelog. I'm happy to bring it
> > > > > back, but yeah...
> > > >
> > > > No no no no :D
> > > >
> > > > Philipp asked for the changelog to be removed *from the git commit
> > > > message*; because it doesn't belong / isn't useful there.
> > > >
> > > > If there's a cover letter, the changelog should be in the cover
> > > > letter.
> > > > If there's no cover letter, it should be between the ---
> > > > separators:
> > >
> > > I can live with that, just clearly state what you want.
> >
> > Sure thing:
> >
> >  * Patches and patch series's should contain their version identifier
> >    within the square brackets [PATCH v3]. git format-patch -v3 does
> >    that automatically.
> >  * Changelog should be as described above
> >  * Ideally, cover letters always contain the full changelog, v2, v3 and
> >    so on, so that new readers get a sense of the evolution of the
> >    series.
> >
> > >
> > > For DRM the ask is often to keep the changelog in the commit message
> > > or remove it entirely.
> >
> > Yup, I've seen that a few times. I think we, the DRM community, should
> > stop that. It's just not useful and makes the commit messages larger,
> > both for the human reader while scrolling, as for the hard drive
> > regarding storage size
>
> I do occasionally find it useful as a record of different approaches
> considered, which sometimes people fail to adequately cover in their
> commit messages. Also useful indicator of how cursed a patch is :-)
>
> But as long as anything relevant does end up in the commit message and
> people don't just delete stuff I don't care how it's done at all. It's
> just that the cost of deleting something that should have been there can
> be really nasty sometimes, and storage is cheap.

I like them for the same reasons.  Also, even with links, there are
sometimes forks of the conversation that get missed, and a changelog
provides some insight into those.  I also find it useful in my own
development: I can note what I've changed in a patch and retain that in
the commit, rather than tracking it separately and then adding it to
the patches when I send them out.

Alex

> -Sima
>
> >
> >
> > Thx
> > P.
> >
> >
> > >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > >
> > > > Signed-off-by: Gordon Freeman <freeman@blackmesa.org>
> > > > Reviewed-by: Alyx Vance <alyx@vance.edu>
> > > > ---
> > > > Changes in v2:
> > > >   - Provide more docu for crowbar-alloc-function.
> > > >   - Use NULL pointers for reserved xarray entries
> > > > ---
> > > > <DIFF>
> > > >
> > > >
> > > > P.
> > > >
> > > >
> > > > >
> > > > > Regards,
> > > > > Christian.
> > > > >
> > > > > > >
> > > > > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/scheduler/sched_main.c | 29
> > > > > > > > ++++++++++++++++++--------
> > > > > > > >  1 file changed, 20 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > index f7118497e47a..cf200b1b643e 100644
> > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> > > > > > > >  int drm_sched_job_add_dependency(struct drm_sched_job
> > > > > > > > *job,
> > > > > > > >   struct dma_fence *fence)
> > > > > > > >  {
> > > > > > > > + XA_STATE(xas, &job->dependencies, 0);
> > > > > > > >   struct dma_fence *entry;
> > > > > > > > - unsigned long index;
> > > > > > > > - u32 id = 0;
> > > > > > > > - int ret;
> > > > > > > >
> > > > > > > >   if (!fence)
> > > > > > > >   return 0;
> > > > > > > > @@ -883,24 +881,37 @@ int
> > > > > > > > drm_sched_job_add_dependency(struct
> > > > > > > > drm_sched_job *job,
> > > > > > > >   * This lets the size of the array of deps scale with
> > > > > > > > the number of
> > > > > > > >   * engines involved, rather than the number of BOs.
> > > > > > > >   */
> > > > > > > > - xa_for_each(&job->dependencies, index, entry) {
> > > > > > > > + xa_lock(&job->dependencies);
> > > > > > > > + xas_for_each(&xas, entry, ULONG_MAX) {
> > > > > > > >   if (entry->context != fence->context)
> > > > > > > >   continue;
> > > > > > > >
> > > > > > > >   if (dma_fence_is_later(fence, entry)) {
> > > > > > > >   dma_fence_put(entry);
> > > > > > > > - xa_store(&job->dependencies, index,
> > > > > > > > fence, GFP_KERNEL);
> > > > > > > > + xas_store(&xas, fence);
> > > > > > > >   } else {
> > > > > > > >   dma_fence_put(fence);
> > > > > > > >   }
> > > > > > > > - return 0;
> > > > > > > > + xa_unlock(&job->dependencies);
> > > > > > > > + return xas_error(&xas);
> > > > > > > >   }
> > > > > > > >
> > > > > > > > - ret = xa_alloc(&job->dependencies, &id, fence,
> > > > > > > > xa_limit_32b, GFP_KERNEL);
> > > > > > > > - if (ret != 0)
> > > > > > > > +retry:
> > > > > > > > + entry = xas_store(&xas, fence);
> > > > > > > > + xa_unlock(&job->dependencies);
> > > > > > > > +
> > > > > > > > + /* There shouldn't be any concurrent add, so no need
> > > > > > > > to loop again */
> > > > > > >
> > > > > > > Concurrency shouldn't matter, xas_nomem() stores the pre-
> > > > > > > allocated memory in the
> > > > > > > XA_STATE not the xarray. Hence, I think we should remove the
> > > > > > > comment.
> > > > > > >
> > > > > > > > + if (xas_nomem(&xas, GFP_KERNEL)) {
> > > > > > > > + xa_lock(&job->dependencies);
> > > > > > > > + goto retry;
> > > > > > >
> > > > > > > Please don't use a goto here, if we would have failed to
> > > > > > > allocate
> > > > > > > memory here,
> > > > > > > this would be an endless loop until we succeed eventually. It
> > > > > > > would be equal to:
> > > > > > >
> > > > > > > while (!ptr) {
> > > > > > > ptr = kmalloc();
> > > > > > > }
> > > > > > >
> > > > > > > Instead just take the lock and call xas_store() again.
> > > > > > >
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + if (xas_error(&xas))
> > > > > > > >   dma_fence_put(fence);
> > > > > > > > + else
> > > > > > > > + WARN_ON(entry);
> > > > > > >
> > > > > > > Please don't call WARN_ON() here, this isn't fatal, we only
> > > > > > > need
> > > > > > > to return the
> > > > > > > error code.
> > > > >
> > > >
> > >
> >
>
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-28 13:29                 ` Alex Deucher
@ 2025-05-28 14:38                   ` Danilo Krummrich
  2025-05-28 14:47                     ` Danilo Krummrich
  0 siblings, 1 reply; 25+ messages in thread
From: Danilo Krummrich @ 2025-05-28 14:38 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Simona Vetter, phasta, Christian König, tursulin, amd-gfx,
	dri-devel

On Wed, May 28, 2025 at 09:29:30AM -0400, Alex Deucher wrote:
> On Wed, May 28, 2025 at 8:45 AM Simona Vetter <simona.vetter@ffwll.ch> wrote:
> > I do occasionally find it useful as a record of different approaches
> > considered, which sometimes people fail to adequately cover in their
> > commit messages. Also useful indicator of how cursed a patch is :-)
> >
> > But as long as anything relevant does end up in the commit message and
> > people don't just delete stuff I don't care how it's done at all. It's
> > just that the cost of deleting something that should have been there can
> > be really nasty sometimes, and storage is cheap.
> 
> I like them for the same reasons.  Also, even with links, sometimes
> there are forks of the conversation that get missed that a changelog
> provides some insight into.  I find it useful in my own development as
> I can note what I've changed in a patch and can retain that in the
> commit rather than as something I need to track separately and then
> add to the patches when I send them out.

Personally, I don't think it's super useful in the commit message; it still
remains in the patches sent to the mailing list, though. And since we put lore
links everywhere, it's easily accessible, *including* the context of why a
change was made from one version to another, i.e. the full conversation.

However, if we really want that, we should make it an official thing, since
currently the kernel's process documentation [1] clearly states otherwise:

"Please put this information after the '---' line which separates the changelog
from the rest of the patch. The version information is not part of the changelog
which gets committed to the git tree. It is additional information for the
reviewers. If it's placed above the commit tags, it needs manual interaction to
remove it."

Alternatively, it can go into the cover letter.

[1] https://docs.kernel.org/process/submitting-patches.html#commentary
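
For completeness, the versioning workflow Philipp lists earlier in the thread can be driven entirely by git; the output directory and patch count below are just examples:

```shell
# Generate v3 of the last four commits with a cover letter;
# -v3 makes every subject read "[PATCH v3 n/4]" automatically.
git format-patch -v3 --cover-letter -4 -o outgoing/

# Put the full v2 -> v3 changelog into the cover letter
# (outgoing/v3-0000-cover-letter.patch), then send the series:
git send-email outgoing/v3-*.patch
```

Per-patch changelog notes can instead be added by hand below the `---` separator of each generated file, where git will discard them on apply.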


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-28 14:38                   ` Danilo Krummrich
@ 2025-05-28 14:47                     ` Danilo Krummrich
  2025-06-03 11:34                       ` Simona Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Danilo Krummrich @ 2025-05-28 14:47 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Simona Vetter, phasta, Christian König, tursulin, amd-gfx,
	dri-devel

On Wed, May 28, 2025 at 04:39:01PM +0200, Danilo Krummrich wrote:
> On Wed, May 28, 2025 at 09:29:30AM -0400, Alex Deucher wrote:
> > On Wed, May 28, 2025 at 8:45 AM Simona Vetter <simona.vetter@ffwll.ch> wrote:
> > > I do occasionally find it useful as a record of different approaches
> > > considered, which sometimes people fail to adequately cover in their
> > > commit messages. Also useful indicator of how cursed a patch is :-)
> > >
> > > But as long as anything relevant does end up in the commit message and
> > > people don't just delete stuff I don't care how it's done at all. It's
> > > just that the cost of deleting something that should have been there can
> > > be really nasty sometimes, and storage is cheap.
> > 
> > I like them for the same reasons.  Also, even with links, sometimes
> > there are forks of the conversation that get missed that a changelog
> > provides some insight into.  I find it useful in my own development as
> > I can note what I've changed in a patch and can retain that in the
> > commit rather than as something I need to track separately and then
> > add to the patches when I send them out.
> 
> Personally, I don't think it's super useful in the commit message, it still
> remains in the patches sent to the mailing list though. And since we put lore
> links everywhere, it's easily accessible, *including* the context of why a
> change was made from one version to another, i.e. the full conversation.
> 
> However, if we really want that, we should make it an offical thing, since
> currently the kernel's process documentation [1] clearly states otherwise:
> 
> "Please put this information after the '---' line which separates the changelog
> from the rest of the patch. The version information is not part of the changelog
> which gets committed to the git tree. It is additional information for the
> reviewers. If it's placed above the commit tags, it needs manual interaction to
> remove it."
> 
> Alternatively, it can go into the cover letter.

One additional note:

This is not me trying to be super bureaucratic; instead I think being consistent
in the process across the whole kernel results in a better experience for (new)
contributors.

> [1] https://docs.kernel.org/process/submitting-patches.html#commentary


* Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency
  2025-05-28 14:47                     ` Danilo Krummrich
@ 2025-06-03 11:34                       ` Simona Vetter
  0 siblings, 0 replies; 25+ messages in thread
From: Simona Vetter @ 2025-06-03 11:34 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alex Deucher, Simona Vetter, phasta, Christian König,
	tursulin, amd-gfx, dri-devel

On Wed, May 28, 2025 at 04:47:00PM +0200, Danilo Krummrich wrote:
> On Wed, May 28, 2025 at 04:39:01PM +0200, Danilo Krummrich wrote:
> > On Wed, May 28, 2025 at 09:29:30AM -0400, Alex Deucher wrote:
> > > On Wed, May 28, 2025 at 8:45 AM Simona Vetter <simona.vetter@ffwll.ch> wrote:
> > > > I do occasionally find it useful as a record of different approaches
> > > > considered, which sometimes people fail to adequately cover in their
> > > > commit messages. Also useful indicator of how cursed a patch is :-)
> > > >
> > > > But as long as anything relevant does end up in the commit message and
> > > > people don't just delete stuff I don't care how it's done at all. It's
> > > > just that the cost of deleting something that should have been there can
> > > > be really nasty sometimes, and storage is cheap.
> > > 
> > > I like them for the same reasons.  Also, even with links, sometimes
> > > there are forks of the conversation that get missed that a changelog
> > > provides some insight into.  I find it useful in my own development as
> > > I can note what I've changed in a patch and can retain that in the
> > > commit rather than as something I need to track separately and then
> > > add to the patches when I send them out.
> > 
> > Personally, I don't think it's super useful in the commit message, it still
> > remains in the patches sent to the mailing list though. And since we put lore
> > links everywhere, it's easily accessible, *including* the context of why a
> > change was made from one version to another, i.e. the full conversation.
> > 
> > However, if we really want that, we should make it an offical thing, since
> > currently the kernel's process documentation [1] clearly states otherwise:
> > 
> > "Please put this information after the '---' line which separates the changelog
> > from the rest of the patch. The version information is not part of the changelog
> > which gets committed to the git tree. It is additional information for the
> > reviewers. If it's placed above the commit tags, it needs manual interaction to
> > remove it."
> > 
> > Alternatively, it can go into the cover letter.
> 
> One additional note:
> 
> This is not me trying to be super bureaucratic; instead I think being consistent
> in the process across the whole kernel results in a better experience for (new)
> contributors.

Yeah I agree with this part, which is why in the past I didn't ask people
to keep that part, but also won't complain if it's kept. The entire goal
being "minimal amount to get to a commit message that's hopefully
complete". I think with b4 this has now also become a bit easier than 10+
years ago.

Also all the kernel fd.o lists are on lore and the archive on fd.o is
under our control, so hopefully the archive situation shouldn't ever be an
issue for us.

Anyway no strong opinion from me, but we might want to document that we're
a bit more relaxed here.

*shrugs*

Cheers, Sima

> 
> > [1] https://docs.kernel.org/process/submitting-patches.html#commentary

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


end of thread, other threads:[~2025-06-03 11:35 UTC | newest]

Thread overview: 25+ messages
2025-05-23 12:56 Fixing AMDGPUs gang submit error handling Christian König
2025-05-23 12:56 ` [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency Christian König
2025-05-23 13:49   ` Tvrtko Ursulin
2025-05-23 14:11   ` Danilo Krummrich
2025-05-23 14:16     ` Danilo Krummrich
2025-05-26  9:25       ` Christian König
2025-05-26  9:34         ` Philipp Stanner
2025-05-26 11:16           ` Christian König
2025-05-26 11:27             ` Philipp Stanner
2025-05-28 12:30               ` Simona Vetter
2025-05-28 13:24                 ` Christian König
2025-05-28 13:29                 ` Alex Deucher
2025-05-28 14:38                   ` Danilo Krummrich
2025-05-28 14:47                     ` Danilo Krummrich
2025-06-03 11:34                       ` Simona Vetter
2025-05-24 11:17     ` Danilo Krummrich
2025-05-26 10:59       ` Christian König
2025-05-26 11:14         ` Danilo Krummrich
2025-05-26 11:36           ` Christian König
2025-05-26  7:28   ` Philipp Stanner
2025-05-23 12:56 ` [PATCH 2/4] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
2025-05-23 12:56 ` [PATCH 3/4] drm/sched: Add a test for prealloced fence slots Christian König
2025-05-23 12:56 ` [PATCH 4/4] drm/amdgpu: fix gang submission error handling Christian König
  -- strict thread matches above, loose matches on Subject: below --
2025-05-22 13:41 Fixing AMDGPUs gang submit " Christian König
2025-05-15 15:00 Christian König
