* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
@ 2025-03-18 12:39 ` Danilo Krummrich
2025-03-19 11:14 ` Christian König
2025-03-18 12:39 ` Philipp Stanner
2025-03-18 13:03 ` Tvrtko Ursulin
2 siblings, 1 reply; 13+ messages in thread
From: Danilo Krummrich @ 2025-03-18 12:39 UTC (permalink / raw)
To: Christian König; +Cc: phasta, tvrtko.ursulin, dri-devel, amd-gfx
On Tue, Mar 18, 2025 at 01:03:12PM +0100, Christian König wrote:
> /**
> * drm_sched_job_add_dependency - adds the fence as a job dependency
> * @job: scheduler job to add the dependencies to
> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> * engines involved, rather than the number of BOs.
> */
> xa_for_each(&job->dependencies, index, entry) {
> - if (entry->context != fence->context)
> + bool signaled = dma_fence_is_signaled(entry);
> +
> + if (!signaled && entry->context != fence->context)
> continue;
>
> - if (dma_fence_is_later(fence, entry)) {
> + if (signaled || dma_fence_is_later(fence, entry)) {
> dma_fence_put(entry);
> xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> } else {
> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> return 0;
> }
>
> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
> if (ret != 0)
> dma_fence_put(fence);
Those changes seem unrelated, aren't they?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:39 ` Danilo Krummrich
@ 2025-03-19 11:14 ` Christian König
0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2025-03-19 11:14 UTC (permalink / raw)
To: Danilo Krummrich; +Cc: phasta, tvrtko.ursulin, dri-devel, amd-gfx
Am 18.03.25 um 13:39 schrieb Danilo Krummrich:
> On Tue, Mar 18, 2025 at 01:03:12PM +0100, Christian König wrote:
>> /**
>> * drm_sched_job_add_dependency - adds the fence as a job dependency
>> * @job: scheduler job to add the dependencies to
>> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>> * engines involved, rather than the number of BOs.
>> */
>> xa_for_each(&job->dependencies, index, entry) {
>> - if (entry->context != fence->context)
>> + bool signaled = dma_fence_is_signaled(entry);
>> +
>> + if (!signaled && entry->context != fence->context)
>> continue;
>>
>> - if (dma_fence_is_later(fence, entry)) {
>> + if (signaled || dma_fence_is_later(fence, entry)) {
>> dma_fence_put(entry);
>> xa_store(&job->dependencies, index, fence, GFP_KERNEL);
>> } else {
>> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>> return 0;
>> }
>>
>> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
>> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
>> + GFP_KERNEL);
>> if (ret != 0)
>> dma_fence_put(fence);
> Those changes seem unrelated, aren't they?
Ah, yes that was just a leftover from a previous try to fix this.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
2025-03-18 12:39 ` Danilo Krummrich
@ 2025-03-18 12:39 ` Philipp Stanner
2025-03-19 11:23 ` Christian König
2025-03-18 13:03 ` Tvrtko Ursulin
2 siblings, 1 reply; 13+ messages in thread
From: Philipp Stanner @ 2025-03-18 12:39 UTC (permalink / raw)
To: Christian König, tvrtko.ursulin, dri-devel, dakr, amd-gfx
On Tue, 2025-03-18 at 13:03 +0100, Christian König wrote:
> The problem is that drivers sometimes need to add dependencies
> without
> allocating any memory.
>
> Add a function which preallocates slots by inserting signaled stub
> fences
> into the dependency array.
I think I get what you're doing here, but it would certainly be helpful
to provide some more justification in the commit message a la: "Some
drivers have to [...]. With drm_sched_job_add_dependency() that's not
possible because [...]"
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 41
> ++++++++++++++++++++++++--
> include/drm/gpu_scheduler.h | 2 ++
> 2 files changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 4d4219fbe49d..2085eea89513 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -852,6 +852,38 @@ void drm_sched_job_arm(struct drm_sched_job
> *job)
> }
> EXPORT_SYMBOL(drm_sched_job_arm);
>
> +/**
> + * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding
> dependencies
> + * @job: scheduler job where dependencies will be added
> + * @num_deps: number of dependencies to preallocate slots for
> + *
> + * Preallocate memory so that no ENOMEM can come later while adding
> + * dependencies.
Similarly, should have a sentence that clarifies for when this is
needed.
> + *
> + * Return:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
> *job,
> + unsigned int num_deps)
> +{
> + struct dma_fence *fence;
> + u32 id = 0;
> + int ret;
> +
> + while (num_deps--) {
> + fence = dma_fence_get_stub();
> + ret = xa_alloc(&job->dependencies, &id, fence,
> xa_limit_32b,
> + GFP_KERNEL);
So this would fill the xarr with already signaled fences which then
later will be replaced with unsignaled fences?
Help me out here: would it also work to add NULL instead of that stub-
fence?
> + if (ret != 0) {
btw. style should be consistent with the while() above where you ommit
'> 0'.
I personally prefer having the comparison operators, but implicitly
checking for 0 is well established in the kernel, so I guess both is
OK.
> + dma_fence_put(fence);
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
> +
> /**
> * drm_sched_job_add_dependency - adds the fence as a job dependency
> * @job: scheduler job to add the dependencies to
> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct
> drm_sched_job *job,
> * engines involved, rather than the number of BOs.
> */
> xa_for_each(&job->dependencies, index, entry) {
> - if (entry->context != fence->context)
> + bool signaled = dma_fence_is_signaled(entry);
> +
> + if (!signaled && entry->context != fence->context)
> continue;
>
> - if (dma_fence_is_later(fence, entry)) {
> + if (signaled || dma_fence_is_later(fence, entry)) {
> dma_fence_put(entry);
> xa_store(&job->dependencies, index, fence,
> GFP_KERNEL);
> } else {
> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct
> drm_sched_job *job,
> return 0;
> }
>
> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> GFP_KERNEL);
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
Would leave that unchanged for git-blame.
Thx
P.
> if (ret != 0)
> dma_fence_put(fence);
>
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index 1a7e377d4cbb..916e820b27ff 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
> u32 credits, void *owner);
> void drm_sched_job_arm(struct drm_sched_job *job);
> void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
> *job,
> + unsigned int num_deps);
> int drm_sched_job_add_dependency(struct drm_sched_job *job,
> struct dma_fence *fence);
> int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:39 ` Philipp Stanner
@ 2025-03-19 11:23 ` Christian König
2025-03-20 11:49 ` Tvrtko Ursulin
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2025-03-19 11:23 UTC (permalink / raw)
To: phasta, tvrtko.ursulin, dri-devel, dakr, amd-gfx
[-- Attachment #1: Type: text/plain, Size: 981 bytes --]
>> + *
>> + * Return:
>> + * 0 on success, or an error on failing to expand the array.
>> + */
>> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
>> *job,
>> + unsigned int num_deps)
>> +{
>> + struct dma_fence *fence;
>> + u32 id = 0;
>> + int ret;
>> +
>> + while (num_deps--) {
>> + fence = dma_fence_get_stub();
>> + ret = xa_alloc(&job->dependencies, &id, fence,
>> xa_limit_32b,
>> + GFP_KERNEL);
> So this would fill the xarr with already signaled fences which then
> later will be replaced with unsignaled fences?
Yes, exactly that's the idea.
> Help me out here: would it also work to add NULL instead of that stub-
> fence?
Good question, idk. That's an implementation detail of the xarray.
Tvrtko also correctly pointed out that it is most likely a bad idea to use dma_fence_is_signaled() in the critical code path.
I will try to dig through the xarray behavior up and update the patch if possible.
Thanks,
Christian.
[-- Attachment #2: Type: text/html, Size: 1666 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-19 11:23 ` Christian König
@ 2025-03-20 11:49 ` Tvrtko Ursulin
2025-03-21 8:20 ` Philipp Stanner
0 siblings, 1 reply; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-03-20 11:49 UTC (permalink / raw)
To: Christian König, phasta, dri-devel, dakr, amd-gfx
On 19/03/2025 11:23, Christian König wrote:
>>> + *
>>> + * Return:
>>> + * 0 on success, or an error on failing to expand the array.
>>> + */
>>> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
>>> *job,
>>> + unsigned int num_deps)
>>> +{
>>> + struct dma_fence *fence;
>>> + u32 id = 0;
>>> + int ret;
>>> +
>>> + while (num_deps--) {
>>> + fence = dma_fence_get_stub();
>>> + ret = xa_alloc(&job->dependencies, &id, fence,
>>> xa_limit_32b,
>>> + GFP_KERNEL);
>> So this would fill the xarr with already signaled fences which then
>> later will be replaced with unsignaled fences?
>
> Yes, exactly that's the idea.
>
>> Help me out here: would it also work to add NULL instead of that stub-
>> fence?
>
> Good question, idk. That's an implementation detail of the xarray.
>
> Tvrtko also correctly pointed out that it is most likely a bad idea to
> use dma_fence_is_signaled() in the critical code path.
>
> I will try to dig through the xarray behavior up and update the patch if
> possible.
I think NULL on its own is not possible, but the two low bits are
available for pointer tagging, or designating pointers vs integers,
which looks like it could work. Something like storing
xa_tag_pointer(NULL, 1) to reserved slots and at lookup time they would
be detected with "xa_pointer_tag(fence) & 1".
Regards,
Tvrtko
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-20 11:49 ` Tvrtko Ursulin
@ 2025-03-21 8:20 ` Philipp Stanner
2025-03-21 13:25 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Philipp Stanner @ 2025-03-21 8:20 UTC (permalink / raw)
To: Tvrtko Ursulin, Christian König, phasta, dri-devel, dakr,
amd-gfx
On Thu, 2025-03-20 at 11:49 +0000, Tvrtko Ursulin wrote:
>
> On 19/03/2025 11:23, Christian König wrote:
> > > > + *
> > > > + * Return:
> > > > + * 0 on success, or an error on failing to expand the array.
> > > > + */
> > > > +int drm_sched_job_prealloc_dependency_slots(struct
> > > > drm_sched_job
> > > > *job,
> > > > + unsigned int
> > > > num_deps)
> > > > +{
> > > > + struct dma_fence *fence;
> > > > + u32 id = 0;
> > > > + int ret;
> > > > +
> > > > + while (num_deps--) {
> > > > + fence = dma_fence_get_stub();
> > > > + ret = xa_alloc(&job->dependencies, &id, fence,
> > > > xa_limit_32b,
> > > > + GFP_KERNEL);
> > > So this would fill the xarr with already signaled fences which
> > > then
> > > later will be replaced with unsignaled fences?
> >
> > Yes, exactly that's the idea.
> >
> > > Help me out here: would it also work to add NULL instead of that
> > > stub-
> > > fence?
> >
> > Good question, idk. That's an implementation detail of the xarray.
> >
> > Tvrtko also correctly pointed out that it is most likely a bad idea
> > to
> > use dma_fence_is_signaled() in the critical code path.
> >
> > I will try to dig through the xarray behavior up and update the
> > patch if
> > possible.
>
> I think NULL on its own is not possible, but the two low bits are
> available for pointer tagging, or designating pointers vs integers,
> which looks like it could work. Something like storing
> xa_tag_pointer(NULL, 1) to reserved slots and at lookup time they
> would
> be detected with "xa_pointer_tag(fence) & 1".
Almost!
they would be detected with a super-readable
#define DRM_SCHED_XARR_TAG_RESERVED_ENTRY 1
or maybe …UNUSED_ENTRY?
^_^
P.
>
> Regards,
>
> Tvrtko
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-21 8:20 ` Philipp Stanner
@ 2025-03-21 13:25 ` Christian König
0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2025-03-21 13:25 UTC (permalink / raw)
To: phasta, Tvrtko Ursulin, dri-devel, dakr, amd-gfx
Am 21.03.25 um 09:20 schrieb Philipp Stanner:
> On Thu, 2025-03-20 at 11:49 +0000, Tvrtko Ursulin wrote:
>> On 19/03/2025 11:23, Christian König wrote:
>>>>> + *
>>>>> + * Return:
>>>>> + * 0 on success, or an error on failing to expand the array.
>>>>> + */
>>>>> +int drm_sched_job_prealloc_dependency_slots(struct
>>>>> drm_sched_job
>>>>> *job,
>>>>> + unsigned int
>>>>> num_deps)
>>>>> +{
>>>>> + struct dma_fence *fence;
>>>>> + u32 id = 0;
>>>>> + int ret;
>>>>> +
>>>>> + while (num_deps--) {
>>>>> + fence = dma_fence_get_stub();
>>>>> + ret = xa_alloc(&job->dependencies, &id, fence,
>>>>> xa_limit_32b,
>>>>> + GFP_KERNEL);
>>>> So this would fill the xarr with already signaled fences which
>>>> then
>>>> later will be replaced with unsignaled fences?
>>> Yes, exactly that's the idea.
>>>
>>>> Help me out here: would it also work to add NULL instead of that
>>>> stub-
>>>> fence?
>>> Good question, idk. That's an implementation detail of the xarray.
>>>
>>> Tvrtko also correctly pointed out that it is most likely a bad idea
>>> to
>>> use dma_fence_is_signaled() in the critical code path.
>>>
>>> I will try to dig through the xarray behavior up and update the
>>> patch if
>>> possible.
>> I think NULL on its own is not possible, but the two low bits are
>> available for pointer tagging, or designating pointers vs integers,
>> which looks like it could work. Something like storing
>> xa_tag_pointer(NULL, 1) to reserved slots and at lookup time they
>> would
>> be detected with "xa_pointer_tag(fence) & 1".
> Almost!
>
> they would be detected with a super-readable
>
> #define DRM_SCHED_XARR_TAG_RESERVED_ENTRY 1
>
> or maybe …UNUSED_ENTRY?
NULL doesn't work because xa_for_each() skips NULL entries, but it looks like somebody else stumbled over the same problem we have here as well.
So there is already the solution to use XA_ZERO_ENTRY! That special value can then either be used with xa_alloc() or through xa_reserve().
It's just that the xarray documentation is not explicitly pointing that out, so I had to dig around in the code a bit to figure out how everything works.
Regards,
Christian.
>
> ^_^
>
> P.
>
>
>> Regards,
>>
>> Tvrtko
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots
2025-03-18 12:03 ` [PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots Christian König
2025-03-18 12:39 ` Danilo Krummrich
2025-03-18 12:39 ` Philipp Stanner
@ 2025-03-18 13:03 ` Tvrtko Ursulin
2 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-03-18 13:03 UTC (permalink / raw)
To: Christian König, phasta, dri-devel, dakr, amd-gfx
On 18/03/2025 12:03, Christian König wrote:
> The problem is that drivers sometimes need to add dependencies without
> allocating any memory.
>
> Add a function which preallocates slots by inserting signaled stub fences
> into the dependency array.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 41 ++++++++++++++++++++++++--
> include/drm/gpu_scheduler.h | 2 ++
> 2 files changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 4d4219fbe49d..2085eea89513 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -852,6 +852,38 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> }
> EXPORT_SYMBOL(drm_sched_job_arm);
>
> +/**
> + * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding dependencies
> + * @job: scheduler job where dependencies will be added
> + * @num_deps: number of dependencies to preallocate slots for
> + *
> + * Preallocate memory so that no ENOMEM can come later while adding
> + * dependencies.
> + *
> + * Return:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
> + unsigned int num_deps)
> +{
> + struct dma_fence *fence;
> + u32 id = 0;
> + int ret;
> +
> + while (num_deps--) {
> + fence = dma_fence_get_stub();
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
> + if (ret != 0) {
> + dma_fence_put(fence);
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
> +
> /**
> * drm_sched_job_add_dependency - adds the fence as a job dependency
> * @job: scheduler job to add the dependencies to
> @@ -878,10 +910,12 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> * engines involved, rather than the number of BOs.
> */
> xa_for_each(&job->dependencies, index, entry) {
> - if (entry->context != fence->context)
> + bool signaled = dma_fence_is_signaled(entry);
I so dislike dma_fence_is_signaled() due it leaking the fence signaling
annotations to unsuspecting callers. I hope it doesn't mark someone's
lock/mutex as the critical signalling path..
Regards,
Tvrtko
> +
> + if (!signaled && entry->context != fence->context)
> continue;
>
> - if (dma_fence_is_later(fence, entry)) {
> + if (signaled || dma_fence_is_later(fence, entry)) {
> dma_fence_put(entry);
> xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> } else {
> @@ -890,7 +924,8 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> return 0;
> }
>
> - ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> + GFP_KERNEL);
> if (ret != 0)
> dma_fence_put(fence);
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 1a7e377d4cbb..916e820b27ff 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
> u32 credits, void *owner);
> void drm_sched_job_arm(struct drm_sched_job *job);
> void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
> + unsigned int num_deps);
> int drm_sched_job_add_dependency(struct drm_sched_job *job,
> struct dma_fence *fence);
> int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
^ permalink raw reply [flat|nested] 13+ messages in thread