* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
[not found] <2cefb0f5-3820-4828-8360-ff8c92e21aa6@amd.com>
@ 2025-06-11 14:25 ` Danilo Krummrich
2025-06-11 14:57 ` Christian König
0 siblings, 1 reply; 16+ messages in thread
From: Danilo Krummrich @ 2025-06-11 14:25 UTC (permalink / raw)
To: Christian König
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
(Cc: dri-devel)
On Tue, Jun 10, 2025 at 03:05:34PM +0200, Christian König wrote:
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > index 5a8bc6342222..6108a6f9dba7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > @@ -44,6 +44,22 @@
> > struct amdgpu_fence;
> > enum amdgpu_ib_pool_type;
> >
> > +/* Internal kernel job ids. (decreasing values, starting from U64_MAX). */
> > +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE (18446744073709551615ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES (18446744073709551614ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE (18446744073709551613ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR (18446744073709551612ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER (18446744073709551611ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA (18446744073709551610ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER (18446744073709551609ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE (18446744073709551608ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_MOVE_BLIT (18446744073709551607ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER (18446744073709551606ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER (18446744073709551605ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB (18446744073709551604ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP (18446744073709551603ULL)
> > +#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST (18446744073709551602ULL)
Why not
(U64_MAX - {1, 2, ...})?
> Mhm, reiterating our internal discussion on the mailing list.
>
> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
>
> @Philip and @Danilo any opinion on that?
Both the U64_MAX and the positive-negative approach are a bit hacky. I wonder
why client_id needs to be a u64; wouldn't a u32 be enough?
Anyways, if client_id remains to be a u64, we could add a global DRM constant
instead, e.g.
#define DRM_CLIENT_ID_MAX 0x0fffffffffffffff
#define DRM_KERNEL_ID_BASE (DRM_CLIENT_ID_MAX + 1)
And in drm_file_alloc() fail if we're out of IDs.
I think this would be much cleaner.
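A minimal userspace sketch of that split (the constant values follow the proposal above; `alloc_client_id()` and `is_kernel_id()` are hypothetical helper names, not actual DRM API):

```c
#include <stdint.h>

#define DRM_CLIENT_ID_MAX  0x0fffffffffffffffULL
#define DRM_KERNEL_ID_BASE (DRM_CLIENT_ID_MAX + 1)

static uint64_t next_client_id = 1; /* stands in for the kernel's atomic64 */

/* Mimics drm_file_alloc() failing once the client range is exhausted. */
static int alloc_client_id(uint64_t *id)
{
	if (next_client_id > DRM_CLIENT_ID_MAX)
		return -1; /* would be -ENOSPC in the kernel */
	*id = next_client_id++;
	return 0;
}

/* Any id at or above the base is, by construction, a kernel id. */
static int is_kernel_id(uint64_t id)
{
	return id >= DRM_KERNEL_ID_BASE;
}
```

With this layout a tool reading the trace only needs one comparison to classify an id.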
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-11 14:25 ` [PATCH v1] drm/amdgpu: give each kernel job a unique id Danilo Krummrich
@ 2025-06-11 14:57 ` Christian König
2025-06-11 15:11 ` Danilo Krummrich
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-06-11 14:57 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
On 6/11/25 16:25, Danilo Krummrich wrote:
> (Cc: dri-devel)
>
> On Tue, Jun 10, 2025 at 03:05:34PM +0200, Christian König wrote:
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> index 5a8bc6342222..6108a6f9dba7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> @@ -44,6 +44,22 @@
>>> struct amdgpu_fence;
>>> enum amdgpu_ib_pool_type;
>>>
>>> +/* Internal kernel job ids. (decreasing values, starting from U64_MAX). */
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE (18446744073709551615ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES (18446744073709551614ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE (18446744073709551613ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR (18446744073709551612ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER (18446744073709551611ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA (18446744073709551610ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER (18446744073709551609ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE (18446744073709551608ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_MOVE_BLIT (18446744073709551607ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER (18446744073709551606ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER (18446744073709551605ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB (18446744073709551604ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP (18446744073709551603ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST (18446744073709551602ULL)
>
> Why not
>
> (U64_MAX - {1, 2, ...})?
That's what Pierre came up with as well, but I thought that this is ugly because it makes it hard to match the numbers from the trace back to something in the code.
>> Mhm, reiterating our internal discussion on the mailing list.
>>
>> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
>>
>> @Philip and @Danilo any opinion on that?
>
> Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
> why we need client_id to be a u64, wouldn't a u32 not be enough?
That can trivially overflow on long-running boxes.
> Anyways, if client_id remains to be a u64, we could add a global DRM constant
> instead, e.g.
>
> #define DRM_CLIENT_ID_MAX 0x0fffffffffffffff
> #define DRM_KERNEL_ID_BASE (DRM_CLIENT_ID_MAX + 1)
>
> And in drm_file_alloc() fail if we're out of IDs.
Mhm, I wouldn't mind printing the client id as hex and then always setting the highest bit for kernel submissions.
But if we keep printing them in base 10, it becomes rather unreadable.
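A sketch of that top-bit scheme (the flag name and helpers are made up for illustration): printed in hex, a kernel submission id such as 0x8000000000000001 is recognizable at a glance, which is lost in decimal.

```c
#include <stdint.h>

/* Hypothetical marker bit: the top bit flags kernel submissions. */
#define DRM_KERNEL_SUBMISSION_BIT (1ULL << 63)

static uint64_t mark_kernel_id(uint64_t id)
{
	return id | DRM_KERNEL_SUBMISSION_BIT;
}

static int is_kernel_submission(uint64_t id)
{
	return (id & DRM_KERNEL_SUBMISSION_BIT) != 0;
}
```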
Christian.
>
> I think this would be much cleaner.
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-11 14:57 ` Christian König
@ 2025-06-11 15:11 ` Danilo Krummrich
2025-06-12 7:00 ` Christian König
0 siblings, 1 reply; 16+ messages in thread
From: Danilo Krummrich @ 2025-06-11 15:11 UTC (permalink / raw)
To: Christian König
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
On Wed, Jun 11, 2025 at 04:57:50PM +0200, Christian König wrote:
> On 6/11/25 16:25, Danilo Krummrich wrote:
> > (Cc: dri-devel)
> >
> > On Tue, Jun 10, 2025 at 03:05:34PM +0200, Christian König wrote:
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> index 5a8bc6342222..6108a6f9dba7 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> @@ -44,6 +44,22 @@
> >>> struct amdgpu_fence;
> >>> enum amdgpu_ib_pool_type;
> >>>
> >>> +/* Internal kernel job ids. (decreasing values, starting from U64_MAX). */
> >>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE (18446744073709551615ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES (18446744073709551614ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE (18446744073709551613ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR (18446744073709551612ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER (18446744073709551611ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA (18446744073709551610ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER (18446744073709551609ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE (18446744073709551608ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_MOVE_BLIT (18446744073709551607ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER (18446744073709551606ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER (18446744073709551605ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB (18446744073709551604ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP (18446744073709551603ULL)
> >>> +#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST (18446744073709551602ULL)
> >
> > Why not
> >
> > (U64_MAX - {1, 2, ...})?
>
> That's what Pierre came up with as well, but I thought that this is ugly because it makes it hard to match the numbers from the trace back to something in the code.
>
> >> Mhm, reiterating our internal discussion on the mailing list.
> >>
> >> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
> >>
> >> @Philip and @Danilo any opinion on that?
> >
> > Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
> > why we need client_id to be a u64, wouldn't a u32 not be enough?
>
> That can trivially overflow on long running boxes.
I don't know if "trivially" is the right word given that the number is
4,294,967,295.
But I did indeed miss that this is a forever-increasing atomic. Why is it an
atomic? Why is it not an IDA?
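For scale (illustrative rates, not measurements): even at 100 client allocations per second, which is already far above anything normal, a u32 takes more than a year to wrap, while a u64 at the same rate would take billions of years.

```c
#include <stdint.h>

/* Seconds until a counter with the given maximum wraps at a fixed rate. */
static double seconds_to_wrap(uint64_t max, double allocs_per_sec)
{
	return (double)max / allocs_per_sec;
}
```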
> > Anyways, if client_id remains to be a u64, we could add a global DRM constant
> > instead, e.g.
> >
> > #define DRM_CLIENT_ID_MAX 0x0fffffffffffffff
> > #define DRM_KERNEL_ID_BASE (DRM_CLIENT_ID_MAX + 1)
> >
> > And in drm_file_alloc() fail if we're out of IDs.
>
> Mhm, I wouldn't mind printing the client id as hex and then always setting the highest bit for kernel submissions.
>
> But when we keep printing them as base 10 it kind of becomes unreadable.
>
> Christian.
>
> >
> > I think this would be much cleaner.
>
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-11 15:11 ` Danilo Krummrich
@ 2025-06-12 7:00 ` Christian König
2025-06-12 23:48 ` Danilo Krummrich
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-06-12 7:00 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
On 6/11/25 17:11, Danilo Krummrich wrote:
>>>> Mhm, reiterating our internal discussion on the mailing list.
>>>>
>>>> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
>>>>
>>>> @Philip and @Danilo any opinion on that?
>>>
>>> Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
>>> why we need client_id to be a u64, wouldn't a u32 not be enough?
>>
>> That can trivially overflow on long running boxes.
>
> I don't know if "trivially" is the word of choice given that the number is
> 4,294,967,295.
>
> But I did indeed miss that this is a for ever increasing atomic. Why is it an
> atomic? Why is it not an IDA?
Well, an IDA has some extra overhead compared to an ever-increasing atomic; in addition, re-using numbers for clients in a trace log might not be the best choice.
On the other hand using smaller numbers is usually nicer for manual inspection.
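The cyclic behavior wanted here can be sketched like this (a toy bitmap standing in for the IDA; the point is only that a freed id is not re-granted immediately, because the search resumes after the last grant):

```c
#define MAX_IDS 8

static unsigned char id_used[MAX_IDS];
static int id_hint; /* where the next search starts */

/* Allocate the first free id at or after the hint, wrapping around. */
static int id_alloc_cyclic(void)
{
	int n;

	for (n = 0; n < MAX_IDS; n++) {
		int id = (id_hint + n) % MAX_IDS;

		if (!id_used[id]) {
			id_used[id] = 1;
			id_hint = id + 1;
			return id;
		}
	}
	return -1; /* space exhausted */
}

static void id_free(int id)
{
	id_used[id] = 0;
}
```

This keeps ids small for manual inspection while avoiding immediate reuse in the trace.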
Regards,
Christian.
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-12 7:00 ` Christian König
@ 2025-06-12 23:48 ` Danilo Krummrich
2025-06-13 7:51 ` Pierre-Eric Pelloux-Prayer
2025-06-13 8:23 ` Christian König
0 siblings, 2 replies; 16+ messages in thread
From: Danilo Krummrich @ 2025-06-12 23:48 UTC (permalink / raw)
To: Christian König
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote:
> On 6/11/25 17:11, Danilo Krummrich wrote:
> >>>> Mhm, reiterating our internal discussion on the mailing list.
> >>>>
> >>>> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
> >>>>
> >>>> @Philip and @Danilo any opinion on that?
> >>>
> >>> Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
> >>> why we need client_id to be a u64, wouldn't a u32 not be enough?
> >>
> >> That can trivially overflow on long running boxes.
> >
> > I don't know if "trivially" is the word of choice given that the number is
> > 4,294,967,295.
> >
> > But I did indeed miss that this is a for ever increasing atomic. Why is it an
> > atomic? Why is it not an IDA?
>
> Well IDA has some extra overhead compared to an ever increasing atomic, additional to that it might not be the best choice to re-use numbers for clients in a trace log.
I think the overhead is not relevant at all; this is called from
drm_file_alloc(). The only path I can see calling it is
drm_client_init(), which isn't high-frequency stuff at all, is it?
It seems to me that we should probably use IDA here.
> On the other hand using smaller numbers is usually nicer for manual inspection.
Another option is to just add an interface to get a kernel client_id from the
same atomic / IDA.
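For instance (hypothetical function name; the counter stands in for the existing atomic in drm_file.c): kernel submitters would draw their id once from the same allocator, so collisions with drm_file ids become impossible by construction.

```c
#include <stdint.h>

static uint64_t client_id_counter; /* the existing ever-increasing counter */

/* Shared allocator used by userspace files and kernel submitters alike. */
static uint64_t drm_client_id_alloc(void)
{
	return ++client_id_counter; /* atomic64_inc_return() in the kernel */
}
```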
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-12 23:48 ` Danilo Krummrich
@ 2025-06-13 7:51 ` Pierre-Eric Pelloux-Prayer
2025-06-13 8:31 ` Danilo Krummrich
2025-06-13 8:23 ` Christian König
1 sibling, 1 reply; 16+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-06-13 7:51 UTC (permalink / raw)
To: Danilo Krummrich, Christian König
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
Hi,
Le 13/06/2025 à 01:48, Danilo Krummrich a écrit :
> On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote:
>> On 6/11/25 17:11, Danilo Krummrich wrote:
>>>>>> Mhm, reiterating our internal discussion on the mailing list.
>>>>>>
>>>>>> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
>>>>>>
>>>>>> @Philip and @Danilo any opinion on that?
>>>>>
>>>>> Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
>>>>> why we need client_id to be a u64, wouldn't a u32 not be enough?
>>>>
>>>> That can trivially overflow on long running boxes.
>>>
>>> I don't know if "trivially" is the word of choice given that the number is
>>> 4,294,967,295.
>>>
>>> But I did indeed miss that this is a for ever increasing atomic. Why is it an
>>> atomic? Why is it not an IDA?
>>
>> Well IDA has some extra overhead compared to an ever increasing atomic, additional to that it might not be the best choice to re-use numbers for clients in a trace log.
>
> I think the overhead is not relevant at all, this is called from
> drm_file_alloc(). The only path I can see where this is called is
> drm_client_init(), which isn't high frequent stuff at all, is it?
>
> It seems to me that we should probably use IDA here.
>
>> On the other hand using smaller numbers is usually nicer for manual inspection.
Re-using IDs might cause confusion in the trace logs, for instance when tracing lots of short-lived
processes.
>
> Another option is to just add an interface to get a kernel client_id from the
> same atomic / IDA.
Honestly I don't really see the problem with the current patch: using the last N ids for the kernel
jobs requires no other changes and works fine.
https://gitlab.freedesktop.org/drm/amd/-/issues/4260 has a screenshot of the UMR tool using these ids.
If the theoretical overlap with client drm id is a concern, adding code to drm_file_alloc() to not
use the last 0xff ids would be easy.
btw, maybe other drivers also use kernel jobs with the same semantics, in which case we might want to
move the special ID definitions to a shared place.
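That guard could be as small as this (a sketch; the limit and error handling are illustrative):

```c
#include <stdint.h>

/* The last 0xff ids stay reserved for driver-internal kernel jobs. */
#define DRM_FILE_CLIENT_ID_LIMIT (UINT64_MAX - 0xff)

static uint64_t ident; /* the atomic in drm_file_alloc() */

static int drm_file_alloc_client_id(uint64_t *id)
{
	uint64_t v = ++ident;

	if (v >= DRM_FILE_CLIENT_ID_LIMIT)
		return -1; /* out of client ids; the kernel range starts here */
	*id = v;
	return 0;
}
```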
Thanks,
Pierre-Eric
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-12 23:48 ` Danilo Krummrich
2025-06-13 7:51 ` Pierre-Eric Pelloux-Prayer
@ 2025-06-13 8:23 ` Christian König
2025-06-13 8:29 ` Philipp Stanner
2025-06-13 8:35 ` Danilo Krummrich
1 sibling, 2 replies; 16+ messages in thread
From: Christian König @ 2025-06-13 8:23 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
On 6/13/25 01:48, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote:
>> On 6/11/25 17:11, Danilo Krummrich wrote:
>>>>>> Mhm, reiterating our internal discussion on the mailing list.
>>>>>>
>>>>>> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
>>>>>>
>>>>>> @Philip and @Danilo any opinion on that?
>>>>>
>>>>> Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
>>>>> why we need client_id to be a u64, wouldn't a u32 not be enough?
>>>>
>>>> That can trivially overflow on long running boxes.
>>>
>>> I don't know if "trivially" is the word of choice given that the number is
>>> 4,294,967,295.
>>>
>>> But I did indeed miss that this is a for ever increasing atomic. Why is it an
>>> atomic? Why is it not an IDA?
>>
>> Well IDA has some extra overhead compared to an ever increasing atomic, additional to that it might not be the best choice to re-use numbers for clients in a trace log.
>
> I think the overhead is not relevant at all, this is called from
> drm_file_alloc(). The only path I can see where this is called is
> drm_client_init(), which isn't high frequent stuff at all, is it?
I don't think so. But we should really use ida_alloc_cyclic to make sure that numbers are not re-used so quickly.
>
> It seems to me that we should probably use IDA here.
>
>> On the other hand using smaller numbers is usually nicer for manual inspection.
>
> Another option is to just add an interface to get a kernel client_id from the
> same atomic / IDA.
That won't give us fixed numbers for in kernel clients.
Regards,
Christian.
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-13 8:23 ` Christian König
@ 2025-06-13 8:29 ` Philipp Stanner
2025-06-13 8:42 ` Danilo Krummrich
2025-06-13 8:35 ` Danilo Krummrich
1 sibling, 1 reply; 16+ messages in thread
From: Philipp Stanner @ 2025-06-13 8:29 UTC (permalink / raw)
To: Christian König, Danilo Krummrich
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
On Fri, 2025-06-13 at 10:23 +0200, Christian König wrote:
> On 6/13/25 01:48, Danilo Krummrich wrote:
> > On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote:
> > > On 6/11/25 17:11, Danilo Krummrich wrote:
> > > > > > > Mhm, reiterating our internal discussion on the mailing
> > > > > > > list.
> > > > > > >
> > > > > > > I think it would be nicer if we could use negative values
> > > > > > > for the kernel submissions and positive for userspace.
> > > > > > > But as discussed internally we would need to adjust the
> > > > > > > scheduler trace points for that once more.
> > > > > > >
> > > > > > > @Philip and @Danilo any opinion on that?
> > > > > >
> > > > > > Both, the U64_MAX and the positive-negative approach, are a
> > > > > > bit hacky. I wonder
> > > > > > why we need client_id to be a u64, wouldn't a u32 not be
> > > > > > enough?
> > > > >
> > > > > That can trivially overflow on long running boxes.
> > > >
> > > > I don't know if "trivially" is the word of choice given that
> > > > the number is
> > > > 4,294,967,295.
> > > >
> > > > But I did indeed miss that this is a for ever increasing
> > > > atomic. Why is it an
> > > > atomic? Why is it not an IDA?
> > >
> > > Well IDA has some extra overhead compared to an ever increasing
> > > atomic, additional to that it might not be the best choice to re-
> > > use numbers for clients in a trace log.
> >
> > I think the overhead is not relevant at all, this is called from
> > drm_file_alloc(). The only path I can see where this is called is
> > drm_client_init(), which isn't high frequent stuff at all, is it?
>
> I don't think so. But we should really use ida_alloc_cyclic to make
> sure that numbers are not re-used so quickly.
Shouldn't the xarray be used nowadays for ID allocation? I think
idr_alloc_cyclic() (ida_alloc_cyclic() doesn't exist) is just a wrapper
around the xarray anyway.
P.
>
> >
> > It seems to me that we should probably use IDA here.
> >
> > > On the other hand using smaller numbers is usually nicer for
> > > manual inspection.
> >
> > Another option is to just add an interface to get a kernel
> > client_id from the
> > same atomic / IDA.
>
> That won't give us fixed numbers for in kernel clients.
>
> Regards,
> Christian.
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-13 7:51 ` Pierre-Eric Pelloux-Prayer
@ 2025-06-13 8:31 ` Danilo Krummrich
0 siblings, 0 replies; 16+ messages in thread
From: Danilo Krummrich @ 2025-06-13 8:31 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer
Cc: Christian König, Pierre-Eric Pelloux-Prayer,
alexander.deucher, amd-gfx, Philipp Stanner, dri-devel
On Fri, Jun 13, 2025 at 09:51:27AM +0200, Pierre-Eric Pelloux-Prayer wrote:
> If the theoretical overlap with client drm id is a concern, adding code to
> drm_file_alloc() to not use the last 0xff ids would be easy.
No, the theoretical overlap is not of any concern to me. What I dislike is
drivers silently relying on DRM API internals.
> btw maybe other drivers also use kernel jobs with the same semantics, in
> which case we might want to move the special IDs definition to a shared
> place.
Exactly, if kernel IDs are a thing we agree makes sense, we should provide a
common API to properly obtain a kernel ID, regardless of whether it's an atomic
or IDA.
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-13 8:23 ` Christian König
2025-06-13 8:29 ` Philipp Stanner
@ 2025-06-13 8:35 ` Danilo Krummrich
2025-06-13 9:27 ` Pierre-Eric Pelloux-Prayer
1 sibling, 1 reply; 16+ messages in thread
From: Danilo Krummrich @ 2025-06-13 8:35 UTC (permalink / raw)
To: Christian König
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
On Fri, Jun 13, 2025 at 10:23:15AM +0200, Christian König wrote:
> > Another option is to just add an interface to get a kernel client_id from the
> > same atomic / IDA.
>
> That won't give us fixed numbers for in kernel clients.
That's fine, then let's come up with an API that reserves fixed numbers.
My main concern is that drivers silently rely on DRM API internals, i.e. that
client_id is an u64 atomic, etc.
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-13 8:29 ` Philipp Stanner
@ 2025-06-13 8:42 ` Danilo Krummrich
0 siblings, 0 replies; 16+ messages in thread
From: Danilo Krummrich @ 2025-06-13 8:42 UTC (permalink / raw)
To: phasta
Cc: Christian König, Pierre-Eric Pelloux-Prayer,
alexander.deucher, amd-gfx, dri-devel
On Fri, Jun 13, 2025 at 10:29:35AM +0200, Philipp Stanner wrote:
> Shouldn't the xarray be used nowadays for ID allocation? I think
> idr_alloc_cyclic() (ida_alloc_cyclic() doesn't exist) is just a wrapper
> around the xarray anyways.
Yes, IDR is basically an xarray under the hood. But a raw xarray (or IDR) is
unnecessary overhead, since we don't need to store pointers.
While IDA also uses an xarray under the hood, it performs some tricks to
provide bare ID numbers much more efficiently.
Given that there's no ida_alloc_cyclic(), the atomic is probably fine, even
though ida_alloc_cyclic() doesn't sound like a bad idea to add in general.
But maybe it's difficult due to some of the optimization tricks IDA performs.
I think we should stick to the atomic for now, but add a proper API to reserve
kernel IDs.
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-13 8:35 ` Danilo Krummrich
@ 2025-06-13 9:27 ` Pierre-Eric Pelloux-Prayer
2025-06-13 11:48 ` Danilo Krummrich
0 siblings, 1 reply; 16+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-06-13 9:27 UTC (permalink / raw)
To: Danilo Krummrich, Christian König
Cc: Pierre-Eric Pelloux-Prayer, alexander.deucher, amd-gfx,
Philipp Stanner, dri-devel
Le 13/06/2025 à 10:35, Danilo Krummrich a écrit :
> On Fri, Jun 13, 2025 at 10:23:15AM +0200, Christian König wrote:
>>> Another option is to just add an interface to get a kernel client_id from the
>>> same atomic / IDA.
>>
>> That won't give us fixed numbers for in kernel clients.
>
> That's fine, then let's come up with an API that reserves fixed numbers.
>
> My main concern is that drivers silently rely on DRM API internals, i.e. that
> client_id is an u64 atomic, etc.
Let me express the need then: an id that is printed in gpu_scheduler_trace events, where the id needs
to be mappable by a userspace tool to a semantic meaning.
The current solution implements this need using:
* fixed ids which are u64 numbers because drm client id are u64.
* hard-coded mapping in the tool between these ids and their meaning ("u64_max - 1" interpreted as
"vm_update" and so on).
It doesn't depend on how drm internally manages these ids.
Adding an API to reserve fixed numbers would work but:
* if the fixed numbers are chosen by the driver ("drm_reserve_id(u64_max -1)"), I don't see the
benefit over the current patch
* if the fixed numbers are allocated by drm (drm_reserve_id("vm_update") -> u64), it would then
require a way to expose them to userspace (through /sys/kernel/debug/dri/x/clients?)
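The second option could look roughly like this (all names here are hypothetical): DRM hands out fixed ids from the top of the range and keeps the name alongside, so the name-to-id table could later be dumped to userspace, e.g. through debugfs.

```c
#include <stdint.h>
#include <string.h>

#define MAX_RESERVED 16

static struct {
	const char *name;
	uint64_t id;
} reserved_ids[MAX_RESERVED];
static int nr_reserved;
static uint64_t next_reserved = UINT64_MAX; /* kernel ids grow downward */

/* Reserve a fixed id for a named kernel job type. */
static uint64_t drm_reserve_id(const char *name)
{
	reserved_ids[nr_reserved].name = name;
	reserved_ids[nr_reserved].id = next_reserved--;
	return reserved_ids[nr_reserved++].id;
}

/* Lookup used when exposing the table to userspace tools. */
static const char *drm_reserved_id_name(uint64_t id)
{
	int i;

	for (i = 0; i < nr_reserved; i++)
		if (reserved_ids[i].id == id)
			return reserved_ids[i].name;
	return NULL;
}
```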
Thanks,
Pierre-Eric
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-13 9:27 ` Pierre-Eric Pelloux-Prayer
@ 2025-06-13 11:48 ` Danilo Krummrich
2025-06-18 9:18 ` Pierre-Eric Pelloux-Prayer
0 siblings, 1 reply; 16+ messages in thread
From: Danilo Krummrich @ 2025-06-13 11:48 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer
Cc: Christian König, Pierre-Eric Pelloux-Prayer,
alexander.deucher, amd-gfx, Philipp Stanner, dri-devel
On Fri, Jun 13, 2025 at 11:27:08AM +0200, Pierre-Eric Pelloux-Prayer wrote:
> Le 13/06/2025 à 10:35, Danilo Krummrich a écrit :
> > On Fri, Jun 13, 2025 at 10:23:15AM +0200, Christian König wrote:
> > > > Another option is to just add an interface to get a kernel client_id from the
> > > > same atomic / IDA.
> > >
> > > That won't give us fixed numbers for in kernel clients.
> >
> > That's fine, then let's come up with an API that reserves fixed numbers.
> >
> > My main concern is that drivers silently rely on DRM API internals, i.e. that
> > client_id is an u64 atomic, etc.
>
> Let me express the need then: an id that is printed in gpu_scheduler_trace
> events and this id needs to be mappable by a userspace tool to a semantic.
>
> The current solution implements this need using:
> * fixed ids which are u64 numbers because drm client id are u64.
> * hard-coded mapping in the tool between these ids and their meaning
> ("u64_max - 1" interpreted as "vm_update" and so on).
>
> It doesn't depend on how drm internally manage these ids.
>
> Adding an API to reserve fixed numbers would work but:
> * if the fixed numbers are chosen by the driver ("drm_reserve_id(u64_max
> -1)"), I don't see the benefit over the current patch
> * if the fixed numbers are allocated by drm (drm_reserve_id("vm_update") ->
> u64), it would then require a way to expose them to userspace (through
> /sys/kernel/debug/dri/x/clients?)
Yeah, both is possible, I'm fine with either.
The benefit is that this way it becomes an official API, which can (and must)
be considered should there ever be a change on drm_file::client_id.
If someone would patch drm_file::client_id to start allocating IDs from high to
low, your corresponding driver code would silently break, because it relies on
an implementation detail of something that is not an official API.
Yes, I know that this exact case won't happen, but I guess you get the idea. :-)
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-13 11:48 ` Danilo Krummrich
@ 2025-06-18 9:18 ` Pierre-Eric Pelloux-Prayer
2025-07-02 9:23 ` Pierre-Eric Pelloux-Prayer
0 siblings, 1 reply; 16+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-06-18 9:18 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Christian König, Pierre-Eric Pelloux-Prayer,
alexander.deucher, amd-gfx, Philipp Stanner, dri-devel
>>
>> Adding an API to reserve fixed numbers would work but:
>> * if the fixed numbers are chosen by the driver ("drm_reserve_id(u64_max
>> -1)"), I don't see the benefit over the current patch
>> * if the fixed numbers are allocated by drm (drm_reserve_id("vm_update") ->
>> u64), it would then require a way to expose them to userspace (through
>> /sys/kernel/debug/dri/x/clients?)
>
> Yeah, both is possible, I'm fine with either.
>
> The benefit is that this way it becomes an official API, which can (and must)
> be considered should there ever be a change on drm_file::client_id.
>
> If someone would patch drm_file::client_id to start allocating IDs from high to
> low, your corresponding driver code would silently break, because it relies on
> an implementation detail of something that is not an official API.
>
> Yes, I know that this exact case won't happen, but I guess you get the idea. :-)
After looking a bit more into this, I came to the conclusion that IMO the two options above aren't great:
* the first adds a function that exposes the possibility of reserving an id, so we'll have to keep
track of such reserved ids for a benefit that is limited at best
* the second option is nicer on the surface because it doesn't make the tools dependent on
hard-coded kernel ids. But it also requires quite a few changes and additional memory allocations.
Honestly I'm wondering if adding a comment to drm_file_alloc like this would be enough:
/* Get a unique identifier for fdinfo.
* The highest ids may be used by drivers for tracing purposes. Overlapping is
* unlikely to occur, and if it does the impact will be limited to tracing.
*/
file->client_id = atomic64_inc_return(&ident);
What do you think?
Pierre-Eric
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-06-18 9:18 ` Pierre-Eric Pelloux-Prayer
@ 2025-07-02 9:23 ` Pierre-Eric Pelloux-Prayer
2025-07-02 9:42 ` Danilo Krummrich
0 siblings, 1 reply; 16+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-07-02 9:23 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Christian König, Pierre-Eric Pelloux-Prayer,
alexander.deucher, amd-gfx, Philipp Stanner, dri-devel
Le 18/06/2025 à 11:18, Pierre-Eric Pelloux-Prayer a écrit :
>
>
>
>>>
>>> Adding an API to reserve fixed numbers would work but:
>>> * if the fixed numbers are chosen by the driver ("drm_reserve_id(u64_max
>>> -1)"), I don't see the benefit over the current patch
>>> * if the fixed numbers are allocated by drm (drm_reserve_id("vm_update") ->
>>> u64), it would then require a way to expose them to userspace (through
>>> /sys/kernel/debug/dri/x/clients?)
>>
>> Yeah, both is possible, I'm fine with either.
>>
>> The benefit is that this way it becomes an official API, which can (and must)
>> be considered should there ever be a change on drm_file::client_id.
>>
>> If someone would patch drm_file::client_id to start allocating IDs from high to
>> low, your corresponding driver code would silently break, because it relies on
>> an implementation detail of something that is not an official API.
>>
>> Yes, I know that this exact case won't happen, but I guess you get the idea. :-)
>
> After looking a bit more into this, I came to the conclusion that IMO the two
> options above aren't great:
> * the first adds a function that exposes the possibility of reserving an id, so
> we'll have to keep track of such reserved ids for a benefit that is limited at best
> * the second option is nicer on the surface because it doesn't make the tools
> dependent on hard-coded kernel ids. But it also requires quite a few changes and
> extra memory allocations.
>
> Honestly, I'm wondering if adding a comment to drm_file_alloc like this would be enough:
>
> /* Get a unique identifier for fdinfo.
> * The highest ids may be used by drivers for tracing purposes. Overlapping is
> * unlikely to occur, and if it does the impact will be limited to tracing.
> */
> file->client_id = atomic64_inc_return(&ident);
>
> What do you think?
>
ping?
btw, I don't think that adding a comment to drm_file is even useful.
What the original patch does is pass opaque ids to a function that expects a
client_id (drm_sched_job_init).
These opaque ids could have any value; they won't interfere with fdinfo statistics
nor the driver's inner workings - they're just for tracing purposes.
Pierre-Eric
> Pierre-Eric
>
>
* Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id
2025-07-02 9:23 ` Pierre-Eric Pelloux-Prayer
@ 2025-07-02 9:42 ` Danilo Krummrich
0 siblings, 0 replies; 16+ messages in thread
From: Danilo Krummrich @ 2025-07-02 9:42 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer
Cc: Christian König, Pierre-Eric Pelloux-Prayer,
alexander.deucher, amd-gfx, Philipp Stanner, dri-devel
On Wed, Jul 02, 2025 at 11:23:26AM +0200, Pierre-Eric Pelloux-Prayer wrote:
>
>
> On 18/06/2025 at 11:18, Pierre-Eric Pelloux-Prayer wrote:
> >
> >
> >
> > > >
> > > > Adding an API to reserve fixed numbers would work but:
> > > > * if the fixed numbers are chosen by the driver ("drm_reserve_id(u64_max
> > > > -1)"), I don't see the benefit over the current patch
> > > > * if the fixed numbers are allocated by drm (drm_reserve_id("vm_update") ->
> > > > u64), it would then require a way to expose them to userspace (through
> > > > /sys/kernel/debug/dri/x/clients?)
> > >
> > > Yeah, both are possible, I'm fine with either.
> > >
> > > The benefit is that this way it becomes an official API, which can (and must)
> > > be considered should there ever be a change on drm_file::client_id.
> > >
> > > If someone were to patch drm_file::client_id to start allocating IDs from high to
> > > low, your corresponding driver code would silently break, because it relies on
> > > an implementation detail of something that is not an official API.
> > >
> > > Yes, I know that this exact case won't happen, but I guess you get the idea. :-)
> >
> > After looking a bit more into this, I came to the conclusion that IMO the two
> > options above aren't great:
> > * the first adds a function that exposes the possibility of reserving an
> > id, so we'll have to keep track of such reserved ids for a benefit that
> > is limited at best
> > * the second option is nicer on the surface because it doesn't make the
> > tools dependent on hard-coded kernel ids. But it also requires quite a
> > few changes and extra memory allocations.
> >
> > Honestly, I'm wondering if adding a comment to drm_file_alloc like this would be enough:
> >
> > /* Get a unique identifier for fdinfo.
> > * The highest ids may be used by drivers for tracing purposes. Overlapping is
> > * unlikely to occur, and if it does the impact will be limited to tracing.
> > */
> > file->client_id = atomic64_inc_return(&ident);
> >
> > What do you think?
> >
>
> ping?
>
> btw, I don't think that adding a comment to drm_file is even useful.
>
> What the original patch does is pass opaque ids to a function that expects a
> client_id (drm_sched_job_init).
> These opaque ids could have any value; they won't interfere with fdinfo statistics
> nor the driver's inner workings - they're just for tracing purposes.
I mean, you're right, you can definitely do that, it's entirely up to the driver
what to pass as a debug cookie to drm_sched_job_init().
I'm just saying that you're completely on your own if the implementation of
file->client_id were to change (which admittedly is unlikely). In such a case
you'd have to accept that such a change may silently break your driver and
that people are free to ignore this fact.
In this case it's probably not that big a deal, but I'd still like to create some
awareness that this class of solutions (i.e. relying on how generic infrastructure
works internally) is usually not a good idea, since it's error-prone and
gives maintainers a hard time.
end of thread, other threads:[~2025-07-02 9:42 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <2cefb0f5-3820-4828-8360-ff8c92e21aa6@amd.com>
2025-06-11 14:25 ` [PATCH v1] drm/amdgpu: give each kernel job a unique id Danilo Krummrich
2025-06-11 14:57 ` Christian König
2025-06-11 15:11 ` Danilo Krummrich
2025-06-12 7:00 ` Christian König
2025-06-12 23:48 ` Danilo Krummrich
2025-06-13 7:51 ` Pierre-Eric Pelloux-Prayer
2025-06-13 8:31 ` Danilo Krummrich
2025-06-13 8:23 ` Christian König
2025-06-13 8:29 ` Philipp Stanner
2025-06-13 8:42 ` Danilo Krummrich
2025-06-13 8:35 ` Danilo Krummrich
2025-06-13 9:27 ` Pierre-Eric Pelloux-Prayer
2025-06-13 11:48 ` Danilo Krummrich
2025-06-18 9:18 ` Pierre-Eric Pelloux-Prayer
2025-07-02 9:23 ` Pierre-Eric Pelloux-Prayer
2025-07-02 9:42 ` Danilo Krummrich