dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL
       [not found] <20251002150524.7552-3-david.rosca@amd.com>
@ 2025-10-09  7:01 ` Christian König
  2025-10-09 15:04   ` Christian König
  0 siblings, 1 reply; 4+ messages in thread
From: Christian König @ 2025-10-09  7:01 UTC (permalink / raw)
  To: David Rosca, Philipp Stanner, Danilo Krummrich, Matthew Brost
  Cc: amd-gfx, dri-devel@lists.freedesktop.org

On 02.10.25 17:05, David Rosca wrote:
> drm_sched_entity_flush should only kill the entity if the current
> process is the last user of the entity. The last_user is only set
> when adding new job, so entities that had no jobs submitted to them
> have NULL last_user and would always be killed.
> Another issue is setting last_user to NULL from drm_sched_entity_flush,
> which causes subsequent calls to kill the entity.
> 
> Signed-off-by: David Rosca <david.rosca@amd.com>
> Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.")

Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches.

> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 8867b95ab089..a325e4a59990 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>  	entity->guilty = guilty;
>  	entity->num_sched_list = num_sched_list;
>  	entity->priority = priority;
> +	entity->last_user = current->group_leader;
>  	/*
>  	 * It's perfectly valid to initialize an entity without having a valid
>  	 * scheduler attached. It's just not valid to use the scheduler before it
> @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>  long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>  {
>  	struct drm_gpu_scheduler *sched;
> -	struct task_struct *last_user;
>  	long ret = timeout;
>  
>  	if (!entity->rq)
> @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>  	}
>  
>  	/* For killed process disable any more IBs enqueue right now */
> -	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
> -	if ((!last_user || last_user == current->group_leader) &&
> +	if (entity->last_user == current->group_leader &&

You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times.

Regards,
Christian.

>  	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>  		drm_sched_entity_kill(entity);
>  


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL
  2025-10-09  7:01 ` [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL Christian König
@ 2025-10-09 15:04   ` Christian König
  2025-10-10 12:28     ` Christian König
  0 siblings, 1 reply; 4+ messages in thread
From: Christian König @ 2025-10-09 15:04 UTC (permalink / raw)
  To: David Rosca, Philipp Stanner, Danilo Krummrich, Matthew Brost
  Cc: amd-gfx, dri-devel@lists.freedesktop.org, Prosyak, Vitaly

FYI

On 09.10.25 09:01, Christian König wrote:
> On 02.10.25 17:05, David Rosca wrote:
>> drm_sched_entity_flush should only kill the entity if the current
>> process is the last user of the entity. The last_user is only set
>> when adding new job, so entities that had no jobs submitted to them
>> have NULL last_user and would always be killed.
>> Another issue is setting last_user to NULL from drm_sched_entity_flush,
>> which causes subsequent calls to kill the entity.
>>
>> Signed-off-by: David Rosca <david.rosca@amd.com>
>> Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.")
> 
> Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches.
> 
>> ---
>>  drivers/gpu/drm/scheduler/sched_entity.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 8867b95ab089..a325e4a59990 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>  	entity->guilty = guilty;
>>  	entity->num_sched_list = num_sched_list;
>>  	entity->priority = priority;
>> +	entity->last_user = current->group_leader;
>>  	/*
>>  	 * It's perfectly valid to initialize an entity without having a valid
>>  	 * scheduler attached. It's just not valid to use the scheduler before it
>> @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>  long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>  {
>>  	struct drm_gpu_scheduler *sched;
>> -	struct task_struct *last_user;
>>  	long ret = timeout;
>>  
>>  	if (!entity->rq)
>> @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>  	}
>>  
>>  	/* For killed process disable any more IBs enqueue right now */
>> -	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>> -	if ((!last_user || last_user == current->group_leader) &&
>> +	if (entity->last_user == current->group_leader &&
> 
> You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times.
> 
> Regards,
> Christian.
> 
>>  	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>>  		drm_sched_entity_kill(entity);
>>  
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL
  2025-10-09 15:04   ` Christian König
@ 2025-10-10 12:28     ` Christian König
  2025-10-10 12:48       ` David Rosca
  0 siblings, 1 reply; 4+ messages in thread
From: Christian König @ 2025-10-10 12:28 UTC (permalink / raw)
  To: David Rosca, Philipp Stanner, Danilo Krummrich, Matthew Brost
  Cc: amd-gfx, dri-devel@lists.freedesktop.org, Prosyak, Vitaly

David any objections that I take this patch and make the necessary modifications?

People are pinging me about the problem.

Regards,
Christian.

On 09.10.25 17:04, Christian König wrote:
> FYI
> 
> On 09.10.25 09:01, Christian König wrote:
>> On 02.10.25 17:05, David Rosca wrote:
>>> drm_sched_entity_flush should only kill the entity if the current
>>> process is the last user of the entity. The last_user is only set
>>> when adding new job, so entities that had no jobs submitted to them
>>> have NULL last_user and would always be killed.
>>> Another issue is setting last_user to NULL from drm_sched_entity_flush,
>>> which causes subsequent calls to kill the entity.
>>>
>>> Signed-off-by: David Rosca <david.rosca@amd.com>
>>> Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.")
>>
>> Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches.
>>
>>> ---
>>>  drivers/gpu/drm/scheduler/sched_entity.c | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 8867b95ab089..a325e4a59990 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>>  	entity->guilty = guilty;
>>>  	entity->num_sched_list = num_sched_list;
>>>  	entity->priority = priority;
>>> +	entity->last_user = current->group_leader;
>>>  	/*
>>>  	 * It's perfectly valid to initialize an entity without having a valid
>>>  	 * scheduler attached. It's just not valid to use the scheduler before it
>>> @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>>  long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>>  {
>>>  	struct drm_gpu_scheduler *sched;
>>> -	struct task_struct *last_user;
>>>  	long ret = timeout;
>>>  
>>>  	if (!entity->rq)
>>> @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>>  	}
>>>  
>>>  	/* For killed process disable any more IBs enqueue right now */
>>> -	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>>> -	if ((!last_user || last_user == current->group_leader) &&
>>> +	if (entity->last_user == current->group_leader &&
>>
>> You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times.
>>
>> Regards,
>> Christian.
>>
>>>  	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>>>  		drm_sched_entity_kill(entity);
>>>  
>>
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL
  2025-10-10 12:28     ` Christian König
@ 2025-10-10 12:48       ` David Rosca
  0 siblings, 0 replies; 4+ messages in thread
From: David Rosca @ 2025-10-10 12:48 UTC (permalink / raw)
  To: Christian König, Philipp Stanner, Danilo Krummrich,
	Matthew Brost
  Cc: amd-gfx, dri-devel@lists.freedesktop.org, Prosyak, Vitaly


On 10. 10. 25 14:28, Christian König wrote:
> David any objections that I take this patch and make the necessary modifications?

Sure, please go ahead.

Thanks,
David

>
> People are pinging me about the problem.
>
> Regards,
> Christian.
>
> On 09.10.25 17:04, Christian König wrote:
>> FYI
>>
>> On 09.10.25 09:01, Christian König wrote:
>>> On 02.10.25 17:05, David Rosca wrote:
>>>> drm_sched_entity_flush should only kill the entity if the current
>>>> process is the last user of the entity. The last_user is only set
>>>> when adding new job, so entities that had no jobs submitted to them
>>>> have NULL last_user and would always be killed.
>>>> Another issue is setting last_user to NULL from drm_sched_entity_flush,
>>>> which causes subsequent calls to kill the entity.
>>>>
>>>> Signed-off-by: David Rosca <david.rosca@amd.com>
>>>> Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.")
>>> Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches.
>>>
>>>> ---
>>>>   drivers/gpu/drm/scheduler/sched_entity.c | 5 ++---
>>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index 8867b95ab089..a325e4a59990 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>>>   	entity->guilty = guilty;
>>>>   	entity->num_sched_list = num_sched_list;
>>>>   	entity->priority = priority;
>>>> +	entity->last_user = current->group_leader;
>>>>   	/*
>>>>   	 * It's perfectly valid to initialize an entity without having a valid
>>>>   	 * scheduler attached. It's just not valid to use the scheduler before it
>>>> @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>>>   long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>>>   {
>>>>   	struct drm_gpu_scheduler *sched;
>>>> -	struct task_struct *last_user;
>>>>   	long ret = timeout;
>>>>   
>>>>   	if (!entity->rq)
>>>> @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>>>   	}
>>>>   
>>>>   	/* For killed process disable any more IBs enqueue right now */
>>>> -	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>>>> -	if ((!last_user || last_user == current->group_leader) &&
>>>> +	if (entity->last_user == current->group_leader &&
>>> You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>   	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>>>>   		drm_sched_entity_kill(entity);
>>>>   

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-10-10 12:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20251002150524.7552-3-david.rosca@amd.com>
2025-10-09  7:01 ` [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL Christian König
2025-10-09 15:04   ` Christian König
2025-10-10 12:28     ` Christian König
2025-10-10 12:48       ` David Rosca

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).