* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL [not found] <20251002150524.7552-3-david.rosca@amd.com> @ 2025-10-09 7:01 ` Christian König 2025-10-09 15:04 ` Christian König 0 siblings, 1 reply; 4+ messages in thread From: Christian König @ 2025-10-09 7:01 UTC (permalink / raw) To: David Rosca, Philipp Stanner, Danilo Krummrich, Matthew Brost Cc: amd-gfx, dri-devel@lists.freedesktop.org On 02.10.25 17:05, David Rosca wrote: > drm_sched_entity_flush should only kill the entity if the current > process is the last user of the entity. The last_user is only set > when adding new job, so entities that had no jobs submitted to them > have NULL last_user and would always be killed. > Another issue is setting last_user to NULL from drm_sched_entity_flush, > which causes subsequent calls to kill the entity. > > Signed-off-by: David Rosca <david.rosca@amd.com> > Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.") Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches. > --- > drivers/gpu/drm/scheduler/sched_entity.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c > index 8867b95ab089..a325e4a59990 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, > entity->guilty = guilty; > entity->num_sched_list = num_sched_list; > entity->priority = priority; > + entity->last_user = current->group_leader; > /* > * It's perfectly valid to initialize an entity without having a valid > * scheduler attached. It's just not valid to use the scheduler before it > @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) > long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) > { > struct drm_gpu_scheduler *sched; > - struct task_struct *last_user; > long ret = timeout; > > if (!entity->rq) > @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) > } > > /* For killed process disable any more IBs enqueue right now */ > - last_user = cmpxchg(&entity->last_user, current->group_leader, NULL); > - if ((!last_user || last_user == current->group_leader) && > + if (entity->last_user == current->group_leader && You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times. Regards, Christian. > (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) > drm_sched_entity_kill(entity); > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL 2025-10-09 7:01 ` [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL Christian König @ 2025-10-09 15:04 ` Christian König 2025-10-10 12:28 ` Christian König 0 siblings, 1 reply; 4+ messages in thread From: Christian König @ 2025-10-09 15:04 UTC (permalink / raw) To: David Rosca, Philipp Stanner, Danilo Krummrich, Matthew Brost Cc: amd-gfx, dri-devel@lists.freedesktop.org, Prosyak, Vitaly FYI On 09.10.25 09:01, Christian König wrote: > On 02.10.25 17:05, David Rosca wrote: >> drm_sched_entity_flush should only kill the entity if the current >> process is the last user of the entity. The last_user is only set >> when adding new job, so entities that had no jobs submitted to them >> have NULL last_user and would always be killed. >> Another issue is setting last_user to NULL from drm_sched_entity_flush, >> which causes subsequent calls to kill the entity. >> >> Signed-off-by: David Rosca <david.rosca@amd.com> >> Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.") > > Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches. > >> --- >> drivers/gpu/drm/scheduler/sched_entity.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >> index 8867b95ab089..a325e4a59990 100644 >> --- a/drivers/gpu/drm/scheduler/sched_entity.c >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, >> entity->guilty = guilty; >> entity->num_sched_list = num_sched_list; >> entity->priority = priority; >> + entity->last_user = current->group_leader; >> /* >> * It's perfectly valid to initialize an entity without having a valid >> * scheduler attached. It's just not valid to use the scheduler before it >> @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) >> long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) >> { >> struct drm_gpu_scheduler *sched; >> - struct task_struct *last_user; >> long ret = timeout; >> >> if (!entity->rq) >> @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) >> } >> >> /* For killed process disable any more IBs enqueue right now */ >> - last_user = cmpxchg(&entity->last_user, current->group_leader, NULL); >> - if ((!last_user || last_user == current->group_leader) && >> + if (entity->last_user == current->group_leader && > > You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times. > > Regards, > Christian. > >> (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) >> drm_sched_entity_kill(entity); >> > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL 2025-10-09 15:04 ` Christian König @ 2025-10-10 12:28 ` Christian König 2025-10-10 12:48 ` David Rosca 0 siblings, 1 reply; 4+ messages in thread From: Christian König @ 2025-10-10 12:28 UTC (permalink / raw) To: David Rosca, Philipp Stanner, Danilo Krummrich, Matthew Brost Cc: amd-gfx, dri-devel@lists.freedesktop.org, Prosyak, Vitaly David any objections that I take this patch and make the necessary modifications? People are pinging me about the problem. Regards, Christian. On 09.10.25 17:04, Christian König wrote: > FYI > > On 09.10.25 09:01, Christian König wrote: >> On 02.10.25 17:05, David Rosca wrote: >>> drm_sched_entity_flush should only kill the entity if the current >>> process is the last user of the entity. The last_user is only set >>> when adding new job, so entities that had no jobs submitted to them >>> have NULL last_user and would always be killed. >>> Another issue is setting last_user to NULL from drm_sched_entity_flush, >>> which causes subsequent calls to kill the entity. >>> >>> Signed-off-by: David Rosca <david.rosca@amd.com> >>> Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.") >> >> Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches. >> >>> --- >>> drivers/gpu/drm/scheduler/sched_entity.c | 5 ++--- >>> 1 file changed, 2 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>> index 8867b95ab089..a325e4a59990 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, >>> entity->guilty = guilty; >>> entity->num_sched_list = num_sched_list; >>> entity->priority = priority; >>> + entity->last_user = current->group_leader; >>> /* >>> * It's perfectly valid to initialize an entity without having a valid >>> * scheduler attached. It's just not valid to use the scheduler before it >>> @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) >>> long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) >>> { >>> struct drm_gpu_scheduler *sched; >>> - struct task_struct *last_user; >>> long ret = timeout; >>> >>> if (!entity->rq) >>> @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) >>> } >>> >>> /* For killed process disable any more IBs enqueue right now */ >>> - last_user = cmpxchg(&entity->last_user, current->group_leader, NULL); >>> - if ((!last_user || last_user == current->group_leader) && >>> + if (entity->last_user == current->group_leader && >> >> You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times. >> >> Regards, >> Christian. >> >>> (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) >>> drm_sched_entity_kill(entity); >>> >> > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL 2025-10-10 12:28 ` Christian König @ 2025-10-10 12:48 ` David Rosca 0 siblings, 0 replies; 4+ messages in thread From: David Rosca @ 2025-10-10 12:48 UTC (permalink / raw) To: Christian König, Philipp Stanner, Danilo Krummrich, Matthew Brost Cc: amd-gfx, dri-devel@lists.freedesktop.org, Prosyak, Vitaly On 10. 10. 25 14:28, Christian König wrote: > David any objections that I take this patch and make the necessary modifications? Sure, please go ahead. Thanks, David > > People are pinging me about the problem. > > Regards, > Christian. > > On 09.10.25 17:04, Christian König wrote: >> FYI >> >> On 09.10.25 09:01, Christian König wrote: >>> On 02.10.25 17:05, David Rosca wrote: >>>> drm_sched_entity_flush should only kill the entity if the current >>>> process is the last user of the entity. The last_user is only set >>>> when adding new job, so entities that had no jobs submitted to them >>>> have NULL last_user and would always be killed. >>>> Another issue is setting last_user to NULL from drm_sched_entity_flush, >>>> which causes subsequent calls to kill the entity. >>>> >>>> Signed-off-by: David Rosca <david.rosca@amd.com> >>>> Fixes: 51564e9f06f0 ("drm/amdgpu: Avoid extra evict-restore process.") >>> Good catch, but in general please CC the relevant maintainers and mailing lists for scheduler patches. >>> >>>> --- >>>> drivers/gpu/drm/scheduler/sched_entity.c | 5 ++--- >>>> 1 file changed, 2 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>>> index 8867b95ab089..a325e4a59990 100644 >>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, >>>> entity->guilty = guilty; >>>> entity->num_sched_list = num_sched_list; >>>> entity->priority = priority; >>>> + entity->last_user = current->group_leader; >>>> /* >>>> * It's perfectly valid to initialize an entity without having a valid >>>> * scheduler attached. It's just not valid to use the scheduler before it >>>> @@ -278,7 +279,6 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) >>>> long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) >>>> { >>>> struct drm_gpu_scheduler *sched; >>>> - struct task_struct *last_user; >>>> long ret = timeout; >>>> >>>> if (!entity->rq) >>>> @@ -301,8 +301,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) >>>> } >>>> >>>> /* For killed process disable any more IBs enqueue right now */ >>>> - last_user = cmpxchg(&entity->last_user, current->group_leader, NULL); >>>> - if ((!last_user || last_user == current->group_leader) && >>>> + if (entity->last_user == current->group_leader && >>> You still need the cmpxchg() here or otherwise drm_sched_entity_kill() would run multiple times. >>> >>> Regards, >>> Christian. >>> >>>> (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) >>>> drm_sched_entity_kill(entity); >>>> ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-10-10 12:49 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20251002150524.7552-3-david.rosca@amd.com>
2025-10-09 7:01 ` [PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL Christian König
2025-10-09 15:04 ` Christian König
2025-10-10 12:28 ` Christian König
2025-10-10 12:48 ` David Rosca
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).