From: Christian König
Reply-To: christian.koenig@amd.com
Subject: Re: [PATCH] drm/scheduler: Remove entity->rq NULL check
Date: Fri, 10 Aug 2018 15:27:50 +0200
Message-ID: <5bf40a54-18f9-98fd-a3df-dd0b8da0a424@gmail.com>
In-Reply-To: <54621fc1-7246-f1bf-26bb-a16c4daf249f@amd.com>
References: <20180803110810.2055-1-christian.koenig@amd.com>
 <54621fc1-7246-f1bf-26bb-a16c4daf249f@amd.com>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>, Nayan Deshmukh <nayan26deshmukh@gmail.com>
Cc: Mailing list - DRI developers <dri-devel@lists.freedesktop.org>

Crap, yeah indeed that needs to be protected by some lock.
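
Something along these lines is what I have in mind (just a sketch, not the actual patch; the PF_EXITING/SIGKILL condition is quoted from memory of the current flush code): take entity->rq_lock around the last_user update plus rq add in drm_sched_entity_push_job(), and around the last_user check plus rq remove in drm_sched_entity_flush(), so the two paths can no longer interleave:

    /* In drm_sched_entity_push_job(): set last_user and add the entity
     * to the rq under the same lock. */
    spin_lock(&entity->rq_lock);
    entity->last_user = current->group_leader;
    if (first)
            drm_sched_rq_add_entity(entity->rq, entity);
    spin_unlock(&entity->rq_lock);

    /* In drm_sched_entity_flush(): test last_user and remove the entity
     * under the same lock, closing the window Andrey describes below. */
    spin_lock(&entity->rq_lock);
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
            drm_sched_rq_remove_entity(entity->rq, entity);
    spin_unlock(&entity->rq_lock);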

Going to prepare a patch for that,
Christian.

On 09.08.2018 at 21:49, Andrey Grodzovsky wrote:

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>


But I still have questions about entity->last_user (didn't notice this before):

Looks to me like there is a race condition with its current usage. Let's say process A was preempted after doing drm_sched_entity_flush->cmpxchg(...); now process B, working on the same (forked) entity, is inside drm_sched_entity_push_job: it writes its PID to entity->last_user and also executes drm_sched_rq_add_entity. Now process A runs again and executes drm_sched_rq_remove_entity, inadvertently causing process B's removal from its scheduler rq.
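
To spell out the interleaving I mean (a sketch; A_leader/B_leader stand in for the respective current->group_leader):

    process A (flush, killed)            process B (push_job, forked)
    -------------------------            ----------------------------
    cmpxchg(&entity->last_user,
            A_leader, NULL)
        ... preempted ...
                                         entity->last_user = B_leader;
                                         spin_lock(&entity->rq_lock);
                                         drm_sched_rq_add_entity(
                                                 entity->rq, entity);
                                         spin_unlock(&entity->rq_lock);
        ... runs again ...
    drm_sched_rq_remove_entity(entity->rq,
                               entity);  <-- removes the entity B just added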

Looks to me like instead we should lock together entity->last_user accesses and adds/removals of entity to the rq.

Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios of processes and threads.

In any case, this check is redundant. Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>

Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
Ping. Any objections to that?

Christian.

On 03.08.2018 at 13:08, Christian König wrote:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);


