From: Boris Brezillon <boris.brezillon@collabora.com>
To: Rob Herring <robh+dt@kernel.org>,
Tomeu Vizoso <tomeu@tomeuvizoso.net>,
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
Steven Price <steven.price@arm.com>,
Robin Murphy <robin.murphy@arm.com>
Cc: stable@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH] panfrost: Fix job timeout handling
Date: Thu, 1 Oct 2020 16:02:48 +0200 [thread overview]
Message-ID: <20201001160248.4c2e1fee@collabora.com> (raw)
In-Reply-To: <20201001140143.1058669-1-boris.brezillon@collabora.com>
Oops, the prefix should be "drm/panfrost", will fix that in v2.
On Thu, 1 Oct 2020 16:01:43 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> If two or more jobs end up timing out concurrently, only one
> of them (the one attached to the scheduler acquiring the lock) is fully
> handled. The others remain in a dangling state where they are no longer
> part of the scheduling queue, but still block something in the scheduler,
> thus leading to repeated timeouts when new jobs are queued.
>
> Let's make sure all bad jobs are properly handled by the thread acquiring
> the lock.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
> Cc: <stable@vger.kernel.org>
> ---
> drivers/gpu/drm/panfrost/panfrost_job.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 30e7b7196dab..e87edca51d84 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -25,7 +25,7 @@
>
>  struct panfrost_queue_state {
>  	struct drm_gpu_scheduler sched;
> -
> +	struct drm_sched_job *bad;
>  	u64 fence_context;
>  	u64 emit_seqno;
>  };
> @@ -392,19 +392,29 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
>  		job_read(pfdev, JS_TAIL_LO(js)),
>  		sched_job);
>
> +	/*
> +	 * Collect the bad job here so it can be processed by the thread
> +	 * acquiring the reset lock.
> +	 */
> +	pfdev->js->queue[js].bad = sched_job;
> +
>  	if (!mutex_trylock(&pfdev->reset_lock))
>  		return;
>
>  	for (i = 0; i < NUM_JOB_SLOTS; i++) {
>  		struct drm_gpu_scheduler *sched = &pfdev->js->queue[i].sched;
>
> -		drm_sched_stop(sched, sched_job);
>  		if (js != i)
>  			/* Ensure any timeouts on other slots have finished */
>  			cancel_delayed_work_sync(&sched->work_tdr);
> -	}
>
> -	drm_sched_increase_karma(sched_job);
> +		drm_sched_stop(sched, pfdev->js->queue[i].bad);
> +
> +		if (pfdev->js->queue[i].bad)
> +			drm_sched_increase_karma(pfdev->js->queue[i].bad);
> +
> +		pfdev->js->queue[i].bad = NULL;
> +	}
>
>  	spin_lock_irqsave(&pfdev->js->job_lock, flags);
>  	for (i = 0; i < NUM_JOB_SLOTS; i++) {
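For readers following the logic, here is a minimal userspace sketch of the collect-then-drain pattern the commit message describes, assuming POSIX threads are available. It is not kernel code: struct job, struct queue_state, struct device, job_timedout(), replay_two_timeouts() and the stopped_with field are hypothetical stand-ins, the "stop" and karma steps are reduced to plain bookkeeping in place of drm_sched_stop() and drm_sched_increase_karma(), and the cancel_delayed_work_sync() step is left out. Each handler first records its job in its slot's bad field, then tries the reset lock; only the lock owner walks every slot. The replay is single-threaded: the first handler's trylock fails because the lock is already held, and the next handler to take the lock picks up that job as well as its own.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumption: three job slots, standing in for NUM_JOB_SLOTS in the driver. */
#define NUM_JOB_SLOTS 3

/* Hypothetical stand-ins for drm_sched_job / panfrost_queue_state. */
struct job {
	int id;
	int karma;              /* bumped once per detected hang */
};

struct queue_state {
	struct job *bad;        /* job recorded by this slot's timeout handler */
	int stopped_with;       /* id passed to the "stop" step, -1 if none */
};

struct device {
	pthread_mutex_t reset_lock;
	struct queue_state queue[NUM_JOB_SLOTS];
};

/*
 * Model of the patched timeout handler. Returns 1 if this caller
 * performed the reset, 0 if it lost the trylock and bailed out.
 */
static int job_timedout(struct device *dev, int js, struct job *job)
{
	/* Publish the bad job before trying to become the reset owner. */
	dev->queue[js].bad = job;

	if (pthread_mutex_trylock(&dev->reset_lock) != 0)
		return 0;

	/* The lock owner drains every slot, not just its own. */
	for (int i = 0; i < NUM_JOB_SLOTS; i++) {
		struct job *bad = dev->queue[i].bad;

		dev->queue[i].stopped_with = bad ? bad->id : -1;
		if (bad)
			bad->karma++;
		dev->queue[i].bad = NULL;
	}

	pthread_mutex_unlock(&dev->reset_lock);
	return 1;
}

/*
 * Hypothetical single-threaded replay: slot 0's handler loses the
 * trylock (simulated by holding the lock), then slot 2's handler
 * takes the lock and must also account for slot 0's job.
 */
static int replay_two_timeouts(struct device *dev, struct job *a, struct job *b)
{
	int handled = 0;

	pthread_mutex_lock(&dev->reset_lock);   /* a reset is already running */
	handled += job_timedout(dev, 0, a);     /* records a, returns 0 */
	pthread_mutex_unlock(&dev->reset_lock);

	handled += job_timedout(dev, 2, b);     /* records b, drains a and b */
	return handled;
}
```

In the patch itself, drm_sched_stop() moves below the cancel_delayed_work_sync() call; presumably this lets the lock owner wait for any other slot's timeout handler to finish, and so to publish its bad job, before that slot is stopped.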
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Thread overview:
2020-10-01 14:01 [PATCH] panfrost: Fix job timeout handling Boris Brezillon
2020-10-01 14:02 ` Boris Brezillon [this message]
2020-10-01 14:49 ` Steven Price
2020-10-01 15:22 ` Boris Brezillon
2020-10-01 15:49 ` Boris Brezillon
2020-10-01 16:05 ` Steven Price