From: Boris Brezillon <boris.brezillon@collabora.com>
To: Ashley Smith <ashley.smith@collabora.com>
Cc: "Liviu Dudau" <liviu.dudau@arm.com>,
"Steven Price" <steven.price@arm.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"kernel" <kernel@collabora.com>,
"open list:ARM MALI PANTHOR DRM DRIVER"
<dri-devel@lists.freedesktop.org>,
"open list" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 1/2] drm/panthor: Reset queue slots if termination fails
Date: Wed, 4 Jun 2025 13:24:18 +0200 [thread overview]
Message-ID: <20250604132418.6685ff7c@collabora.com> (raw)
In-Reply-To: <1973a637a7f.b61987aa482053.3031227813632792112@collabora.com>
On Wed, 04 Jun 2025 11:01:27 +0100
Ashley Smith <ashley.smith@collabora.com> wrote:
> On Tue, 03 Jun 2025 12:09:44 +0100 Liviu Dudau <liviu.dudau@arm.com>
> wrote:
> > On Tue, Jun 03, 2025 at 10:49:31AM +0100, Ashley Smith wrote:
> > > This fixes a bug where if we timeout after a suspend and the
> > > termination fails, due to waiting on a fence that will never be
> > > signalled for example, we do not resume the group correctly. The
> > > fix forces a reset for groups that are not terminated correctly.
> > >
> >
> > I have a question on the commit message: you're describing a
> > situation where a fence will *never* be signalled. Is that a real
> > example? I thought this is not supposed to ever happen! Or are you
> > trying to say that the fence signalling happens after the timeout?
> >
>
> This covers cases where a fence is never signalled.
Fence not being signaled in time is covered by the job timeout. What
you're fixing here is the FW not responding to a CSG SUSPEND/TERMINATE
request, which is different.
> It shouldn't
> happen, but we have found this in some situations with a FW hang.
This ^.
> Since queue_suspend_timeout() is only called on state update, if a
> suspend/terminate fails due to a FW hang for example this will leave
> delayed work, possibly leading to an incorrect queue_timeout_work().
Nah, that's not true until the second patch in this series is applied,
because drm_sched_stop(), which is called in the
panthor_sched_pre_reset() path, takes care of suspending the drm_sched
timer. What's still problematic if we don't call cs_slot_reset_locked()
here is that we don't re-initialize the FW CS slots, and since
cs_slot_prog_locked() is only called on active queues, we might end up
with an unused CS slot that still has values from the previous user of
this slot. Not sure how harmful this is in practice, but it's certainly
not great.
The true reason we have a Fixes tag is because the second patch has
a Fixes tag too, and it relies on the new driver timer being stopped
even if the FW hangs on a TERMINATE request (queue_suspend_timeout() is
called in cs_slot_reset_locked()). So, either we merge the two patches
back, like was done in v2, or we have to flag both as Fixes.
next prev parent reply other threads:[~2025-06-04 11:24 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-03 9:49 [PATCH v5 0/2] drm/panthor: Scheduler fixes for termination failure and timeouts Ashley Smith
2025-06-03 9:49 ` [PATCH v5 1/2] drm/panthor: Reset queue slots if termination fails Ashley Smith
2025-06-03 11:09 ` Liviu Dudau
2025-06-04 10:01 ` Ashley Smith
2025-06-04 11:24 ` Boris Brezillon [this message]
2025-06-12 15:40 ` Steven Price
2025-06-17 14:45 ` Boris Brezillon
2025-06-03 9:49 ` [PATCH v5 2/2] drm/panthor: Make the timeout per-queue instead of per-job Ashley Smith
2025-06-12 14:32 ` Liviu Dudau
2025-06-12 15:40 ` Steven Price
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250604132418.6685ff7c@collabora.com \
--to=boris.brezillon@collabora.com \
--cc=airlied@gmail.com \
--cc=ashley.smith@collabora.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=kernel@collabora.com \
--cc=linux-kernel@vger.kernel.org \
--cc=liviu.dudau@arm.com \
--cc=maarten.lankhorst@linux.intel.com \
--cc=mripard@kernel.org \
--cc=simona@ffwll.ch \
--cc=steven.price@arm.com \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.