From: Boris Brezillon <boris.brezillon@collabora.com>
To: Steven Price <steven.price@arm.com>
Cc: "Liviu Dudau" <liviu.dudau@arm.com>,
"Adrián Larumbe" <adrian.larumbe@collabora.com>,
dri-devel@lists.freedesktop.org,
"Akash Goel" <akash.goel@arm.com>,
"Karunika Choo" <karunika.choo@arm.com>,
kernel@collabora.com
Subject: Re: [PATCH v2 3/6] drm/panthor: Recover from panthor_gpu_flush_caches() failures
Date: Mon, 17 Nov 2025 16:44:55 +0100 [thread overview]
Message-ID: <20251117164455.09f41dcc@fedora> (raw)
In-Reply-To: <e6fbbfa1-789f-46bf-9591-b19ff954b69a@arm.com>
On Mon, 17 Nov 2025 14:02:35 +0000
Steven Price <steven.price@arm.com> wrote:
> On 13/11/2025 10:39, Boris Brezillon wrote:
> > We have seen a few cases where the whole memory subsystem is blocked
> > and flush operations never complete. When that happens, we want to:
> >
> > - schedule a reset, so we can recover from this situation
> > - in the reset path, we need to reset the pending_reqs so we can send
> > new commands after the reset
> > - if more panthor_gpu_flush_caches() operations are queued after
> > the timeout, we skip them and return -EIO directly to avoid needless
> > waits (the memory block won't miraculously work again)
>
> You've removed the WARN from this last case. Is this intentional? I
> agree the recovery is better, but I don't think we expect this to happen
> - so it's pointing to something else being broken.
I did because there's a way for the UMD to trigger that (see the link
to the bug in the cover letter) without any mitigation we can put in
place kernel side, other than the GPU reset I'm adding here.I
tend to use WARN_ON()s only for things the kernel code has control on,
not stuff users can force the kernel driver into. Note that I kept the
drm_err(), so we still have a trace of such errors in the logs (along
with some timeouts).
>
> >
> > v2:
> > - New patch
> >
> > Fixes: 5cd894e258c4 ("drm/panthor: Add the GPU logical block")
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > drivers/gpu/drm/panthor/panthor_gpu.c | 19 ++++++++++++-------
> > 1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> > index eda670229184..abd2fde04da9 100644
> > --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> > @@ -283,38 +283,42 @@ int panthor_gpu_l2_power_on(struct panthor_device *ptdev)
> > int panthor_gpu_flush_caches(struct panthor_device *ptdev,
> > u32 l2, u32 lsc, u32 other)
> > {
> > - bool timedout = false;
> > unsigned long flags;
> > + int ret = 0;
> >
> > /* Serialize cache flush operations. */
> > guard(mutex)(&ptdev->gpu->cache_flush_lock);
> >
> > spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
> > - if (!drm_WARN_ON(&ptdev->base,
> > - ptdev->gpu->pending_reqs & GPU_IRQ_CLEAN_CACHES_COMPLETED)) {
> > + if (!(ptdev->gpu->pending_reqs & GPU_IRQ_CLEAN_CACHES_COMPLETED)) {
> > ptdev->gpu->pending_reqs |= GPU_IRQ_CLEAN_CACHES_COMPLETED;
> > gpu_write(ptdev, GPU_CMD, GPU_FLUSH_CACHES(l2, lsc, other));
> > + } else {
> > + ret = -EIO;
> > }
> > spin_unlock_irqrestore(&ptdev->gpu->reqs_lock, flags);
> >
> > + if (ret)
> > + return ret;
> > +
> > if (!wait_event_timeout(ptdev->gpu->reqs_acked,
> > !(ptdev->gpu->pending_reqs & GPU_IRQ_CLEAN_CACHES_COMPLETED),
> > msecs_to_jiffies(100))) {
> > spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
> > if ((ptdev->gpu->pending_reqs & GPU_IRQ_CLEAN_CACHES_COMPLETED) != 0 &&
> > !(gpu_read(ptdev, GPU_INT_RAWSTAT) & GPU_IRQ_CLEAN_CACHES_COMPLETED))
> > - timedout = true;
> > + ret = -ETIMEDOUT;
> > else
> > ptdev->gpu->pending_reqs &= ~GPU_IRQ_CLEAN_CACHES_COMPLETED;
> > spin_unlock_irqrestore(&ptdev->gpu->reqs_lock, flags);
> > }
> >
> > - if (timedout) {
> > + if (ret) {
> > + panthor_device_schedule_reset(ptdev);
> > drm_err(&ptdev->base, "Flush caches timeout");
> > - return -ETIMEDOUT;
> > }
> >
> > - return 0;
> > + return ret;
> > }
> >
> > /**
> > @@ -354,6 +358,7 @@ int panthor_gpu_soft_reset(struct panthor_device *ptdev)
> > return -ETIMEDOUT;
> > }
> >
> > + ptdev->gpu->pending_reqs = 0;
> > return 0;
> > }
> >
>
next prev parent reply other threads:[~2025-11-17 15:45 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-13 10:39 [PATCH v2 0/6] drm/panthor: Misc fixes Boris Brezillon
2025-11-13 10:39 ` [PATCH v2 1/6] drm/panthor: Always wait after sending a command to an AS Boris Brezillon
2025-11-17 12:29 ` Steven Price
2025-11-13 10:39 ` [PATCH v2 2/6] drm/panthor: Kill lock_region() Boris Brezillon
2025-11-17 12:44 ` Steven Price
2025-11-17 15:46 ` Boris Brezillon
2025-11-13 10:39 ` [PATCH v2 3/6] drm/panthor: Recover from panthor_gpu_flush_caches() failures Boris Brezillon
2025-11-17 14:02 ` Steven Price
2025-11-17 15:44 ` Boris Brezillon [this message]
2025-11-17 16:10 ` Steven Price
2025-11-13 10:39 ` [PATCH v2 4/6] drm/panthor: Add support for atomic page table updates Boris Brezillon
2025-11-17 14:26 ` Steven Price
2025-11-13 10:39 ` [PATCH v2 5/6] drm/panthor: Make panthor_vm_[un]map_pages() more robust Boris Brezillon
2025-11-17 15:07 ` Steven Price
2025-11-13 10:39 ` [PATCH v2 6/6] drm/panthor: Relax a check in panthor_sched_pre_reset() Boris Brezillon
2025-11-17 15:12 ` Steven Price
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251117164455.09f41dcc@fedora \
--to=boris.brezillon@collabora.com \
--cc=adrian.larumbe@collabora.com \
--cc=akash.goel@arm.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=karunika.choo@arm.com \
--cc=kernel@collabora.com \
--cc=liviu.dudau@arm.com \
--cc=steven.price@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.