From: Boris Brezillon <boris.brezillon@collabora.com>
To: "Adrián Larumbe" <adrian.larumbe@collabora.com>
Cc: Steven Price <steven.price@arm.com>,
Liviu Dudau <liviu.dudau@arm.com>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
kernel@collabora.com, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] drm/panthor: Rreset device and load FW after failed PM suspend
Date: Mon, 14 Oct 2024 09:27:04 +0200
Message-ID: <20241014092704.50a21276@collabora.com>
In-Reply-To: <20241011225906.3789965-3-adrian.larumbe@collabora.com>
On Fri, 11 Oct 2024 23:57:01 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
> On rk3588 SoCs, during a runtime PM suspend, the transition to the
> lowest voltage/frequency pair might sometimes fail for reasons not yet
> understood. In that case, even a slow FW reset will fail, leaving the
> device's PM runtime status as unusuable.
>
> When that happens, successive attempts to resume the device upon running
> a job will always fail.
>
> Fix it by forcing a synchronous device reset, which will lead to a
> successful FW reload, and also reset the device's PM runtime error
> status before resuming it.
>
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
> drivers/gpu/drm/panthor/panthor_device.c | 10 ++++++++++
> drivers/gpu/drm/panthor/panthor_device.h | 2 ++
> drivers/gpu/drm/panthor/panthor_sched.c | 7 +++++++
> 3 files changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index 5430557bd0b8..ec6fed5e996b 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -105,6 +105,16 @@ static void panthor_device_reset_cleanup(struct drm_device *ddev, void *data)
>  	destroy_workqueue(ptdev->reset.wq);
>  }
>  
> +int panthor_device_reset_sync(struct panthor_device *ptdev)
> +{
> +	panthor_fw_pre_reset(ptdev, false);
> +	panthor_mmu_pre_reset(ptdev);
> +	panthor_gpu_soft_reset(ptdev);
> +	panthor_gpu_l2_power_on(ptdev);
> +	panthor_mmu_post_reset(ptdev);
> +	return panthor_fw_post_reset(ptdev);
> +}
> +
>  static void panthor_device_reset_work(struct work_struct *work)
>  {
>  	struct panthor_device *ptdev = container_of(work, struct panthor_device, reset.work);
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 0e68f5a70d20..05a5a7233378 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -217,6 +217,8 @@ struct panthor_file {
> int panthor_device_init(struct panthor_device *ptdev);
> void panthor_device_unplug(struct panthor_device *ptdev);
>
> +int panthor_device_reset_sync(struct panthor_device *ptdev);
> +
> /**
> * panthor_device_schedule_reset() - Schedules a reset operation
> */
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index c7b350fc3eba..9a854c8c5718 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -3101,6 +3101,13 @@ queue_run_job(struct drm_sched_job *sched_job)
>  		return dma_fence_get(job->done_fence);
>  	}
>  
> +	if (ptdev->base.dev->power.runtime_error) {
> +		ret = panthor_device_reset_sync(ptdev);
> +		if (drm_WARN_ON(&ptdev->base, ret))
> +			return ERR_PTR(ret);
> +		drm_WARN_ON(&ptdev->base, pm_runtime_set_active(ptdev->base.dev));
> +	}

I'd rather pretend the suspend/resume worked (even if it didn't) and
deal with the consequences (force a slow reset on the next resume) than
spread the 'if-PM-op-failed-force-sync-reset' logic everywhere we call
pm_runtime_resume_and_get(). I'm also not sure how resetting the GPU
would help fix the OPP transition failure.

> +
>  	ret = pm_runtime_resume_and_get(ptdev->base.dev);
>  	if (drm_WARN_ON(&ptdev->base, ret))
>  		return ERR_PTR(ret);
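
To make the alternative suggested above more concrete, here is a rough
userspace model of the idea, not driver code. It only shows the control
flow: the suspend path swallows the failure and records it, and the resume
path is the single place that pays for it with a slow reset. Every name
below (struct model_dev, needs_slow_reset, the model_*() helpers) is
hypothetical and does not exist in panthor.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not struct panthor_device. */
struct model_dev {
	bool opp_transition_fails; /* simulates the failing OPP transition */
	bool needs_slow_reset;     /* last suspend did not complete cleanly */
	bool suspended;
	int slow_resets;           /* resumes that went through a slow reset */
	int fast_resumes;          /* resumes that took the normal path */
};

/*
 * Suspend path: if the OPP transition fails, remember it but still report
 * success, so the runtime PM status never ends up in an error state.
 */
int model_runtime_suspend(struct model_dev *dev)
{
	if (dev->opp_transition_fails)
		dev->needs_slow_reset = true;

	dev->suspended = true;
	return 0;
}

/*
 * Resume path: the only place that knows about the earlier failure. It
 * forces a slow reset once, then clears the flag.
 */
int model_runtime_resume(struct model_dev *dev)
{
	if (dev->needs_slow_reset) {
		dev->slow_resets++;
		dev->needs_slow_reset = false;
	} else {
		dev->fast_resumes++;
	}

	dev->suspended = false;
	return 0;
}

/*
 * Caller side (think queue_run_job()): it just resumes the device, with no
 * "did the last PM operation fail?" check of its own.
 */
int model_run_job(struct model_dev *dev)
{
	if (dev->suspended)
		return model_runtime_resume(dev);

	return 0;
}
```

With this shape, every caller of the resume helper gets the recovery for
free, which is the point of not open-coding the check at each call site.
The model does not address why the OPP transition fails in the first place.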