From: Boris Brezillon <boris.brezillon@collabora.com>
To: "Adrián Larumbe" <adrian.larumbe@collabora.com>
Cc: Steven Price <steven.price@arm.com>,
Liviu Dudau <liviu.dudau@arm.com>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
kernel@collabora.com, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] drm/panthor: Rreset device and load FW after failed PM suspend
Date: Mon, 14 Oct 2024 09:27:04 +0200
Message-ID: <20241014092704.50a21276@collabora.com>
In-Reply-To: <20241011225906.3789965-3-adrian.larumbe@collabora.com>

On Fri, 11 Oct 2024 23:57:01 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
> On rk3588 SoCs, during a runtime PM suspend, the transition to the
> lowest voltage/frequency pair might sometimes fail for reasons not yet
> understood. In that case, even a slow FW reset will fail, leaving the
> device's PM runtime status as unusable.
>
> When that happens, successive attempts to resume the device upon running
> a job will always fail.
>
> Fix it by forcing a synchronous device reset, which will lead to a
> successful FW reload, and also reset the device's PM runtime error
> status before resuming it.
>
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
> drivers/gpu/drm/panthor/panthor_device.c | 10 ++++++++++
> drivers/gpu/drm/panthor/panthor_device.h | 2 ++
> drivers/gpu/drm/panthor/panthor_sched.c | 7 +++++++
> 3 files changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index 5430557bd0b8..ec6fed5e996b 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -105,6 +105,16 @@ static void panthor_device_reset_cleanup(struct drm_device *ddev, void *data)
>  	destroy_workqueue(ptdev->reset.wq);
>  }
>
> +int panthor_device_reset_sync(struct panthor_device *ptdev)
> +{
> +	panthor_fw_pre_reset(ptdev, false);
> +	panthor_mmu_pre_reset(ptdev);
> +	panthor_gpu_soft_reset(ptdev);
> +	panthor_gpu_l2_power_on(ptdev);
> +	panthor_mmu_post_reset(ptdev);
> +	return panthor_fw_post_reset(ptdev);
> +}
> +
>  static void panthor_device_reset_work(struct work_struct *work)
>  {
>  	struct panthor_device *ptdev = container_of(work, struct panthor_device, reset.work);
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 0e68f5a70d20..05a5a7233378 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -217,6 +217,8 @@ struct panthor_file {
>  int panthor_device_init(struct panthor_device *ptdev);
>  void panthor_device_unplug(struct panthor_device *ptdev);
>
> +int panthor_device_reset_sync(struct panthor_device *ptdev);
> +
>  /**
>   * panthor_device_schedule_reset() - Schedules a reset operation
>   */
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index c7b350fc3eba..9a854c8c5718 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -3101,6 +3101,13 @@ queue_run_job(struct drm_sched_job *sched_job)
>  		return dma_fence_get(job->done_fence);
>  	}
>
> +	if (ptdev->base.dev->power.runtime_error) {
> +		ret = panthor_device_reset_sync(ptdev);
> +		if (drm_WARN_ON(&ptdev->base, ret))
> +			return ERR_PTR(ret);
> +		drm_WARN_ON(&ptdev->base, pm_runtime_set_active(ptdev->base.dev));
> +	}

I'd rather pretend the suspend/resume worked (even if it didn't) and
deal with the consequences (force a slow reset on the next resume), than
spread the 'if-PM-op-failed-force-sync-reset' thing everywhere we do a
pm_runtime_resume_and_get(). Also, I'm not sure how resetting the GPU
would help fix the OPP transition failure.

> +
>  	ret = pm_runtime_resume_and_get(ptdev->base.dev);
>  	if (drm_WARN_ON(&ptdev->base, ret))
>  		return ERR_PTR(ret);

Thread overview: 8+ messages
2024-10-11 22:56 [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Adrián Larumbe
2024-10-11 22:57 ` [PATCH 2/3] drm/panthor: Retry OPP transition to suspension state a few times Adrián Larumbe
2024-10-14 7:28 ` Boris Brezillon
2024-10-11 22:57 ` [PATCH 3/3] drm/panthor: Rreset device and load FW after failed PM suspend Adrián Larumbe
2024-10-14 7:27 ` Boris Brezillon [this message]
2024-10-16 9:14 ` kernel test robot
2024-10-11 23:22 ` [PATCH 1/3] drm/panthor: Fix runtime suspend sequence after OPP transition error Liviu Dudau
2024-10-14 7:12 ` Boris Brezillon