From: Boris Brezillon <boris.brezillon@collabora.com>
To: Steven Price <steven.price@arm.com>
Cc: "Liviu Dudau" <liviu.dudau@arm.com>,
"Adrián Larumbe" <adrian.larumbe@collabora.com>,
dri-devel@lists.freedesktop.org, kernel@collabora.com
Subject: Re: [PATCH 4/5] drm/panthor: Be robust against resume failures
Date: Thu, 14 Nov 2024 12:24:13 +0100 [thread overview]
Message-ID: <20241114122413.778b01df@collabora.com> (raw)
In-Reply-To: <1fde7d30-7b8d-4f20-a38e-957e0f67a295@arm.com>
On Thu, 14 Nov 2024 10:51:01 +0000
Steven Price <steven.price@arm.com> wrote:
> On 13/11/2024 15:42, Boris Brezillon wrote:
> > When the runtime PM resume callback returns an error, it puts the device
> > in a state where it can't be resumed anymore. Make sure we can recover
> > from such transient failures by calling pm_runtime_set_suspended()
> > explicitly after a pm_runtime_resume_and_get() failure.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > drivers/gpu/drm/panthor/panthor_device.c | 1 +
> > drivers/gpu/drm/panthor/panthor_device.h | 17 +++++++++++++++++
> > drivers/gpu/drm/panthor/panthor_drv.c | 2 +-
> > drivers/gpu/drm/panthor/panthor_sched.c | 4 ++--
> > 4 files changed, 21 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> > index 353f3aabef42..d3276b936141 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.c
> > +++ b/drivers/gpu/drm/panthor/panthor_device.c
> > @@ -486,6 +486,7 @@ int panthor_device_resume(struct device *dev)
> >
> > err_set_suspended:
> > atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED);
> > + atomic_set(&ptdev->pm.recovery_needed, 1);
> > return ret;
> > }
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index 0e68f5a70d20..cc74e99e53f9 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -9,6 +9,7 @@
> > #include <linux/atomic.h>
> > #include <linux/io-pgtable.h>
> > #include <linux/regulator/consumer.h>
> > +#include <linux/pm_runtime.h>
> > #include <linux/sched.h>
> > #include <linux/spinlock.h>
> >
> > @@ -180,6 +181,9 @@ struct panthor_device {
> > * is suspended.
> > */
> > struct page *dummy_latest_flush;
> > +
> > + /** @recovery_needed: True when a resume attempt failed. */
> > + atomic_t recovery_needed;
> > } pm;
> >
> > /** @profile_mask: User-set profiling flags for job accounting. */
> > @@ -243,6 +247,19 @@ int panthor_device_mmap_io(struct panthor_device *ptdev,
> > int panthor_device_resume(struct device *dev);
> > int panthor_device_suspend(struct device *dev);
> >
> > +static inline int panthor_device_resume_and_get(struct panthor_device *ptdev)
> > +{
> > + int ret = pm_runtime_resume_and_get(ptdev->base.dev);
> > +
> > + /* If the resume failed, we need to clear the runtime_error, which we
> > + * can done by forcing the RPM state to suspended.
> > + */
> > + if (ret && atomic_cmpxchg(&ptdev->pm.recovery_needed, 1, 0) == 1)
>
> I'm unclear what this atomic achieves. At first glance it appears
> pointless: with this change if panthor_device_resume() fails then
> recovery_needed is set to 1. So logically if ret != 0 then also
> recovery_needed == 1.
>
> My second thought was is this to avoid races? If multiple threads are
> calling this then only one will win the cmpxchg and call
> pm_runtime_set_suspended. But it's not safe - it's quite possible for
> the first thread to get past the cmpxchg and be suspended before the
> second thread comes along and reaches the same point - leading to
> multiple calls to pm_runtime_set_suspended().
Yes, it was here to avoid the race. I don't think there's a risk of
multiple threads calling pm_runtime_set_suspended() without
actually needing such a call, because we won't reach
panthor_device_resume() until the runtime_error has been cleared
(runtime PM bails out early with a -EINVAL). So, in practice, there's no
way two threads can see a recovery_needed=1 unless the error has already
been cleared by the other thread and the second thread triggered
another resume error, in which case the second
pm_runtime_set_suspended() call is legit.
But now that you mention it, it indeed doesn't prevent the second
thread to call pm_runtime_resume_and_get() before the PM runtime_error
has been cleared, leading to potential spurious errors, so that's
annoying.
next prev parent reply other threads:[~2024-11-14 11:24 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-13 15:42 [PATCH 0/5] drm/panthor: Be robust against failures in the resume path Boris Brezillon
2024-11-13 15:42 ` [PATCH 1/5] drm/panthor: Preserve the result returned by panthor_fw_resume() Boris Brezillon
2024-11-14 10:50 ` Steven Price
2024-11-13 15:42 ` [PATCH 2/5] drm/panthor: Be robust against runtime PM resume failures in the suspend path Boris Brezillon
2024-11-13 15:54 ` Boris Brezillon
2024-11-13 15:59 ` Boris Brezillon
2024-11-14 11:13 ` Liviu Dudau
2024-11-14 11:27 ` Boris Brezillon
2024-11-14 11:36 ` Liviu Dudau
2024-11-13 15:42 ` [PATCH 3/5] drm/panthor: Ignore devfreq_{suspend, resume}_device() failures Boris Brezillon
2024-11-14 10:50 ` Steven Price
2024-11-13 15:42 ` [PATCH 4/5] drm/panthor: Be robust against resume failures Boris Brezillon
2024-11-14 10:51 ` Steven Price
2024-11-14 11:24 ` Boris Brezillon [this message]
2024-11-13 15:42 ` [PATCH 5/5] drm/panthor: Fix the fast-reset logic Boris Brezillon
2024-11-14 10:51 ` Steven Price
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241114122413.778b01df@collabora.com \
--to=boris.brezillon@collabora.com \
--cc=adrian.larumbe@collabora.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=kernel@collabora.com \
--cc=liviu.dudau@arm.com \
--cc=steven.price@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.