From: Ajay Agarwal <ajayagarwal@google.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Brian Norris <briannorris@google.com>,
Oliver Neukum <oneukum@suse.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Vincent Whitchurch <vincent.whitchurch@axis.com>,
"jic23@kernel.org" <jic23@kernel.org>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-iio@vger.kernel.org" <linux-iio@vger.kernel.org>,
Brian Norris <briannorris@chromium.org>,
Joy Chakraborty <joychakr@google.com>,
Vamshi Gajjela <vamshigajjela@google.com>,
Manu Gautam <manugautam@google.com>
Subject: Re: PM runtime_error handling missing in many drivers?
Date: Mon, 17 Feb 2025 09:19:45 +0530 [thread overview]
Message-ID: <Z7Kx2RN35QVyg8nP@google.com> (raw)
In-Reply-To: <CAJZ5v0i83eJWV_kvWxZvja+Js3tKbrwZ8rVVGn7vR=0qLf1mtw@mail.gmail.com>
On Wed, Feb 12, 2025 at 08:29:34PM +0100, Rafael J. Wysocki wrote:
> On Tue, Feb 11, 2025 at 11:21 PM Brian Norris <briannorris@google.com> wrote:
> >
> > Hi Ajay,
> >
> > On Mon, Feb 10, 2025 at 09:02:41AM +0530, Ajay Agarwal wrote:
> > > On Wed, Jul 27, 2022 at 06:31:48PM +0200, Rafael J. Wysocki wrote:
> > > > On Wed, Jul 27, 2022 at 10:08 AM Oliver Neukum <oneukum@suse.com> wrote:
> > > > > On 26.07.22 17:41, Rafael J. Wysocki wrote:
> > > > > > Well, in general suspending or resuming a device is a collaborative
> > > > > > effort and if one of the pieces falls over, making it work again
> > > > > > involves fixing up the failing piece and notifying the others that it
> > > > > > is ready again. However, that part isn't covered and I'm not sure if
> > > > > > it can be covered in a sufficiently generic way.
> > > > >
> > > > > True. But that still cannot solve the question what is to be done
> > > > > if error handling fails. Hence my proposal:
> > > > > - record all failures
> > > > > - heed the record only when suspending
> > > >
> > > > I guess that would boil down to moving the power.runtime_error update
> > > > from rpm_callback() to rpm_suspend()?
> > > Resuming this discussion. One of the ways the device drivers are
> > > clearing the runtime_error flag is by calling pm_runtime_set_suspended
> > > [1].
>
> I personally think that jumping on a 2.5 years old thread is not a
> good idea. It would be better to restate the problem statement and
> provide the link to the previous discussion.
>
> > > To me, it feels weird that a device driver calls pm_runtime_set_suspended
> > > if the runtime_resume() has failed. It should be implied that the device
> > > is in suspended state if the resume failed.
> > >
> > > So how really should the runtime_error flag be cleared? Should there be
> > > a new API exposed to device drivers for this? Or should we plan for it
> > > in the framework itself?
> >
> > While the API naming is unclear, that's exactly what
> > pm_runtime_set_suspended() is about. Personally, I find it nice when a
> > driver adds the comment "clear runtime_error flag", because otherwise
> > it's not really obvious why a driver has to take care of "suspending"
> > after a failed resume. But that's not the biggest question here, IMO.
> >
> > The real reson I pointed you at this thread was because I think it's
> > useful to pursue the proposal above: to avoid setting a persistent
> > "runtime_error" for resume failures. This seems to just create a pitfall
> > for clients, as asked by Vincent and Oliver upthread.
> >
> > And along this line, there are relatively few drivers that actually
> > bother to reset this error flag ever (e.g., commit f2bc2afe34c1
> > ("accel/ivpu: Clear runtime_error after pm_runtime_resume_and_get()
> > fails")).
> >
> > So to me, we should simply answer Rafael's question:
> >
> > (repeated:)
> > > > I guess that would boil down to moving the power.runtime_error update
> > > > from rpm_callback() to rpm_suspend()?
> >
> > Yes, I think so. (Although I'm not sure if this leaves undesirable spam
> > where persistent .runtime_resume() failures occur.)
> >
> > ...and then write/test/submit such a patch, provided it achieves the
> > desired results.
> >
> > Unless of course one of the thread participants here has some other
> > update in the intervening 2.5 years, or if Rafael was simply asking the
> > above rhetorically, and wasn't actually interested in fielding such a
> > change.
>
> The reason why runtime_error is there is to prevent runtime PM
> callbacks from being run until something is done about the error,
> under the assumption that running them in that case may make the
> problem worse.
>
> I'm not sure if I see a substantial difference between suspend and
> resume in that respect: If any of them fails, the state of the device
> is kind of unstable. In particular, if resume fails and the device
> doesn't actually resume, something needs to be done about it or it
> just becomes unusable.
>
> Now, the way of clearing the error may not be super-convenient, which
> was a bit hard to figure out upfront, so I'm not against making any
> changes as long as there are sufficient reasons for making them.
I am thinking if we can start with a change to not check runtime_error
in rpm_resume, and let it go through even if the previous rpm_resume
attempt failed. Something like this:
```
static int rpm_resume(struct device *dev, int rpmflags)
trace_rpm_resume(dev, rpmflags);
repeat:
- if (dev->power.runtime_error) {
- retval = -EINVAL;
- } else if (dev->power.disable_depth > 0) {
+ if (dev->power.disable_depth > 0) {
if (dev->power.runtime_status == RPM_ACTIVE &&
dev->power.last_status == RPM_ACTIVE)
retval = 1;
```
I think setting the runtime_error in rpm_callback, i.e. for both resume
and suspend is still a good idea for book-keeping purposes, e.g. the
user reading the runtime_status of the device from sysfs.
next prev parent reply other threads:[~2025-02-17 3:49 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-20 14:42 PM runtime_error handling missing in many drivers? Vincent Whitchurch
2022-06-21 9:38 ` Oliver Neukum
2022-07-08 11:03 ` Vincent Whitchurch
2022-07-08 20:10 ` Rafael J. Wysocki
2022-07-26 9:05 ` Oliver Neukum
2022-07-26 15:41 ` Rafael J. Wysocki
2022-07-27 8:08 ` Oliver Neukum
2022-07-27 16:31 ` Rafael J. Wysocki
2025-02-10 3:32 ` Ajay Agarwal
2025-02-11 22:21 ` Brian Norris
2025-02-12 19:29 ` Rafael J. Wysocki
2025-02-17 3:49 ` Ajay Agarwal [this message]
2025-02-17 20:23 ` Rafael J. Wysocki
2025-02-18 5:37 ` Ajay Agarwal
2025-02-18 5:45 ` Ajay Agarwal
2025-02-18 14:57 ` Rafael J. Wysocki
2025-02-19 22:15 ` Brian Norris
2025-02-20 9:30 ` Oliver Neukum
2025-02-22 1:51 ` Brian Norris
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z7Kx2RN35QVyg8nP@google.com \
--to=ajayagarwal@google.com \
--cc=briannorris@chromium.org \
--cc=briannorris@google.com \
--cc=jic23@kernel.org \
--cc=joychakr@google.com \
--cc=linux-iio@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=manugautam@google.com \
--cc=oneukum@suse.com \
--cc=rafael.j.wysocki@intel.com \
--cc=rafael@kernel.org \
--cc=vamshigajjela@google.com \
--cc=vincent.whitchurch@axis.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.