From: Ajay Agarwal <ajayagarwal@google.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Oliver Neukum <oneukum@suse.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Vincent Whitchurch <vincent.whitchurch@axis.com>,
"jic23@kernel.org" <jic23@kernel.org>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-iio@vger.kernel.org" <linux-iio@vger.kernel.org>,
Brian Norris <briannorris@chromium.org>
Subject: Re: PM runtime_error handling missing in many drivers?
Date: Mon, 10 Feb 2025 09:02:41 +0530 [thread overview]
Message-ID: <Z6lzWfGbpa7jN1QD@google.com> (raw)
In-Reply-To: <CAJZ5v0ijy4FG84xk_n8gxR_jS0xao246eVbnFj-dXzwz=8S9NQ@mail.gmail.com>
On Wed, Jul 27, 2022 at 06:31:48PM +0200, Rafael J. Wysocki wrote:
> On Wed, Jul 27, 2022 at 10:08 AM Oliver Neukum <oneukum@suse.com> wrote:
> >
> >
> >
> > On 26.07.22 17:41, Rafael J. Wysocki wrote:
> > > On Tue, Jul 26, 2022 at 11:05 AM Oliver Neukum <oneukum@suse.com> wrote:
> >
> > > I guess that depends on what is regarded as "the framework". I mean
> > > the PM-runtime code, excluding the bus type or equivalent.
> >
> > Yes, we have multiple candidates in the generic case. Easy to overengineer.
> >
> > >>> The idea was that drivers would clear these errors.
> > >>
> > >> I am afraid that is a deeply hidden layering violation. Yes, a driver's
> > >> resume() method may have failed. In that case, if that is the same
> > >> driver, it will obviously already know about the failure.
> > >
> > > So presumably it will do something to recover and avoid returning the
> > > error in the first place.
> >
> > Yes, but that does not help us if they do return an error.
> >
> > > From the PM-runtime core code perspective, if an error is returned by
> > > a suspend callback and it is not -EBUSY or -EAGAIN, the subsequent
> > > suspend is also likely to fail.
> >
> > True.
> >
> > > If a resume callback returns an error, any subsequent suspend or
> > > resume operations are likely to fail.
> >
> > Also true, but the consequences are different.
> >
> > > Storing the error effectively prevents subsequent operations from
> > > being carried out in both cases and that's why it is done.
> >
> > I am afraid seeing these two operations as equivalent for this
> > purpose is a problem for two reasons:
> >
> > 1. suspend can be initiated by the generic framework
>
> Resume can be initiated by generic code too.
>
> > 2. a failure to suspend leads to worse power consumption,
> > while a failure to resume is -EIO, at best
>
> Yes, a failure to resume is a big deal.
>
> > >> PM operations, however, are operating on a tree. A driver requesting
> > >> a resume may get an error code. But it has no idea where this error
> > >> comes from. The generic code knows at least that.
> > >
> > > Well, what do you mean by "the generic code"?
> >
> > In this case the device model, which has the tree and all dependencies.
> > Error handling here is potentially very complicated because
> >
> > 1. a driver can experience an error from a node higher in the tree
>
> Well, there can be an error coming from a parent or a supplier, but
> the driver will not receive it directly.
>
> > 2. a driver can trigger a failure in a sibling
> > 3. a driver for a node can be less specific than the drivers higher up
>
> I'm not sure I understand the above correctly.
>
> > Reducing this to a single error condition is difficult.
>
> Fair enough.
>
> > Suppose you have a USB device with two interfaces. The driver for A
> > initiates a resume. Interface A is resumed; B reports an error.
> > Should this block further attempts to suspend the whole device?
>
> It should IMV.
>
> > >> Let's look at at a USB storage device. The request to resume comes
> > >> from sd.c. sd.c is certainly not equipped to handle a PCI error
> > >> condition that has prevented a USB host controller from resuming.
> > >
> > > Sure, but this doesn't mean that suspending or resuming the device is
> > > a good idea until the error condition gets resolved.
> >
> > Suspending clearly yes. Resuming is another matter. It has to work
> > if you want to operate without errors.
>
> Well, it has to physically work in the first place. If it doesn't,
> the rest is a bit moot, because you end up with a non-functional
> device that appears to be permanently suspended.
>
> > >> I am afraid this part of the API has issues. And they keep growing
> > >> the more we divorce the device driver from the bus driver, which
> > >> actually does the PM operation.
> > >
> > > Well, in general suspending or resuming a device is a collaborative
> > > effort and if one of the pieces falls over, making it work again
> > > involves fixing up the failing piece and notifying the others that it
> > > is ready again. However, that part isn't covered and I'm not sure if
> > > it can be covered in a sufficiently generic way.
> >
> > True. But that still cannot solve the question what is to be done
> > if error handling fails. Hence my proposal:
> > - record all failures
> > - heed the record only when suspending
>
> I guess that would boil down to moving the power.runtime_error update
> from rpm_callback() to rpm_suspend()?
Resuming this discussion. One of the ways the device drivers are
clearing the runtime_error flag is by calling pm_runtime_set_suspended
[1].
To me, it feels weird that a device driver calls pm_runtime_set_suspended
if the runtime_resume() has failed. It should be implied that the device
is in suspended state if the resume failed.
So how really should the runtime_error flag be cleared? Should there be
a new API exposed to device drivers for this? Or should we plan for it
in the framework itself?
1: https://lore.kernel.org/all/20250129124009.1039982-3-jacek.lawrynowicz@linux.intel.com/
next prev parent reply other threads:[~2025-02-10 3:32 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-20 14:42 PM runtime_error handling missing in many drivers? Vincent Whitchurch
2022-06-21 9:38 ` Oliver Neukum
2022-07-08 11:03 ` Vincent Whitchurch
2022-07-08 20:10 ` Rafael J. Wysocki
2022-07-26 9:05 ` Oliver Neukum
2022-07-26 15:41 ` Rafael J. Wysocki
2022-07-27 8:08 ` Oliver Neukum
2022-07-27 16:31 ` Rafael J. Wysocki
2025-02-10 3:32 ` Ajay Agarwal [this message]
2025-02-11 22:21 ` Brian Norris
2025-02-12 19:29 ` Rafael J. Wysocki
2025-02-17 3:49 ` Ajay Agarwal
2025-02-17 20:23 ` Rafael J. Wysocki
2025-02-18 5:37 ` Ajay Agarwal
2025-02-18 5:45 ` Ajay Agarwal
2025-02-18 14:57 ` Rafael J. Wysocki
2025-02-19 22:15 ` Brian Norris
2025-02-20 9:30 ` Oliver Neukum
2025-02-22 1:51 ` Brian Norris
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6lzWfGbpa7jN1QD@google.com \
--to=ajayagarwal@google.com \
--cc=briannorris@chromium.org \
--cc=jic23@kernel.org \
--cc=linux-iio@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=oneukum@suse.com \
--cc=rafael.j.wysocki@intel.com \
--cc=rafael@kernel.org \
--cc=vincent.whitchurch@axis.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox