All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ajay Agarwal <ajayagarwal@google.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Oliver Neukum <oneukum@suse.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Vincent Whitchurch <vincent.whitchurch@axis.com>,
	"jic23@kernel.org" <jic23@kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-iio@vger.kernel.org" <linux-iio@vger.kernel.org>,
	Brian Norris <briannorris@chromium.org>
Subject: Re: PM runtime_error handling missing in many drivers?
Date: Mon, 10 Feb 2025 09:02:41 +0530	[thread overview]
Message-ID: <Z6lzWfGbpa7jN1QD@google.com> (raw)
In-Reply-To: <CAJZ5v0ijy4FG84xk_n8gxR_jS0xao246eVbnFj-dXzwz=8S9NQ@mail.gmail.com>

On Wed, Jul 27, 2022 at 06:31:48PM +0200, Rafael J. Wysocki wrote:
> On Wed, Jul 27, 2022 at 10:08 AM Oliver Neukum <oneukum@suse.com> wrote:
> >
> >
> >
> > On 26.07.22 17:41, Rafael J. Wysocki wrote:
> > > On Tue, Jul 26, 2022 at 11:05 AM Oliver Neukum <oneukum@suse.com> wrote:
> >
> > > I guess that depends on what is regarded as "the framework".  I mean
> > > the PM-runtime code, excluding the bus type or equivalent.
> >
> > Yes, we have multiple candidates in the generic case. Easy to overengineer.
> >
> > >>> The idea was that drivers would clear these errors.
> > >>
> > >> I am afraid that is a deeply hidden layering violation. Yes, a driver's
> > >> resume() method may have failed. In that case, if that is the same
> > >> driver, it will obviously already know about the failure.
> > >
> > > So presumably it will do something to recover and avoid returning the
> > > error in the first place.
> >
> > Yes, but that does not help us if they do return an error.
> >
> > > From the PM-runtime core code perspective, if an error is returned by
> > > a suspend callback and it is not -EBUSY or -EAGAIN, the subsequent
> > > suspend is also likely to fail.
> >
> > True.
> >
> > > If a resume callback returns an error, any subsequent suspend or
> > > resume operations are likely to fail.
> >
> > Also true, but the consequences are different.
> >
> > > Storing the error effectively prevents subsequent operations from
> > > being carried out in both cases and that's why it is done.
> >
> > I am afraid seeing these two operations as equivalent for this
> > purpose is a problem for two reasons:
> >
> > 1. suspend can be initiated by the generic framework
> 
> Resume can be initiated by generic code too.
> 
> > 2. a failure to suspend leads to worse power consumption,
> >    while a failure to resume is -EIO, at best
> 
> Yes, a failure to resume is a big deal.
> 
> > >> PM operations, however, are operating on a tree. A driver requesting
> > >> a resume may get an error code. But it has no idea where this error
> > >> comes from. The generic code knows at least that.
> > >
> > > Well, what do you mean by "the generic code"?
> >
> > In this case the device model, which has the tree and all dependencies.
> > Error handling here is potentially very complicated because
> >
> > 1. a driver can experience an error from a node higher in the tree
> 
> Well, there can be an error coming from a parent or a supplier, but
> the driver will not receive it directly.
> 
> > 2. a driver can trigger a failure in a sibling
> > 3. a driver for a node can be less specific than the drivers higher up
> 
> I'm not sure I understand the above correctly.
> 
> > Reducing this to a single error condition is difficult.
> 
> Fair enough.
> 
> > Suppose you have a USB device with two interfaces. The driver for A
> > initiates a resume. Interface A is resumed; B reports an error.
> > Should this block further attempts to suspend the whole device?
> 
> It should IMV.
> 
> > >> Let's look at at a USB storage device. The request to resume comes
> > >> from sd.c. sd.c is certainly not equipped to handle a PCI error
> > >> condition that has prevented a USB host controller from resuming.
> > >
> > > Sure, but this doesn't mean that suspending or resuming the device is
> > > a good idea until the error condition gets resolved.
> >
> > Suspending clearly yes. Resuming is another matter. It has to work
> > if you want to operate without errors.
> 
> Well, it has to physically work in the first place.  If it doesn't,
> the rest is a bit moot, because you end up with a non-functional
> device that appears to be permanently suspended.
> 
> > >> I am afraid this part of the API has issues. And they keep growing
> > >> the more we divorce the device driver from the bus driver, which
> > >> actually does the PM operation.
> > >
> > > Well, in general suspending or resuming a device is a collaborative
> > > effort and if one of the pieces falls over, making it work again
> > > involves fixing up the failing piece and notifying the others that it
> > > is ready again.  However, that part isn't covered and I'm not sure if
> > > it can be covered in a sufficiently generic way.
> >
> > True. But that still cannot solve the question what is to be done
> > if error handling fails. Hence my proposal:
> > - record all failures
> > - heed the record only when suspending
> 
> I guess that would boil down to moving the power.runtime_error update
> from rpm_callback() to rpm_suspend()?
Resuming this discussion. One of the ways the device drivers are
clearing the runtime_error flag is by calling pm_runtime_set_suspended
[1].

To me, it feels weird that a device driver calls pm_runtime_set_suspended
if the runtime_resume() has failed. It should be implied that the device
is in suspended state if the resume failed.

So how really should the runtime_error flag be cleared? Should there be
a new API exposed to device drivers for this? Or should we plan for it
in the framework itself?

1: https://lore.kernel.org/all/20250129124009.1039982-3-jacek.lawrynowicz@linux.intel.com/

  reply	other threads:[~2025-02-10  3:32 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-20 14:42 PM runtime_error handling missing in many drivers? Vincent Whitchurch
2022-06-21  9:38 ` Oliver Neukum
2022-07-08 11:03   ` Vincent Whitchurch
2022-07-08 20:10     ` Rafael J. Wysocki
2022-07-26  9:05       ` Oliver Neukum
2022-07-26 15:41         ` Rafael J. Wysocki
2022-07-27  8:08           ` Oliver Neukum
2022-07-27 16:31             ` Rafael J. Wysocki
2025-02-10  3:32               ` Ajay Agarwal [this message]
2025-02-11 22:21                 ` Brian Norris
2025-02-12 19:29                   ` Rafael J. Wysocki
2025-02-17  3:49                     ` Ajay Agarwal
2025-02-17 20:23                       ` Rafael J. Wysocki
2025-02-18  5:37                         ` Ajay Agarwal
2025-02-18  5:45                           ` Ajay Agarwal
2025-02-18 14:57                             ` Rafael J. Wysocki
2025-02-19 22:15                     ` Brian Norris
2025-02-20  9:30                       ` Oliver Neukum
2025-02-22  1:51                         ` Brian Norris

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6lzWfGbpa7jN1QD@google.com \
    --to=ajayagarwal@google.com \
    --cc=briannorris@chromium.org \
    --cc=jic23@kernel.org \
    --cc=linux-iio@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=oneukum@suse.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=vincent.whitchurch@axis.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.