linux-pm.vger.kernel.org archive mirror
* [RFC] PM: suspend: Upstreaming wakeup reason capture support
@ 2022-01-10 18:49 Kelly Rossmoyer
  2022-01-24 17:37 ` John Stultz
  2022-01-27 19:54 ` Rafael J. Wysocki
  0 siblings, 2 replies; 14+ messages in thread
From: Kelly Rossmoyer @ 2022-01-10 18:49 UTC
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Lee Jones, Vijay Nayak
  Cc: linux-pm, linux-kernel

# Introduction

To aid optimization, troubleshooting, and attribution of battery life, the
Android kernel currently includes a set of patches which provide enhanced
visibility into kernel suspend/resume/abort behaviors.  The capabilities
and implementation of this feature have evolved significantly since an
unsuccessful attempt to upstream the original code
(https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
conversation about upstreaming, starting with the central question: is
there support for upstreaming this set of features?

# Motivation

Of the many factors influencing battery life on Linux-powered mobile
devices, kernel suspend tends to be amongst the most impactful.  Maximizing
time spent in suspend and minimizing the frequency of net-negative suspend
cycles are both important contributors to battery life optimization.  But
enabling that optimization - and troubleshooting when things go wrong -
requires more observability of suspend/resume/abort behavior than Linux
currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
wakeup_source stats are useful, they are incomplete and scattered.  The
Android kernel wakeup reason patches implement significant improvements in
that area.

# Features

As of today, the active set of patches surfaces the following
suspend-related data:

* wakeup IRQs, including:
   * multiple IRQs if more than one is pending during resume flow
   * unmapped HW IRQs (wakeup-capable in HW) that should not be
     occurring
   * misconfigured IRQs (e.g. both enable_irq_wake() and
     IRQF_NO_SUSPEND)
   * threaded IRQs (not just the parent chip's IRQ)

* non-IRQ wakeups, including:
   * wakeups caused by an IRQ that was consumed by lower-level SW
   * wakeups from SOC architecture that don't manifest as IRQs

* abort reasons, including:
   * wakeup_source activity
   * failure to freeze userspace
   * failure to suspend devices
   * failed syscore_suspend callback

* durations from the most recent cycle, including:
   * time spent doing suspend/resume work
   * time spent in suspend

In addition to battery life optimization and troubleshooting, some of these
capabilities also lay the groundwork for efforts around improving
attribution of wakeups/aborts (e.g. to specific processes, device features,
external devices, etc).
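The two durations listed above can be derived from a pair of kernel clocks. The sketch below is a minimal userspace model, assuming timestamps are taken from CLOCK_MONOTONIC (which does not advance while the system is suspended) and CLOCK_BOOTTIME (which does) both at suspend entry and after resume completes. The struct and function names are hypothetical and are not taken from the Android patches.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-cycle timestamps, in nanoseconds. This relies on
 * CLOCK_MONOTONIC not advancing while suspended, whereas CLOCK_BOOTTIME
 * keeps counting across suspend.
 */
struct suspend_stamps {
	int64_t mono_before;	/* CLOCK_MONOTONIC when suspend begins */
	int64_t boot_before;	/* CLOCK_BOOTTIME when suspend begins */
	int64_t mono_after;	/* CLOCK_MONOTONIC when resume completes */
	int64_t boot_after;	/* CLOCK_BOOTTIME when resume completes */
};

struct suspend_durations {
	int64_t work_ns;	/* approx. time spent doing suspend/resume work */
	int64_t sleep_ns;	/* time spent actually suspended */
};

static struct suspend_durations compute_durations(const struct suspend_stamps *s)
{
	int64_t d_mono = s->mono_after - s->mono_before;
	int64_t d_boot = s->boot_after - s->boot_before;
	struct suspend_durations d;

	/* BOOTTIME covers everything MONOTONIC covers, plus time asleep. */
	assert(d_boot >= d_mono);
	d.work_ns = d_mono;
	d.sleep_ns = d_boot - d_mono;
	return d;
}
```

Because CLOCK_MONOTONIC only stops once timekeeping itself is suspended, late in the suspend path, `work_ns` in this model approximates the suspend/resume work time rather than measuring it exactly.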

# Shortcomings

While the core implementation (see below) is relatively straightforward and
localized, calls into that core are somewhat widely spread in order to
capture the breadth of events of interest.  The pervasiveness of those
hooks is clearly an area where improvement would be beneficial, especially
if a cleaner solution preserved equivalent capabilities.

# Existing Code

As a reference for how Android currently implements the core code for these
features (which would need a bit of work before submission even if all
features were included), see the following link:

https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c


--

Kelly Rossmoyer | Software Engineer | krossmo@google.com

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
@ 2022-01-30 18:15 Zichar Zhang
  0 siblings, 0 replies; 14+ messages in thread
From: Zichar Zhang @ 2022-01-30 18:15 UTC
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Kelly Rossmoyer, Lee Jones, Len Brown,
	linux-kernel, linux-pm, Vijay Nayak, Pavel Machek

Hi Rafael, Kelly

Hello Rafael, sorry for replying a little late. I was trying to find a way to
reply to you, because I'm not on the Cc list. So, thanks to Kelly for that. :)
I totally agree with you that we should split the work into smaller
pieces and do it step by step.

Hi Kelly,
I got a strong signal from you that you insist on your requirements.
They are reasonable, and I would want them too if I were a user. :)

But it could be a problem for me to do all of that, so I am calling for help
here. Yes! I need some help!

I want to separate this task into 4 parts:
1. user interface: e.g. a sysfs file such as /sys/power/last_wakeup_reason.
2. report interface: called by "wakeup sources" to report the "wakeup reason".
3. reporting inside kernel subsystems: e.g. the "interrupt subsystem" (a common interface).
4. reporting inside device code: e.g. the WDT driver, the GIC driver, or other
device drivers.
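Parts 1 and 2 of this split can be illustrated with a small userspace model: producers (parts 3 and 4) call a report function during a suspend/resume cycle, and a show routine renders the collected reasons one per line, the way a sysfs attribute's show() callback might. This is only a sketch under stated assumptions: `wr_reset()`, `wr_report()`, `wr_show()`, and the fixed-size storage are hypothetical names, and a real kernel version would also need locking against concurrent reporters.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded log of reasons for the current cycle. */
#define WR_MAX_ENTRIES 8
#define WR_ENTRY_LEN   64

static char wr_entries[WR_MAX_ENTRIES][WR_ENTRY_LEN];
static int wr_count;

/* Called at suspend entry so each cycle starts from a clean slate. */
static void wr_reset(void)
{
	memset(wr_entries, 0, sizeof(wr_entries));
	wr_count = 0;
}

/* Part 2: the report interface that parts 3 and 4 would call. */
static void wr_report(const char *fmt, ...)
{
	va_list ap;

	if (wr_count >= WR_MAX_ENTRIES)
		return;		/* drop extra reports rather than overflow */
	va_start(ap, fmt);
	vsnprintf(wr_entries[wr_count], WR_ENTRY_LEN, fmt, ap);
	va_end(ap);
	wr_count++;
}

/*
 * Part 1: render one reason per line into buf, like a sysfs show().
 * Returns the number of characters stored (excluding the NUL).
 */
static size_t wr_show(char *buf, size_t size)
{
	size_t len = 0;
	int i;

	assert(buf != NULL || size == 0);
	if (size == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < wr_count && len < size; i++) {
		int n = snprintf(buf + len, size - len, "%s\n", wr_entries[i]);

		if (n < 0)
			break;
		len += (size_t)n;
	}
	/* snprintf truncates and NUL-terminates if buf was too small. */
	return len < size ? len : size - 1;
}
```

The bounded array is a deliberate choice in this sketch: dropping reports past the limit keeps the report path free of allocations, which matters if it is called from interrupt context.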

I think we should do the first 3 parts, but not the last one, because it is
device specific. Device and BSP code should do that; I insist on that.
Parts 1 and 2 are easy to do, and I can rework them again and again until they
are right for everyone.
So it is clear that the problem is the third part. And yes! It is very
hard.

The kernel doesn't know how the "machine" woke up; it just offers interfaces
that let users mark a "wakeup source", like the IRQD_WAKEUP_STATE flag
and the "ws" (wakeup source) interface (actually they are fake wakeup sources).
These work well and we can easily report them, which the Android patch and
mine already do.
But the remaining things are hard, because neither the kernel nor its
subsystems have any mechanism to do that. So we are facing these three things:

1. "misconfigured" and "unmapped" IRQ reporting.
The Android patch just adds a "wakeup report" call wherever one occurs,
even when the system is not in a "suspend" state, and one of those calls is
even in the GIC driver.
If I were the maintainer I wouldn't take this, but the question is what I
should do instead.
(Maybe I should hand this task to the "interrupt subsystem people" and ask them
to do it? :) )

2. errors in the suspend/resume process.
That means that if an error occurs in the suspend/resume process, I need to
report it as a "wakeup reason", which is just the "abort wakeup reason" Kelly
mentioned. But many errors may occur here: which ones should I report,
and is that enough?
And as Kelly said, the code is "messy"; that does hit the point.

3. threaded interrupts
Sorry, I can't find the proper place in the kernel to report their "wakeup
reason". Maybe that's my lack of knowledge. :)
It seems like some interrupt chip drivers should do that? I don't know. Maybe
I should offer an interface and just let the "user" use it?
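One possible shape for the interface suggested in point 3, sketched as a userspace model: the top-level (parent) IRQ is recorded first, and a chained or nested handler that later resolves it into a specific child IRQ reports the (child, parent) pair, so the child replaces the parent in the list. The names and the fixed-size list are hypothetical; this does not claim to match the Android patches or any mainline irqchip driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical list of IRQs seen as wakeup reasons in one cycle. */
#define MAX_WAKE_IRQS 8

struct wake_irq_list {
	int irqs[MAX_WAKE_IRQS];
	int count;
};

/* Returns the index of irq in the list, or -1 if absent. */
static int wake_list_find(const struct wake_irq_list *l, int irq)
{
	int i;

	for (i = 0; i < l->count; i++)
		if (l->irqs[i] == irq)
			return i;
	return -1;
}

/* Record an IRQ once; returns false only if the list is full. */
static bool wake_list_add(struct wake_irq_list *l, int irq)
{
	if (wake_list_find(l, irq) >= 0)
		return true;
	if (l->count >= MAX_WAKE_IRQS)
		return false;
	l->irqs[l->count++] = irq;
	return true;
}

/* Called by a demultiplexing handler once it knows the child IRQ. */
static bool wake_list_report_child(struct wake_irq_list *l, int child, int parent)
{
	int p;

	assert(child != parent);
	p = wake_list_find(l, parent);
	if (p < 0)
		return wake_list_add(l, child);	/* parent not recorded (or already replaced) */
	if (wake_list_find(l, child) >= 0) {
		/* Child already recorded: just drop the parent entry. */
		memmove(&l->irqs[p], &l->irqs[p + 1],
			(size_t)(l->count - p - 1) * sizeof(int));
		l->count--;
		return true;
	}
	l->irqs[p] = child;	/* replace the parent with the real source */
	return true;
}
```

For example, if a GPIO controller's parent line 10 is recorded and its handler then finds that child IRQ 101 fired, `wake_list_report_child(&l, 101, 10)` leaves 101 in place of 10; a second child of the same parent is simply appended.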

So, that's all I've got.
And again Kelly, I understand your point, and I will think this over again to
see if I can find a way to do it.

Besides, thanks Lee Jones and John, I will rework the patch after this
discussion.

Any advice would help! :)
Or if you have a better idea, I can help you do yours!

Best
Zichar



Thread overview: 14+ messages
2022-01-10 18:49 [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
2022-01-24 17:37 ` John Stultz
2022-01-26  5:09   ` Zichar Zhang
2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
2022-01-28  7:01       ` Greg KH
2022-01-28  8:43         ` Zichar Zhang
2022-01-28  9:32       ` Lee Jones
2022-01-29  7:52     ` [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
2022-01-27 19:54 ` Rafael J. Wysocki
2022-01-27 20:10   ` Rafael J. Wysocki
2022-01-29  8:26     ` Kelly Rossmoyer
2022-01-30 14:46       ` Rafael J. Wysocki
2022-02-02  7:59         ` Kelly Rossmoyer
  -- strict thread matches above, loose matches on Subject: below --
2022-01-30 18:15 Zichar Zhang
