* [PATCH v2 1/5] acpi/x86: s2idle: add support for Display Off and Display On callbacks
2024-09-22 17:22 [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Antheas Kapenekakis
@ 2024-09-22 17:22 ` Antheas Kapenekakis
2024-09-23 15:57 ` Mario Limonciello
2024-09-22 17:22 ` [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence Antheas Kapenekakis
` (5 subsequent siblings)
6 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-22 17:22 UTC (permalink / raw)
To: linux-pm
Cc: platform-driver-x86, Mario Limonciello, luke, me, Denis Benato,
Antheas Kapenekakis
The Display Off and Display On firmware notifications are meant to signify
the system entering a state where the user is not actively interacting
with it (i.e., in Windows this state is called "Screen Off" and the
system enters it once it turns the screen off e.g., due to inactivity).
Currently, these functions are called within the suspend sequence, which
causes issues when these notifications interact with e.g., a USB device
and makes them unable to be called as part of the screen turning off.
This patch adds a set of callbacks to allow calling the Display On/Off
notifications outside of the suspend/resume path.
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
---
include/linux/suspend.h | 5 +++++
kernel/power/suspend.c | 12 ++++++++++++
2 files changed, 17 insertions(+)
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index da6ebca3ff77..8f33249cc067 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -132,6 +132,7 @@ struct platform_suspend_ops {
};
struct platform_s2idle_ops {
+ int (*display_off)(void);
int (*begin)(void);
int (*prepare)(void);
int (*prepare_late)(void);
@@ -140,6 +141,7 @@ struct platform_s2idle_ops {
void (*restore_early)(void);
void (*restore)(void);
void (*end)(void);
+ int (*display_on)(void);
};
#ifdef CONFIG_SUSPEND
@@ -160,6 +162,9 @@ extern unsigned int pm_suspend_global_flags;
#define PM_SUSPEND_FLAG_FW_RESUME BIT(1)
#define PM_SUSPEND_FLAG_NO_PLATFORM BIT(2)
+int platform_suspend_display_off(void);
+int platform_suspend_display_on(void);
+
static inline void pm_suspend_clear_flags(void)
{
pm_suspend_global_flags = 0;
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 09f8397bae15..c527dc0ae5ae 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -254,6 +254,18 @@ static bool sleep_state_supported(suspend_state_t state)
(valid_state(state) && !cxl_mem_active());
}
+int platform_suspend_display_off(void)
+{
+ return s2idle_ops && s2idle_ops->display_off ? s2idle_ops->display_off() : 0;
+}
+EXPORT_SYMBOL_GPL(platform_suspend_display_off);
+
+int platform_suspend_display_on(void)
+{
+ return s2idle_ops && s2idle_ops->display_on ? s2idle_ops->display_on() : 0;
+}
+EXPORT_SYMBOL_GPL(platform_suspend_display_on);
+
static int platform_suspend_prepare(suspend_state_t state)
{
return state != PM_SUSPEND_TO_IDLE && suspend_ops->prepare ?
--
2.46.1
^ permalink raw reply related [flat|nested] 38+ messages in thread* Re: [PATCH v2 1/5] acpi/x86: s2idle: add support for Display Off and Display On callbacks
2024-09-22 17:22 ` [PATCH v2 1/5] acpi/x86: s2idle: add support for Display Off and Display On callbacks Antheas Kapenekakis
@ 2024-09-23 15:57 ` Mario Limonciello
0 siblings, 0 replies; 38+ messages in thread
From: Mario Limonciello @ 2024-09-23 15:57 UTC (permalink / raw)
To: Antheas Kapenekakis, linux-pm; +Cc: platform-driver-x86, luke, me, Denis Benato
On 9/22/2024 12:22, Antheas Kapenekakis wrote:
> The Display Off and Display On firmware notifications are meant to signify
> the system entering a state where the user is not actively interacting
> with it (i.e., in Windows this state is called "Screen Off" and the
> system enters it once it turns the screen off e.g., due to inactivity).
>
> Currently, these functions are called within the suspend sequence, which
> causes issues when these notifications interact with e.g., a USB device
> and makes them unable to be called as part of the screen turning off.
>
> This patch adds a set of callbacks to allow calling the Display On/Off
> notifications outside of the suspend/resume path.
>
> Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Make sure you run your patches through checkpatch. I don't believe you
got these tags right.
> ---
> include/linux/suspend.h | 5 +++++
> kernel/power/suspend.c | 12 ++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index da6ebca3ff77..8f33249cc067 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -132,6 +132,7 @@ struct platform_suspend_ops {
> };
>
> struct platform_s2idle_ops {
> + int (*display_off)(void);
> int (*begin)(void);
> int (*prepare)(void);
> int (*prepare_late)(void);
> @@ -140,6 +141,7 @@ struct platform_s2idle_ops {
> void (*restore_early)(void);
> void (*restore)(void);
> void (*end)(void);
> + int (*display_on)(void);
> };
>
> #ifdef CONFIG_SUSPEND
> @@ -160,6 +162,9 @@ extern unsigned int pm_suspend_global_flags;
> #define PM_SUSPEND_FLAG_FW_RESUME BIT(1)
> #define PM_SUSPEND_FLAG_NO_PLATFORM BIT(2)
>
> +int platform_suspend_display_off(void);
> +int platform_suspend_display_on(void);
> +
> static inline void pm_suspend_clear_flags(void)
> {
> pm_suspend_global_flags = 0;
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..c527dc0ae5ae 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -254,6 +254,18 @@ static bool sleep_state_supported(suspend_state_t state)
> (valid_state(state) && !cxl_mem_active());
> }
>
> +int platform_suspend_display_off(void)
> +{
> + return s2idle_ops && s2idle_ops->display_off ? s2idle_ops->display_off() : 0;
> +}
> +EXPORT_SYMBOL_GPL(platform_suspend_display_off);
> +
> +int platform_suspend_display_on(void)
> +{
> + return s2idle_ops && s2idle_ops->display_on ? s2idle_ops->display_on() : 0;
> +}
> +EXPORT_SYMBOL_GPL(platform_suspend_display_on);
> +
> static int platform_suspend_prepare(suspend_state_t state)
> {
> return state != PM_SUSPEND_TO_IDLE && suspend_ops->prepare ?
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-22 17:22 [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Antheas Kapenekakis
2024-09-22 17:22 ` [PATCH v2 1/5] acpi/x86: s2idle: add support for Display Off and Display On callbacks Antheas Kapenekakis
@ 2024-09-22 17:22 ` Antheas Kapenekakis
2024-09-23 16:03 ` Mario Limonciello
2024-09-22 17:22 ` [PATCH v2 3/5] acpi/x86: s2idle: add quirk table for modern standby delays Antheas Kapenekakis
` (4 subsequent siblings)
6 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-22 17:22 UTC (permalink / raw)
To: linux-pm
Cc: platform-driver-x86, Mario Limonciello, luke, me, Denis Benato,
Antheas Kapenekakis
Currently, the Display On/Off calls are handled within the suspend
sequence, which is a deviation from Windows. This causes issues with
certain devices, where the notification interacts with a USB device
that expects the kernel to be fully awake.
This patch calls the Display On/Off callbacks before entering the suspend
sequence, which fixes this issue. In addition, it opens the possibility
of modelling a state such as "Screen Off" that mirrors Windows, as the
callbacks will be accessible and validated to work outside of the
suspend sequence.
Suggested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
---
kernel/power/suspend.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index c527dc0ae5ae..610f8ecaeebd 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -589,6 +589,13 @@ static int enter_state(suspend_state_t state)
if (state == PM_SUSPEND_TO_IDLE)
s2idle_begin();
+ /*
+ * Linux does not have the concept of a "Screen Off" state, so call
+ * the platform functions for Display On/Off prior to the suspend
+ * sequence, mirroring Windows which calls them outside of it as well.
+ */
+ platform_suspend_display_off();
+
if (sync_on_suspend_enabled) {
trace_suspend_resume(TPS("sync_filesystems"), 0, true);
ksys_sync_helper();
@@ -616,6 +623,8 @@ static int enter_state(suspend_state_t state)
suspend_finish();
Unlock:
mutex_unlock(&system_transition_mutex);
+
+ platform_suspend_display_on();
return error;
}
--
2.46.1
^ permalink raw reply related [flat|nested] 38+ messages in thread* Re: [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-22 17:22 ` [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence Antheas Kapenekakis
@ 2024-09-23 16:03 ` Mario Limonciello
2024-09-23 16:15 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Mario Limonciello @ 2024-09-23 16:03 UTC (permalink / raw)
To: Antheas Kapenekakis, linux-pm; +Cc: platform-driver-x86, luke, me, Denis Benato
On 9/22/2024 12:22, Antheas Kapenekakis wrote:
> Currently, the Display On/Off calls are handled within the suspend
> sequence, which is a deviation from Windows. This causes issues with
> certain devices, where the notification interacts with a USB device
> that expects the kernel to be fully awake.
>
> This patch calls the Display On/Off callbacks before entering the suspend
> sequence, which fixes this issue. In addition, it opens the possibility
> of modelling a state such as "Screen Off" that mirrors Windows, as the
> callbacks will be accessible and validated to work outside of the
> suspend sequence.
>
> Suggested-by: Mario Limonciello <mario.limonciello@amd.com>
> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> ---
> kernel/power/suspend.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index c527dc0ae5ae..610f8ecaeebd 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -589,6 +589,13 @@ static int enter_state(suspend_state_t state)
> if (state == PM_SUSPEND_TO_IDLE)
> s2idle_begin();
>
> + /*
> + * Linux does not have the concept of a "Screen Off" state, so call
> + * the platform functions for Display On/Off prior to the suspend
> + * sequence, mirroring Windows which calls them outside of it as well.
> + */
> + platform_suspend_display_off();
> +
I thought about it some more over the weekend and if moving the screen
on/off _DSM at all I don't feel this is the right place for triggering it.
IMO it should be called by DRM core. That is when the number of CRTCs
active goes 1+ -> 0 call screen off and when it goes 0->1+ call screen on.
There could be an argument made only for eDP this happens, but I think
that's a slippery slope if you end up with an AIO design that uses DP
instead of eDP or a desktop for example. So the safest policy should be
across all CRTCs of all GPUs. During the suspend sequence any that were
left on will get turned off, and then it could be triggered at that time
instead.
By making such a change then when the compositor turns off all displays
at runtime you can potentially support dark screen background downloads
that have specifically called this _DSM and any actions that are taken
from doing so.
> if (sync_on_suspend_enabled) {
> trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> ksys_sync_helper();
> @@ -616,6 +623,8 @@ static int enter_state(suspend_state_t state)
> suspend_finish();
> Unlock:
> mutex_unlock(&system_transition_mutex);
> +
> + platform_suspend_display_on();
> return error;
> }
>
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-23 16:03 ` Mario Limonciello
@ 2024-09-23 16:15 ` Antheas Kapenekakis
2024-09-23 16:30 ` Mario Limonciello
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 16:15 UTC (permalink / raw)
To: Mario Limonciello; +Cc: linux-pm, platform-driver-x86, luke, me, Denis Benato
Does DRM core handle virtual displays like VNC?
As mentioned in the cover letter, the _DSM specifies both virtual and
actual displays.
Also, Windows behavior is like a lockscreen. 5 seconds after screen
turns off after inactivity, instantly when you press the power button.
I tend towards making a sysfs entry. Not sure.
Antheas
On Mon, 23 Sept 2024 at 18:03, Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On 9/22/2024 12:22, Antheas Kapenekakis wrote:
> > Currently, the Display On/Off calls are handled within the suspend
> > sequence, which is a deviation from Windows. This causes issues with
> > certain devices, where the notification interacts with a USB device
> > that expects the kernel to be fully awake.
> >
> > This patch calls the Display On/Off callbacks before entering the suspend
> > sequence, which fixes this issue. In addition, it opens the possibility
> > of modelling a state such as "Screen Off" that mirrors Windows, as the
> > callbacks will be accessible and validated to work outside of the
> > suspend sequence.
> >
> > Suggested-by: Mario Limonciello <mario.limonciello@amd.com>
> > Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> > ---
> > kernel/power/suspend.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index c527dc0ae5ae..610f8ecaeebd 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -589,6 +589,13 @@ static int enter_state(suspend_state_t state)
> > if (state == PM_SUSPEND_TO_IDLE)
> > s2idle_begin();
> >
> > + /*
> > + * Linux does not have the concept of a "Screen Off" state, so call
> > + * the platform functions for Display On/Off prior to the suspend
> > + * sequence, mirroring Windows which calls them outside of it as well.
> > + */
> > + platform_suspend_display_off();
> > +
>
> I thought about it some more over the weekend and if moving the screen
> on/off _DSM at all I don't feel this is the right place for triggering it.
>
> IMO it should be called by DRM core. That is when the number of CRTCs
> active goes 1+ -> 0 call screen off and when it goes 0->1+ call screen on.
>
> There could be an argument made only for eDP this happens, but I think
> that's a slippery slope if you end up with an AIO design that uses DP
> instead of eDP or a desktop for example. So the safest policy should be
> across all CRTCs of all GPUs. During the suspend sequence any that were
> left on will get turned off, and then it could be triggered at that time
> instead.
>
> By making such a change then when the compositor turns off all displays
> at runtime you can potentially support dark screen background downloads
> that have specifically called this _DSM and any actions that are taken
> from doing so.
>
> > if (sync_on_suspend_enabled) {
> > trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> > ksys_sync_helper();
> > @@ -616,6 +623,8 @@ static int enter_state(suspend_state_t state)
> > suspend_finish();
> > Unlock:
> > mutex_unlock(&system_transition_mutex);
> > +
> > + platform_suspend_display_on();
> > return error;
> > }
> >
>
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-23 16:15 ` Antheas Kapenekakis
@ 2024-09-23 16:30 ` Mario Limonciello
2024-09-23 16:43 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Mario Limonciello @ 2024-09-23 16:30 UTC (permalink / raw)
To: Antheas Kapenekakis; +Cc: linux-pm, platform-driver-x86, luke, me, Denis Benato
On 9/23/2024 11:15, Antheas Kapenekakis wrote:
> Does DRM core handle virtual displays like VNC?
>
You can make use of virtual display connectors in amdgpu. This is how
drivers for new ASICs are first developed in emulation and also what's
used for early hardware bring up.
You can see virtual_display from
https://www.kernel.org/doc/html/v6.11/gpu/amdgpu/module-parameters.html
for more details.
> As mentioned in the cover letter, the _DSM specifies both virtual and
> actual displays.
>
> Also, Windows behavior is like a lockscreen. 5 seconds after screen
> turns off after inactivity, instantly when you press the power button.
>
> I tend towards making a sysfs entry. Not sure.
>
Who would call this sysfs file? Systemd? The compositor? When?
In Linux the compositor is in charge of doing the modesets that involve
turning on/off the screen. In most cases if you press the power button
in Linux systemd-logind picks up the event. It triggers a behavior
that's controlled by the logind configuration. Typically that's turning
on a lockscreen and/or starting the suspend sequence.
Important to note; it DOESN'T explicitly turn off displays. If you
configured it to suspend then displays get turned off as part of the
kernel suspend sequence (drm_atomic_helper_disable_all).
If it is configured to trigger a lockscreen then the compositor will
turn off displays after whatever the DPMS configuration is set to.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-23 16:30 ` Mario Limonciello
@ 2024-09-23 16:43 ` Antheas Kapenekakis
2024-09-23 17:01 ` Antheas Kapenekakis
2024-09-23 17:05 ` Mario Limonciello
0 siblings, 2 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 16:43 UTC (permalink / raw)
To: Mario Limonciello; +Cc: linux-pm, platform-driver-x86, luke, me, Denis Benato
> On 9/23/2024 11:15, Antheas Kapenekakis wrote:
> > Does DRM core handle virtual displays like VNC?
> >
>
> You can make use of virtual display connectors in amdgpu. This is how
> drivers for new ASICs are first developed in emulation and also what's
> used for early hardware bring up.
Microsoft specificies all displays "state where all displays—local and
remote, if any—have been turned off"
If any means that no displays = no call perhaps. Would be interesting
when designing a system for disabled people though. Given how it
interacts in Windows, I want to say the target here is inactivity.
> > As mentioned in the cover letter, the _DSM specifies both virtual and
> > actual displays.
> >
> > Also, Windows behavior is like a lockscreen. 5 seconds after screen
> > turns off after inactivity, instantly when you press the power button.
> >
> > I tend towards making a sysfs entry. Not sure.
> >
>
> Who would call this sysfs file? Systemd? The compositor? When?
>
> In Linux the compositor is in charge of doing the modesets that involve
> turning on/off the screen. In most cases if you press the power button
> in Linux systemd-logind picks up the event. It triggers a behavior
> that's controlled by the logind configuration. Typically that's turning
> on a lockscreen and/or starting the suspend sequence.
That's where it gets hairy and an area WIndows uniquely does not have
a problem with because the OS is monolithic.
If it were me, yes, systemd would switch the system to the inactive
"Screen Off" state 5 seconds after the system displays are off due to
inactivity. If we are talking about implementing something Modern
Standby-like, then pressing the powerbutton would no longer suspend
the device. Instead systemd would use two tiers of activators like
windows (first tier for "Screen Off", second tier for "Sleep"; right
now there is only one tier that mirrors screen off) and once all those
activators are released, suspend the kernel.
Then it would handle the transition from "Active" to "Screen Off" to
"Sleep" through a sysfs endpoint, with the platform responding by
e.g., lowering TDP and using a different fan curve.
If the kernel is asked to suspend outside of the Sleep state, then it
does the transitions to reach that state and references the quirk
table to avoid racing conditions in manufacturer firmware (not with
the kernel), before it suspends.
> Important to note; it DOESN'T explicitly turn off displays. If you
> configured it to suspend then displays get turned off as part of the
> kernel suspend sequence (drm_atomic_helper_disable_all).
>
> If it is configured to trigger a lockscreen then the compositor will
> turn off displays after whatever the DPMS configuration is set to.
The problem with a DRM atomic helper is that we cannot control how it
is called and do debouncing. WIndows does debouncing (part of that is
waiting for 5 seconds so that if you move the mouse the call is
skipped). You could have edge conditions that spam the calls.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-23 16:43 ` Antheas Kapenekakis
@ 2024-09-23 17:01 ` Antheas Kapenekakis
2024-09-23 17:05 ` Mario Limonciello
1 sibling, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 17:01 UTC (permalink / raw)
To: Mario Limonciello; +Cc: linux-pm, platform-driver-x86, luke, me, Denis Benato
Here I should note that I will not wait for systemd.
As part of adding support for these devices with Handheld Daemon, I
snag the powerbutton already and I only have one userspace app running
anyway, Steam. The idea here would be double tap the powerbutton to
enter a "Download Assistant", custom lockscreen pops up, and two
seconds later DPMS triggers, system enters Sleep state and either when
a battery target is reached or the download finishes, it transitions
to actual suspend.
The challenges here would be if the _DSM calls for Display Off and
Sleep Entry are a nothingburger and do not save much battery and that
Steam uses very heavy compression that does a lot of CPU.
If I find it not useful, maybe I will skip it.
Antheas
On Mon, 23 Sept 2024 at 18:43, Antheas Kapenekakis <lkml@antheas.dev> wrote:
>
> > On 9/23/2024 11:15, Antheas Kapenekakis wrote:
> > > Does DRM core handle virtual displays like VNC?
> > >
> >
> > You can make use of virtual display connectors in amdgpu. This is how
> > drivers for new ASICs are first developed in emulation and also what's
> > used for early hardware bring up.
>
> Microsoft specificies all displays "state where all displays—local and
> remote, if any—have been turned off"
>
> If any means that no displays = no call perhaps. Would be interesting
> when designing a system for disabled people though. Given how it
> interacts in Windows, I want to say the target here is inactivity.
>
> > > As mentioned in the cover letter, the _DSM specifies both virtual and
> > > actual displays.
> > >
> > > Also, Windows behavior is like a lockscreen. 5 seconds after screen
> > > turns off after inactivity, instantly when you press the power button.
> > >
> > > I tend towards making a sysfs entry. Not sure.
> > >
> >
> > Who would call this sysfs file? Systemd? The compositor? When?
> >
> > In Linux the compositor is in charge of doing the modesets that involve
> > turning on/off the screen. In most cases if you press the power button
> > in Linux systemd-logind picks up the event. It triggers a behavior
> > that's controlled by the logind configuration. Typically that's turning
> > on a lockscreen and/or starting the suspend sequence.
>
> That's where it gets hairy and an area WIndows uniquely does not have
> a problem with because the OS is monolithic.
>
> If it were me, yes, systemd would switch the system to the inactive
> "Screen Off" state 5 seconds after the system displays are off due to
> inactivity. If we are talking about implementing something Modern
> Standby-like, then pressing the powerbutton would no longer suspend
> the device. Instead systemd would use two tiers of activators like
> windows (first tier for "Screen Off", second tier for "Sleep"; right
> now there is only one tier that mirrors screen off) and once all those
> activators are released, suspend the kernel.
>
> Then it would handle the transition from "Active" to "Screen Off" to
> "Sleep" through a sysfs endpoint, with the platform responding by
> e.g., lowering TDP and using a different fan curve.
>
> If the kernel is asked to suspend outside of the Sleep state, then it
> does the transitions to reach that state and references the quirk
> table to avoid racing conditions in manufacturer firmware (not with
> the kernel), before it suspends.
>
> > Important to note; it DOESN'T explicitly turn off displays. If you
> > configured it to suspend then displays get turned off as part of the
> > kernel suspend sequence (drm_atomic_helper_disable_all).
> >
> > If it is configured to trigger a lockscreen then the compositor will
> > turn off displays after whatever the DPMS configuration is set to.
>
> The problem with a DRM atomic helper is that we cannot control how it
> is called and do debouncing. WIndows does debouncing (part of that is
> waiting for 5 seconds so that if you move the mouse the call is
> skipped). You could have edge conditions that spam the calls.
>
> Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-23 16:43 ` Antheas Kapenekakis
2024-09-23 17:01 ` Antheas Kapenekakis
@ 2024-09-23 17:05 ` Mario Limonciello
2024-09-23 17:13 ` Antheas Kapenekakis
1 sibling, 1 reply; 38+ messages in thread
From: Mario Limonciello @ 2024-09-23 17:05 UTC (permalink / raw)
To: Antheas Kapenekakis; +Cc: linux-pm, platform-driver-x86, luke, me, Denis Benato
On 9/23/2024 11:43, Antheas Kapenekakis wrote:
>
> If it were me, yes, systemd would switch the system to the inactive
> "Screen Off" state 5 seconds after the system displays are off due to
> inactivity. If we are talking about implementing something Modern
> Standby-like, then pressing the powerbutton would no longer suspend
> the device. Instead systemd would use two tiers of activators like
> windows (first tier for "Screen Off", second tier for "Sleep"; right
> now there is only one tier that mirrors screen off) and once all those
> activators are released, suspend the kernel.
>
> Then it would handle the transition from "Active" to "Screen Off" to
> "Sleep" through a sysfs endpoint, with the platform responding by
> e.g., lowering TDP and using a different fan curve.
>
> If the kernel is asked to suspend outside of the Sleep state, then it
> does the transitions to reach that state and references the quirk
> table to avoid racing conditions in manufacturer firmware (not with
> the kernel), before it suspends.
>
>> Important to note; it DOESN'T explicitly turn off displays. If you
>> configured it to suspend then displays get turned off as part of the
>> kernel suspend sequence (drm_atomic_helper_disable_all).
>>
>> If it is configured to trigger a lockscreen then the compositor will
>> turn off displays after whatever the DPMS configuration is set to.
>
> The problem with a DRM atomic helper is that we cannot control how it
> is called and do debouncing. WIndows does debouncing (part of that is
> waiting for 5 seconds so that if you move the mouse the call is
> skipped). You could have edge conditions that spam the calls.
>
> Antheas
I'm not meaning that anything in userspace SHOULD be calling
`drm_atomic_helper_disable_all`, my point was that this is how it's
normally called in the suspend sequence. Folks who work on the
compositors don't like anyone other than the kernel touching their
configuration at runtime.
I think a lot of what you're looking for is the concept of a systemwide
"userspace only" suspend sequence. A lot of devices already support
runtime PM and will already go into the low power state when not in use.
For example USB4 routers you'll see in D3 until you plug something
into the USB4 port.
IMO your proposal needs to cross at least 3 projects. Here's my
thoughts on it.
1) systemd
* To be able to handle behaviors associates with a dark screen only
suspend/resume. I did start a discussion on dark resume some time
back but it went nowhere. [1]
* Make sure that all devices have been put into runtime PM.
* Notify compositor that screens should be turned off.
* Manage transitions from dark screen to full suspend and vice versa.
2) Kernel
A. To be able to notify the _DSM at the right time that all CRTCs have
been turned off).
B. To be able to notify the _DSM at the right time that at least one
CRTC is now on.
3) Wayland protocols
Introduce a new protocol for requesting userspace suspend.
To notify that dark screen only suspend is being triggered.
4) Compositor use.
All the popular compositors would need to add support for the new
protocol. IE Gamescope, mutter, kwin.
This isn't a trivial effort, there are a lot of people to convince. I
think the lowest effort is to start with kernel. IE having DRM core
call the _DSM when the CRTCs are off. If you get that far, you can at
least get this power saving behavior when DPMS is enacted. Then you can
work with systemd and wayland protocol folks.
[1] https://github.com/systemd/systemd/issues/27077
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence
2024-09-23 17:05 ` Mario Limonciello
@ 2024-09-23 17:13 ` Antheas Kapenekakis
0 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 17:13 UTC (permalink / raw)
To: Mario Limonciello; +Cc: linux-pm, platform-driver-x86, luke, me, Denis Benato
Did not mean how it is called as who will call it. But as in the way
it is called does not match the desired behavior.
In any case, as I said, I am happy to work in a downstream POC. My
target usecase is very simple and I do not need/want to tie Display
Off to CRTC.
ie, I will do 2 and 4. I am already familiar with gamescope (including
having two sockets open to it at any given time). Then if we get
interesting results, we move on from there.
As I said I also want to catch the Sleep _DSMs. I do not care about
Display On/Off other than calling it properly (where properly = as in
Windows + a lot of debouncing) so that these handhelds do not get
confused.
Antheas
On Mon, 23 Sept 2024 at 19:05, Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On 9/23/2024 11:43, Antheas Kapenekakis wrote:
> >
> > If it were me, yes, systemd would switch the system to the inactive
> > "Screen Off" state 5 seconds after the system displays are off due to
> > inactivity. If we are talking about implementing something Modern
> > Standby-like, then pressing the powerbutton would no longer suspend
> > the device. Instead systemd would use two tiers of activators like
> > windows (first tier for "Screen Off", second tier for "Sleep"; right
> > now there is only one tier that mirrors screen off) and once all those
> > activators are released, suspend the kernel.
> >
> > Then it would handle the transition from "Active" to "Screen Off" to
> > "Sleep" through a sysfs endpoint, with the platform responding by
> > e.g., lowering TDP and using a different fan curve.
> >
> > If the kernel is asked to suspend outside of the Sleep state, then it
> > does the transitions to reach that state and references the quirk
> > table to avoid racing conditions in manufacturer firmware (not with
> > the kernel), before it suspends.
> >
> >> Important to note; it DOESN'T explicitly turn off displays. If you
> >> configured it to suspend then displays get turned off as part of the
> >> kernel suspend sequence (drm_atomic_helper_disable_all).
> >>
> >> If it is configured to trigger a lockscreen then the compositor will
> >> turn off displays after whatever the DPMS configuration is set to.
> >
> > The problem with a DRM atomic helper is that we cannot control how it
> > is called and do debouncing. WIndows does debouncing (part of that is
> > waiting for 5 seconds so that if you move the mouse the call is
> > skipped). You could have edge conditions that spam the calls.
> >
> > Antheas
>
> I'm not meaning that anything in userspace SHOULD be calling
> `drm_atomic_helper_disable_all`, my point was that this is how it's
> normally called in the suspend sequence. Folks who work on the
> compositors don't like anyone other than the kernel touching their
> configuration at runtime.
>
> I think a lot of what you're looking for is the concept of a systemwide
> "userspace only" suspend sequence. A lot of devices already support
> runtime PM and will already go into the low power state when not in use.
> For example USB4 routers you'll see in D3 until you plug something
> into the USB4 port.
>
> IMO your proposal needs to cross at least 3 projects. Here's my
> thoughts on it.
>
> 1) systemd
> * To be able to handle behaviors associates with a dark screen only
> suspend/resume. I did start a discussion on dark resume some time
> back but it went nowhere. [1]
>
> * Make sure that all devices have been put into runtime PM.
> * Notify compositor that screens should be turned off.
> * Manage transitions from dark screen to full suspend and vice versa.
>
> 2) Kernel
> A. To be able to notify the _DSM at the right time that all CRTCs have
> been turned off).
> B. To be able to notify the _DSM at the right time that at least one
> CRTC is now on.
>
> 3) Wayland protocols
> Introduce a new protocol for requesting userspace suspend.
> To notify that dark screen only suspend is being triggered.
> 4) Compositor use.
> All the popular compositors would need to add support for the new
> protocol. IE Gamescope, mutter, kwin.
>
> This isn't a trivial effort, there are a lot of people to convince. I
> think the lowest effort is to start with kernel. IE having DRM core
> call the _DSM when the CRTCs are off. If you get that far, you can at
> least get this power saving behavior when DPMS is enacted. Then you can
> work with systemd and wayland protocol folks.
>
>
> [1] https://github.com/systemd/systemd/issues/27077
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v2 3/5] acpi/x86: s2idle: add quirk table for modern standby delays
2024-09-22 17:22 [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Antheas Kapenekakis
2024-09-22 17:22 ` [PATCH v2 1/5] acpi/x86: s2idle: add support for Display Off and Display On callbacks Antheas Kapenekakis
2024-09-22 17:22 ` [PATCH v2 2/5] acpi/x86: s2idle: handle Display On/Off calls outside of suspend sequence Antheas Kapenekakis
@ 2024-09-22 17:22 ` Antheas Kapenekakis
2024-09-22 17:54 ` Denis Benato
2024-09-22 17:22 ` [PATCH v2 4/5] acpi/x86: s2idle: call Display On/Off as part of callbacks and rename Antheas Kapenekakis
` (3 subsequent siblings)
6 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-22 17:22 UTC (permalink / raw)
To: linux-pm
Cc: platform-driver-x86, Mario Limonciello, luke, me, Denis Benato,
Antheas Kapenekakis
Unfortunately, some modern standby systems, including the ROG Ally, rely
on a delay between modern standby transitions. Add a quirk table for
introducing delays between modern standby transitions, and quirk the
ROG Ally on "Display Off", which needs a bit of time to turn off its
controllers prior to suspending (i.e., entering DRIPS).
Reported-by: Denis Benato <benato.denis96@gmail.com>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
---
include/linux/suspend.h | 5 +++++
kernel/power/suspend.c | 41 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 8f33249cc067..d7e2a4d8ab0c 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -144,6 +144,11 @@ struct platform_s2idle_ops {
int (*display_on)(void);
};
+struct platform_s2idle_quirks {
+ int delay_display_off;
+ int delay_display_on;
+};
+
#ifdef CONFIG_SUSPEND
extern suspend_state_t pm_suspend_target_state;
extern suspend_state_t mem_sleep_current;
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 610f8ecaeebd..af2abdd2f8c3 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -11,6 +11,7 @@
#include <linux/string.h>
#include <linux/delay.h>
+#include <linux/dmi.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/console.h>
@@ -61,6 +62,30 @@ static DECLARE_SWAIT_QUEUE_HEAD(s2idle_wait_head);
enum s2idle_states __read_mostly s2idle_state;
static DEFINE_RAW_SPINLOCK(s2idle_lock);
+// The ROG Ally series disconnects its controllers on Display Off, without
+// holding a lock, introducing a race condition. Add a delay to allow the
+// controller to disconnect cleanly prior to suspend.
+static const struct platform_s2idle_quirks rog_ally_quirks = {
+ .delay_display_off = 500,
+};
+
+static const struct dmi_system_id platform_s2idle_quirks[] = {
+ {
+ .matches = {
+ DMI_MATCH(DMI_BOARD_NAME, "RC71L"),
+ },
+ .driver_data = (void *)&rog_ally_quirks
+ },
+ {
+ .matches = {
+ DMI_MATCH(DMI_BOARD_NAME, "RC72L"),
+ },
+ .driver_data = (void *)&rog_ally_quirks
+ },
+ {}
+};
+
+
/**
* pm_suspend_default_s2idle - Check if suspend-to-idle is the default suspend.
*
@@ -589,12 +614,26 @@ static int enter_state(suspend_state_t state)
if (state == PM_SUSPEND_TO_IDLE)
s2idle_begin();
+ /*
+ * Windows transitions between Modern Standby states slowly, over multiple
+ * seconds. Certain manufacturers may rely on this, introducing race
+ * conditions. Until Linux can support modern standby, add the relevant
+ * delays between transitions here.
+ */
+ const struct dmi_system_id *s2idle_sysid = dmi_first_match(
+ platform_s2idle_quirks
+ );
+ const struct platform_s2idle_quirks *s2idle_quirks = s2idle_sysid ?
+ s2idle_sysid->driver_data : NULL;
+
/*
* Linux does not have the concept of a "Screen Off" state, so call
* the platform functions for Display On/Off prior to the suspend
* sequence, mirroring Windows which calls them outside of it as well.
*/
platform_suspend_display_off();
+ if (s2idle_quirks && s2idle_quirks->delay_display_off)
+ msleep(s2idle_quirks->delay_display_off);
if (sync_on_suspend_enabled) {
trace_suspend_resume(TPS("sync_filesystems"), 0, true);
@@ -624,6 +663,8 @@ static int enter_state(suspend_state_t state)
Unlock:
mutex_unlock(&system_transition_mutex);
+ if (s2idle_quirks && s2idle_quirks->delay_display_on)
+ msleep(s2idle_quirks->delay_display_on);
platform_suspend_display_on();
return error;
}
--
2.46.1
^ permalink raw reply related [flat|nested] 38+ messages in thread* Re: [PATCH v2 3/5] acpi/x86: s2idle: add quirk table for modern standby delays
2024-09-22 17:22 ` [PATCH v2 3/5] acpi/x86: s2idle: add quirk table for modern standby delays Antheas Kapenekakis
@ 2024-09-22 17:54 ` Denis Benato
2024-09-22 18:00 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Denis Benato @ 2024-09-22 17:54 UTC (permalink / raw)
To: Antheas Kapenekakis, linux-pm
Cc: platform-driver-x86, Mario Limonciello, luke, me
On 22/09/24 19:22, Antheas Kapenekakis wrote:
> Unfortunately, some modern standby systems, including the ROG Ally, rely
> on a delay between modern standby transitions. Add a quirk table for
> introducing delays between modern standby transitions, and quirk the
> ROG Ally on "Display Off", which needs a bit of time to turn off its
> controllers prior to suspending (i.e., entering DRIPS).
>
> Reported-by: Denis Benato <benato.denis96@gmail.com>
I told you privately that as stated here: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
Reported-By tags are to be followed by a Closes tag stating that the issue is solved.
What has been presented to me today was not about solving bugs, but changing how they manifests and therefore permission was not granted to you to represent me as satisfied with the work as it would be wrong to assume issues are solved and this is worth merging.
Furthermore everybody can see my answers to your v1 patch and there is no need to attach a Reported-by when the issue has been reported publicly and is not resolved.
I will add my Tested-off-by by myself whenever I will be fully satisfied by work presented: in the current state I am not.
> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> ---
> include/linux/suspend.h | 5 +++++
> kernel/power/suspend.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 46 insertions(+)
>
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index 8f33249cc067..d7e2a4d8ab0c 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -144,6 +144,11 @@ struct platform_s2idle_ops {
> int (*display_on)(void);
> };
>
> +struct platform_s2idle_quirks {
> + int delay_display_off;
> + int delay_display_on;
> +};
> +
> #ifdef CONFIG_SUSPEND
> extern suspend_state_t pm_suspend_target_state;
> extern suspend_state_t mem_sleep_current;
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 610f8ecaeebd..af2abdd2f8c3 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -11,6 +11,7 @@
>
> #include <linux/string.h>
> #include <linux/delay.h>
> +#include <linux/dmi.h>
> #include <linux/errno.h>
> #include <linux/init.h>
> #include <linux/console.h>
> @@ -61,6 +62,30 @@ static DECLARE_SWAIT_QUEUE_HEAD(s2idle_wait_head);
> enum s2idle_states __read_mostly s2idle_state;
> static DEFINE_RAW_SPINLOCK(s2idle_lock);
>
> +// The ROG Ally series disconnects its controllers on Display Off, without
> +// holding a lock, introducing a race condition. Add a delay to allow the
> +// controller to disconnect cleanly prior to suspend.
> +static const struct platform_s2idle_quirks rog_ally_quirks = {
> + .delay_display_off = 500,
> +};
> +
> +static const struct dmi_system_id platform_s2idle_quirks[] = {
> + {
> + .matches = {
> + DMI_MATCH(DMI_BOARD_NAME, "RC71L"),
> + },
> + .driver_data = (void *)&rog_ally_quirks
> + },
> + {
> + .matches = {
> + DMI_MATCH(DMI_BOARD_NAME, "RC72L"),
> + },
> + .driver_data = (void *)&rog_ally_quirks
> + },
> + {}
> +};
> +
> +
> /**
> * pm_suspend_default_s2idle - Check if suspend-to-idle is the default suspend.
> *
> @@ -589,12 +614,26 @@ static int enter_state(suspend_state_t state)
> if (state == PM_SUSPEND_TO_IDLE)
> s2idle_begin();
>
> + /*
> + * Windows transitions between Modern Standby states slowly, over multiple
> + * seconds. Certain manufacturers may rely on this, introducing race
> + * conditions. Until Linux can support modern standby, add the relevant
> + * delays between transitions here.
> + */
> + const struct dmi_system_id *s2idle_sysid = dmi_first_match(
> + platform_s2idle_quirks
> + );
> + const struct platform_s2idle_quirks *s2idle_quirks = s2idle_sysid ?
> + s2idle_sysid->driver_data : NULL;
> +
> /*
> * Linux does not have the concept of a "Screen Off" state, so call
> * the platform functions for Display On/Off prior to the suspend
> * sequence, mirroring Windows which calls them outside of it as well.
> */
> platform_suspend_display_off();
> + if (s2idle_quirks && s2idle_quirks->delay_display_off)
> + msleep(s2idle_quirks->delay_display_off);
>
> if (sync_on_suspend_enabled) {
> trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> @@ -624,6 +663,8 @@ static int enter_state(suspend_state_t state)
> Unlock:
> mutex_unlock(&system_transition_mutex);
>
> + if (s2idle_quirks && s2idle_quirks->delay_display_on)
> + msleep(s2idle_quirks->delay_display_on);
> platform_suspend_display_on();
> return error;
> }
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v2 3/5] acpi/x86: s2idle: add quirk table for modern standby delays
2024-09-22 17:54 ` Denis Benato
@ 2024-09-22 18:00 ` Antheas Kapenekakis
0 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-22 18:00 UTC (permalink / raw)
To: Denis Benato; +Cc: linux-pm, platform-driver-x86, Mario Limonciello, luke, me
Hi Denis,
I did not add a Tested-by for this reason. The Report-by is to credit
you for bringing this into my attention, I thought you were ok with
this. If not, I will remove it in the next revision.
Best,
Antheas
On Sun, 22 Sept 2024 at 19:55, Denis Benato <benato.denis96@gmail.com> wrote:
>
> On 22/09/24 19:22, Antheas Kapenekakis wrote:
> > Unfortunately, some modern standby systems, including the ROG Ally, rely
> > on a delay between modern standby transitions. Add a quirk table for
> > introducing delays between modern standby transitions, and quirk the
> > ROG Ally on "Display Off", which needs a bit of time to turn off its
> > controllers prior to suspending (i.e., entering DRIPS).
> >
> > Reported-by: Denis Benato <benato.denis96@gmail.com>
>
> I told you privately that as stated here: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
> Reported-By tags are to be followed by a Closes tag stating that the issue is solved.
>
> What has been presented to me today was not about solving bugs, but changing how they manifests and therefore permission was not granted to you to represent me as satisfied with the work as it would be wrong to assume issues are solved and this is worth merging.
>
> Furthermore everybody can see my answers to your v1 patch and there is no need to attach a Reported-by when the issue has been reported publicly and is not resolved.
>
> I will add my Tested-off-by by myself whenever I will be fully satisfied by work presented: in the current state I am not.
>
> > Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> > ---
> > include/linux/suspend.h | 5 +++++
> > kernel/power/suspend.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 46 insertions(+)
> >
> > diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> > index 8f33249cc067..d7e2a4d8ab0c 100644
> > --- a/include/linux/suspend.h
> > +++ b/include/linux/suspend.h
> > @@ -144,6 +144,11 @@ struct platform_s2idle_ops {
> > int (*display_on)(void);
> > };
> >
> > +struct platform_s2idle_quirks {
> > + int delay_display_off;
> > + int delay_display_on;
> > +};
> > +
> > #ifdef CONFIG_SUSPEND
> > extern suspend_state_t pm_suspend_target_state;
> > extern suspend_state_t mem_sleep_current;
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 610f8ecaeebd..af2abdd2f8c3 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -11,6 +11,7 @@
> >
> > #include <linux/string.h>
> > #include <linux/delay.h>
> > +#include <linux/dmi.h>
> > #include <linux/errno.h>
> > #include <linux/init.h>
> > #include <linux/console.h>
> > @@ -61,6 +62,30 @@ static DECLARE_SWAIT_QUEUE_HEAD(s2idle_wait_head);
> > enum s2idle_states __read_mostly s2idle_state;
> > static DEFINE_RAW_SPINLOCK(s2idle_lock);
> >
> > +// The ROG Ally series disconnects its controllers on Display Off, without
> > +// holding a lock, introducing a race condition. Add a delay to allow the
> > +// controller to disconnect cleanly prior to suspend.
> > +static const struct platform_s2idle_quirks rog_ally_quirks = {
> > + .delay_display_off = 500,
> > +};
> > +
> > +static const struct dmi_system_id platform_s2idle_quirks[] = {
> > + {
> > + .matches = {
> > + DMI_MATCH(DMI_BOARD_NAME, "RC71L"),
> > + },
> > + .driver_data = (void *)&rog_ally_quirks
> > + },
> > + {
> > + .matches = {
> > + DMI_MATCH(DMI_BOARD_NAME, "RC72L"),
> > + },
> > + .driver_data = (void *)&rog_ally_quirks
> > + },
> > + {}
> > +};
> > +
> > +
> > /**
> > * pm_suspend_default_s2idle - Check if suspend-to-idle is the default suspend.
> > *
> > @@ -589,12 +614,26 @@ static int enter_state(suspend_state_t state)
> > if (state == PM_SUSPEND_TO_IDLE)
> > s2idle_begin();
> >
> > + /*
> > + * Windows transitions between Modern Standby states slowly, over multiple
> > + * seconds. Certain manufacturers may rely on this, introducing race
> > + * conditions. Until Linux can support modern standby, add the relevant
> > + * delays between transitions here.
> > + */
> > + const struct dmi_system_id *s2idle_sysid = dmi_first_match(
> > + platform_s2idle_quirks
> > + );
> > + const struct platform_s2idle_quirks *s2idle_quirks = s2idle_sysid ?
> > + s2idle_sysid->driver_data : NULL;
> > +
> > /*
> > * Linux does not have the concept of a "Screen Off" state, so call
> > * the platform functions for Display On/Off prior to the suspend
> > * sequence, mirroring Windows which calls them outside of it as well.
> > */
> > platform_suspend_display_off();
> > + if (s2idle_quirks && s2idle_quirks->delay_display_off)
> > + msleep(s2idle_quirks->delay_display_off);
> >
> > if (sync_on_suspend_enabled) {
> > trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> > @@ -624,6 +663,8 @@ static int enter_state(suspend_state_t state)
> > Unlock:
> > mutex_unlock(&system_transition_mutex);
> >
> > + if (s2idle_quirks && s2idle_quirks->delay_display_on)
> > + msleep(s2idle_quirks->delay_display_on);
> > platform_suspend_display_on();
> > return error;
> > }
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v2 4/5] acpi/x86: s2idle: call Display On/Off as part of callbacks and rename
2024-09-22 17:22 [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Antheas Kapenekakis
` (2 preceding siblings ...)
2024-09-22 17:22 ` [PATCH v2 3/5] acpi/x86: s2idle: add quirk table for modern standby delays Antheas Kapenekakis
@ 2024-09-22 17:22 ` Antheas Kapenekakis
2024-09-22 17:22 ` [PATCH v2 5/5] platform/x86: asus-wmi: remove Ally (1st gen) and Ally X suspend quirk Antheas Kapenekakis
` (2 subsequent siblings)
6 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-22 17:22 UTC (permalink / raw)
To: linux-pm
Cc: platform-driver-x86, Mario Limonciello, luke, me, Denis Benato,
Antheas Kapenekakis
Move the Display On/Off notifications into dedicated callbacks that gate
the ACPI mutex, so they can be called outside of the suspend path.
This fixes issues on certain devices that expect kernel drivers to be
fully active during the calls, and allows for the flexibility of calling
them as part of a more elaborate userspace suspend sequence (such as
with "Screen Off" in Windows Modern Standby).
In addition, rename the notifications from "screen_" to "display_", as
there is no documentation referring to them as screen, either by
Intel or Microsoft.
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
---
drivers/acpi/x86/s2idle.c | 89 +++++++++++++++++++++++++++------------
1 file changed, 62 insertions(+), 27 deletions(-)
diff --git a/drivers/acpi/x86/s2idle.c b/drivers/acpi/x86/s2idle.c
index dd0b40b9bbe8..a17e28b91326 100644
--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -39,8 +39,8 @@ static const struct acpi_device_id lps0_device_ids[] = {
#define ACPI_LPS0_DSM_UUID "c4eb40a0-6cd2-11e2-bcfd-0800200c9a66"
#define ACPI_LPS0_GET_DEVICE_CONSTRAINTS 1
-#define ACPI_LPS0_SCREEN_OFF 3
-#define ACPI_LPS0_SCREEN_ON 4
+#define ACPI_LPS0_DISPLAY_OFF 3
+#define ACPI_LPS0_DISPLAY_ON 4
#define ACPI_LPS0_ENTRY 5
#define ACPI_LPS0_EXIT 6
#define ACPI_LPS0_MS_ENTRY 7
@@ -50,8 +50,8 @@ static const struct acpi_device_id lps0_device_ids[] = {
#define ACPI_LPS0_DSM_UUID_AMD "e3f32452-febc-43ce-9039-932122d37721"
#define ACPI_LPS0_ENTRY_AMD 2
#define ACPI_LPS0_EXIT_AMD 3
-#define ACPI_LPS0_SCREEN_OFF_AMD 4
-#define ACPI_LPS0_SCREEN_ON_AMD 5
+#define ACPI_LPS0_DISPLAY_OFF_AMD 4
+#define ACPI_LPS0_DISPLAY_ON_AMD 5
static acpi_handle lps0_device_handle;
static guid_t lps0_dsm_guid;
@@ -60,6 +60,7 @@ static int lps0_dsm_func_mask;
static guid_t lps0_dsm_guid_microsoft;
static int lps0_dsm_func_mask_microsoft;
static int lps0_dsm_state;
+static bool lsp0_dsm_in_display_off;
/* Device constraint entry structure */
struct lpi_device_info {
@@ -361,9 +362,9 @@ static const char *acpi_sleep_dsm_state_to_str(unsigned int state)
{
if (lps0_dsm_func_mask_microsoft || !acpi_s2idle_vendor_amd()) {
switch (state) {
- case ACPI_LPS0_SCREEN_OFF:
+ case ACPI_LPS0_DISPLAY_OFF:
return "screen off";
- case ACPI_LPS0_SCREEN_ON:
+ case ACPI_LPS0_DISPLAY_ON:
return "screen on";
case ACPI_LPS0_ENTRY:
return "lps0 entry";
@@ -376,9 +377,9 @@ static const char *acpi_sleep_dsm_state_to_str(unsigned int state)
}
} else {
switch (state) {
- case ACPI_LPS0_SCREEN_ON_AMD:
+ case ACPI_LPS0_DISPLAY_ON_AMD:
return "screen on";
- case ACPI_LPS0_SCREEN_OFF_AMD:
+ case ACPI_LPS0_DISPLAY_OFF_AMD:
return "screen off";
case ACPI_LPS0_ENTRY_AMD:
return "lps0 entry";
@@ -539,27 +540,69 @@ static struct acpi_scan_handler lps0_handler = {
.attach = lps0_device_attach,
};
-int acpi_s2idle_prepare_late(void)
+static int acpi_s2idle_display_off(void)
{
- struct acpi_s2idle_dev_ops *handler;
-
if (!lps0_device_handle || sleep_no_lps0)
return 0;
- if (pm_debug_messages_on)
- lpi_check_constraints();
+ if (unlikely(WARN_ON(lsp0_dsm_in_display_off)))
+ return -EINVAL;
+
+ lsp0_dsm_in_display_off = true;
+ acpi_scan_lock_acquire();
- /* Screen off */
+ /* Display off */
if (lps0_dsm_func_mask > 0)
acpi_sleep_run_lps0_dsm(acpi_s2idle_vendor_amd() ?
- ACPI_LPS0_SCREEN_OFF_AMD :
- ACPI_LPS0_SCREEN_OFF,
+ ACPI_LPS0_DISPLAY_OFF_AMD :
+ ACPI_LPS0_DISPLAY_OFF,
lps0_dsm_func_mask, lps0_dsm_guid);
if (lps0_dsm_func_mask_microsoft > 0)
- acpi_sleep_run_lps0_dsm(ACPI_LPS0_SCREEN_OFF,
+ acpi_sleep_run_lps0_dsm(ACPI_LPS0_DISPLAY_OFF,
lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
+ acpi_scan_lock_release();
+
+ return 0;
+}
+
+static int acpi_s2idle_display_on(void)
+{
+ if (!lps0_device_handle || sleep_no_lps0)
+ return 0;
+
+ if (unlikely(WARN_ON(!lsp0_dsm_in_display_off)))
+ return -EINVAL;
+
+ lsp0_dsm_in_display_off = false;
+ acpi_scan_lock_acquire();
+
+ /* Display on */
+ if (lps0_dsm_func_mask_microsoft > 0)
+ acpi_sleep_run_lps0_dsm(ACPI_LPS0_DISPLAY_ON,
+ lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
+ if (lps0_dsm_func_mask > 0)
+ acpi_sleep_run_lps0_dsm(acpi_s2idle_vendor_amd() ?
+ ACPI_LPS0_DISPLAY_ON_AMD :
+ ACPI_LPS0_DISPLAY_ON,
+ lps0_dsm_func_mask, lps0_dsm_guid);
+
+ acpi_scan_lock_release();
+
+ return 0;
+}
+
+int acpi_s2idle_prepare_late(void)
+{
+ struct acpi_s2idle_dev_ops *handler;
+
+ if (!lps0_device_handle || sleep_no_lps0)
+ return 0;
+
+ if (pm_debug_messages_on)
+ lpi_check_constraints();
+
/* LPS0 entry */
if (lps0_dsm_func_mask > 0 && acpi_s2idle_vendor_amd())
acpi_sleep_run_lps0_dsm(ACPI_LPS0_ENTRY_AMD,
@@ -623,19 +666,10 @@ void acpi_s2idle_restore_early(void)
acpi_sleep_run_lps0_dsm(ACPI_LPS0_MS_EXIT,
lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
}
-
- /* Screen on */
- if (lps0_dsm_func_mask_microsoft > 0)
- acpi_sleep_run_lps0_dsm(ACPI_LPS0_SCREEN_ON,
- lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
- if (lps0_dsm_func_mask > 0)
- acpi_sleep_run_lps0_dsm(acpi_s2idle_vendor_amd() ?
- ACPI_LPS0_SCREEN_ON_AMD :
- ACPI_LPS0_SCREEN_ON,
- lps0_dsm_func_mask, lps0_dsm_guid);
}
static const struct platform_s2idle_ops acpi_s2idle_ops_lps0 = {
+ .display_off = acpi_s2idle_display_off,
.begin = acpi_s2idle_begin,
.prepare = acpi_s2idle_prepare,
.prepare_late = acpi_s2idle_prepare_late,
@@ -644,6 +678,7 @@ static const struct platform_s2idle_ops acpi_s2idle_ops_lps0 = {
.restore_early = acpi_s2idle_restore_early,
.restore = acpi_s2idle_restore,
.end = acpi_s2idle_end,
+ .display_on = acpi_s2idle_display_on,
};
void __init acpi_s2idle_setup(void)
--
2.46.1
^ permalink raw reply related [flat|nested] 38+ messages in thread* [PATCH v2 5/5] platform/x86: asus-wmi: remove Ally (1st gen) and Ally X suspend quirk
2024-09-22 17:22 [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Antheas Kapenekakis
` (3 preceding siblings ...)
2024-09-22 17:22 ` [PATCH v2 4/5] acpi/x86: s2idle: call Display On/Off as part of callbacks and rename Antheas Kapenekakis
@ 2024-09-22 17:22 ` Antheas Kapenekakis
2024-09-22 18:15 ` [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Derek J. Clark
2024-10-05 14:21 ` Hans de Goede
6 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-22 17:22 UTC (permalink / raw)
To: linux-pm
Cc: platform-driver-x86, Mario Limonciello, luke, me, Denis Benato,
Antheas Kapenekakis
By moving the Display On/Off calls outside of the suspend sequence and
introducing a slight delay after Display Off, the ROG Ally controller
functions exactly as it does in Windows.
Therefore, remove the quirk that fixed the controller only when the
mcu_powersave attribute was disabled, while adding a large amount of
delay to the suspend and wake process.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
---
drivers/platform/x86/asus-wmi.c | 54 ---------------------------------
1 file changed, 54 deletions(-)
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 37636e5a38e3..2c9656e0afda 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -137,29 +137,10 @@ module_param(fnlock_default, bool, 0444);
#define ASUS_MINI_LED_2024_STRONG 0x01
#define ASUS_MINI_LED_2024_OFF 0x02
-/* Controls the power state of the USB0 hub on ROG Ally which input is on */
-#define ASUS_USB0_PWR_EC0_CSEE "\\_SB.PCI0.SBRG.EC0.CSEE"
-/* 300ms so far seems to produce a reliable result on AC and battery */
-#define ASUS_USB0_PWR_EC0_CSEE_WAIT 1500
-
static const char * const ashs_ids[] = { "ATK4001", "ATK4002", NULL };
static int throttle_thermal_policy_write(struct asus_wmi *);
-static const struct dmi_system_id asus_ally_mcu_quirk[] = {
- {
- .matches = {
- DMI_MATCH(DMI_BOARD_NAME, "RC71L"),
- },
- },
- {
- .matches = {
- DMI_MATCH(DMI_BOARD_NAME, "RC72L"),
- },
- },
- { },
-};
-
static bool ashs_present(void)
{
int i = 0;
@@ -269,9 +250,6 @@ struct asus_wmi {
u32 tablet_switch_dev_id;
bool tablet_switch_inverted;
- /* The ROG Ally device requires the MCU USB device be disconnected before suspend */
- bool ally_mcu_usb_switch;
-
enum fan_type fan_type;
enum fan_type gpu_fan_type;
enum fan_type mid_fan_type;
@@ -4698,8 +4676,6 @@ static int asus_wmi_add(struct platform_device *pdev)
asus->egpu_enable_available = asus_wmi_dev_is_present(asus, ASUS_WMI_DEVID_EGPU);
asus->dgpu_disable_available = asus_wmi_dev_is_present(asus, ASUS_WMI_DEVID_DGPU);
asus->kbd_rgb_state_available = asus_wmi_dev_is_present(asus, ASUS_WMI_DEVID_TUF_RGB_STATE);
- asus->ally_mcu_usb_switch = acpi_has_method(NULL, ASUS_USB0_PWR_EC0_CSEE)
- && dmi_check_system(asus_ally_mcu_quirk);
if (asus_wmi_dev_is_present(asus, ASUS_WMI_DEVID_MINI_LED_MODE))
asus->mini_led_dev_id = ASUS_WMI_DEVID_MINI_LED_MODE;
@@ -4892,34 +4868,6 @@ static int asus_hotk_resume(struct device *device)
return 0;
}
-static int asus_hotk_resume_early(struct device *device)
-{
- struct asus_wmi *asus = dev_get_drvdata(device);
-
- if (asus->ally_mcu_usb_switch) {
- /* sleep required to prevent USB0 being yanked then reappearing rapidly */
- if (ACPI_FAILURE(acpi_execute_simple_method(NULL, ASUS_USB0_PWR_EC0_CSEE, 0xB8)))
- dev_err(device, "ROG Ally MCU failed to connect USB dev\n");
- else
- msleep(ASUS_USB0_PWR_EC0_CSEE_WAIT);
- }
- return 0;
-}
-
-static int asus_hotk_prepare(struct device *device)
-{
- struct asus_wmi *asus = dev_get_drvdata(device);
-
- if (asus->ally_mcu_usb_switch) {
- /* sleep required to ensure USB0 is disabled before sleep continues */
- if (ACPI_FAILURE(acpi_execute_simple_method(NULL, ASUS_USB0_PWR_EC0_CSEE, 0xB7)))
- dev_err(device, "ROG Ally MCU failed to disconnect USB dev\n");
- else
- msleep(ASUS_USB0_PWR_EC0_CSEE_WAIT);
- }
- return 0;
-}
-
static int asus_hotk_restore(struct device *device)
{
struct asus_wmi *asus = dev_get_drvdata(device);
@@ -4964,8 +4912,6 @@ static const struct dev_pm_ops asus_pm_ops = {
.thaw = asus_hotk_thaw,
.restore = asus_hotk_restore,
.resume = asus_hotk_resume,
- .resume_early = asus_hotk_resume_early,
- .prepare = asus_hotk_prepare,
};
/* Registration ***************************************************************/
--
2.46.1
^ permalink raw reply related [flat|nested] 38+ messages in thread* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-22 17:22 [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Antheas Kapenekakis
` (4 preceding siblings ...)
2024-09-22 17:22 ` [PATCH v2 5/5] platform/x86: asus-wmi: remove Ally (1st gen) and Ally X suspend quirk Antheas Kapenekakis
@ 2024-09-22 18:15 ` Derek J. Clark
2024-09-22 19:40 ` Antheas Kapenekakis
2024-10-05 14:21 ` Hans de Goede
6 siblings, 1 reply; 38+ messages in thread
From: Derek J. Clark @ 2024-09-22 18:15 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: Mario Limonciello, Luke Jones, me, Denis Benato, linux-pm,
Derek J . Clark
Hello Antheas.
> The following series moves the Display off/on calls outside of the suspend
> sequence, as they are performed in Windows. This fixes certain issues that appear
> in devices that use the calls and expect the kernel to be active during their
> call (especially in the case of the ROG Ally devices) and opens the possibility
> of implementing a "Screen Off" state in the future (which mirrors Windows).
> In addition, it adds a quirk table that will allow for adding delays between
> Modern Standby transitions, to help resolve racing conditions.
>
> This series requires a bit of background on how modern standby works in Windows.
> Fundamentally, it is composed of four states: "Active", "Screen Off", "Sleep",
> and "DRIPS". Here, I take the liberty of naming the state "Active", as it is
> implied in Windows documentation.
>
> When the user actively interacts with a device, it is in the "Active" state.
> The screen is on, all devices are connected, and desired software is running.
> The other 3 stages play a role once the user stops interacting with the device.
> This is triggered in two main ways: either by pressing the power button or by
> inactivity. Once either of those targets is met, the system enters Modern Standby.
>
> Modern Standby consists of an orchestration of the "Screen Off", "Sleep", and
> "DRIPS" states. Windows is free to move throughout these states until the user
> interacts with the device again, where the device will transition to being
> "Active". Moving between the states implies a transition, where Windows performs
> a set of actions. In addition, Windows can only move between adjacent states
> as follows:
>
> "Active" <-> "Screen Off" <-> "Sleep" <-> "DRIPS"
>
> "Screen Off" is the state where all active displays in the device (whether
> *virtual* or real; this means unrelated to DRM) are off. The user might still
> be interacting with the device or running programs (e.g., compiling a kernel).
>
> "Sleep" is a newer state, in which the device turns off its fan and pulses its
> power button, but still supports running software activities. As part of this,
> and to avoid overheating the device a lot of manufacturers lower the TDP (PLx)
> of the device [3; _DSM 9 description].
>
> Finally, DRIPS stands for Deepest Runtime Idle Power State, i.e. suspend.
>
> While Windows may transition from any state to any state, doing so implies
> performing all transitions to reach that state. All states other than DRIPS
> have a fully active kernel (Wifi, USB at least) and allow userspace activity.
> What changes is the extent of the activity, and whether some power consuming
> devices are turned off (this is done with Modern Standby aware drivers and
> firmware notifications). The Windows kernel suspends during the transition from
> the "Sleep" state to the "DRIPS" state. In all other states it is active.
>
> After finishing each transition, the kernel performs a firmware notification,
> described in the _DSM document [3]. Moving from left to right with function num.,
> we have Display Off (3; Active -> Screen Off), Sleep Entry (7; Screen Off -> Sleep),
> and Lowest Power State Entry Notification (5; LPSEN; Sleep -> DRIPS). Then, from
> right to left, we have Lowest Power State Exit Notification (6; DRIPS -> Sleep),
> Sleep Exit (8; Sleep -> Screen) and Display On (4; Screen Off -> Active).
>
> The Linux kernel is not currently Modern Standby aware but will still make these
> calls. Currently, the problem is that the kernel calls all of the firmware
> notifications at the point LPSEN (5, 6) should be called, which is when the
> kernel is mostly suspended. This is a clear deviation from Windows, and causes
> undesirable behavior in certain devices, the main one focused in this patch
> series being the ROG Ally. Although series patch is aimed at Modern Standby
> devices in general.
>
> The ROG Ally is a Modern Standby capable device (uses Secure Core too; really
> ticks all the MS boxes) and in it, there are issues with both Display 3,4
> calls and Sleep 7,8 calls cause issues (7,8 are suspected and todo).
>
> The Display 3,4 calls are responsible for the controller. The Display Off call
> disconnects (if powersave is off) or powers off (if powersave is on and on DC
> power) the MCU(s) responsible for the controller and deactivates the RGB of the
> device. Display On powers on or reconnects the controller respectively.
> This controller, in the Ally X, is composed of 6 HID devices that register to
> the kernel. From testing, it seems that the majority of the problem in the Ally
> comes from Display Off being called way too late timewise, and Display
>
> The Sleep 7,8 calls, in general, are responsible for setting a low power state
> that is safe to use while the device is sleeping, making the suspend light
> pulse, and turning off the fan. Due to a variety of race conditions, there is
> a rare occasion where the Ally EC can get stuck in its Sleep mode, where the
> TDP is 5W, and prevent increasing it until the device reboots. The sleep entries
> contain actions in the Ally, so there is a suspicion that calling them during
> DRIPS is causing issues. However, this is not the subject of this patch and
> has not been verified yet.
>
> This patch centers around moving the Display 3,4 calls outside the suspend
> sequence (which is the transition from Sleep to DRIPS in Modern Standby terms),
> and by implementing the proper locks necessary, opening up the possibility of
> making these calls as part of a more elaborate "Modern Standby"-like userspace
> suspend/wakelock implementation. As of this patch, they are only called before
> the suspend sequence, including with the possibility of adding a delay.
>
> This makes the intent of this patch primarily compatibility focused, as it aims
> to fix issues by the current implementation. And to that end it works.
> After moving the calls outside of the suspend sequence, my ROG Ally X test unit
> can suspend more than 50 times without rebooting, both with powersave on or off,
> regardless of whether it is plugged/unplugged during suspend, and still have the
> controller work with full reliability. In V1, there was an unsolved race condition
> that was dealt by (5) before Display Off triggers. Essentially, Linux suspends
> too fast for the current version of the firmware to deal with. After adding a
> quirk table, which delays suspend after the Display Off call, the controller
> of the original Ally should power off properly (a lot of testing will be done).
>
> Moving the calls outside of the suspend sequence (and the validation work it
> implies) is an important first step in including "Modern Standby"-like
> features in Linux. For example, consider an endpoint /sys/power/standby, that
> allows for entering "active", "inactive" (for Screen Off; since the name causes
> too much confusion), "sleep" values. Those values will then in turn call the
> respective firmware notifications (and driver callbacks in the very future)
> for all transitions required to reach the entered state. Here, the value
> "suspend" (for DRIPS; another confusing name as it can refer to drivers) is
> missing, as userspace will never be able to see it. The kernel should support
> suspending at all standby states, orchestrating the required transitions to
> reach suspend/DRIPS and after suspend returning to the last state.
>
> Therefore, if userspace is not standby aware, the kernel will work the same way
> it works today. In addition, depending on hardware generation, certain power
> states might not be supported. It is important to inform userspace of this, as
> if the hardware does not support sleep, and userspace holds a wakelock for sleep,
> it will just overheat and drain the device battery.
>
> This series is worth backing this up with sources, so as part of it I reference
> Microsoft's documentation on Modern standby [1-3] that explains the whole
> process, including a document by Dell [7] and how to prepare for them and attach a
> repository of more than 15 device DSDT tables [4] from different manufacturers.
> This repository also contains instructions on how to decode the DSDT tables on
> a test laptop, to figure out what the _DSM calls will do on that device.
>
> Moreover, I conduct a short behavioral test in Windows with the Ally X to showcase
> the documentation empirically. The Ally is great for such a test, as it contains
> visual indicators for all Microsoft suspend points: "Display Off/On" calls are
> indicated with the Controller RGB turning off/on, "Screen Off" is indicated with
> the suspend light and fan being on, and Sleep is indicated with the suspend
> light blinking.
>
> Unfortunately, as part of this testing, I never found how to see if the device
> is actually suspended. As the ROG Ally X NOOPs on firmware notifications 5,6,
> and even though I disabled a Mouse from waking up a device, it still would wake
> up my Ally X dev unit.
>
> Referencing Microsoft's documentation, "Screen Off" is entered either through
> inactivity or by pressing the power button, so I conduct two tests: one by pressing
> the powerbutton, and one for entering Screen Off due to inactivity.
>
> 1) Powerbutton test:
> When pressing the powerbutton, the screen of the Ally turns off, and the RGB of
> the controller faints to off within 1s. Following, depending on whether the
> system is plugged in, the power light and fan stay on for 5 seconds to 10 minutes.
> After this point, the power light begins to blink and the fan turns off, showing
> that the system has entered the "Sleep" state.
>
> 2) Inactivity test:
> I set the Windows power settings to turn off the screen after 1 minute and wait.
> After one minute, the display turns off, and after 5 seconds, the controller RGB
> turns off. This indicates to me that "Screen Off" is not defined by the screen
> being off, but is rather characterized by it. During those 5 seconds while the
> RGB is on, I can use the controller to wake up the device. Afterwards it cannot.
>
> Those tests validate Microsoft's documentation and show that "Screen Off"
> seems to more closely correlate to lockscreen behavior (button locks instantly,
> inactivity after 5 seconds) than the screen being off. One other behavior I
> notice is that, as I look at my Ally X dev right now, with its screen off, I
> notice the RGB is still on, which is kind of bothersome, as in Windows the
> device would turn the RGB off. Whether as a side effect or planned, it is still
> a nice touch.
>
> This patch series is developed with help from Mario Limonciello, and, to be
> bisection friendly, is structured based on a patch series he made connecting the
> callbacks to the drm subsystem suspend [5]. It also references (already)
> upstream work by Luke Jones on Asus-wmi for the Ally controller quirk that is
> removed on patch (5) and an issue on amd-drm in 2023 in preparation for the
> work in that quirk [6]. Since patch (3) now uses part of the dmi table removed
> in patch (5) and adds a (small) delay, @Luke I can add you as Suggested-by.
>
> We will begin testing on the patch series, and there will probably be a V3,
> where testing acknowledgements are added. V2 patch adds a delay to display_off
> (500ms), where 300-1300ms were tried, and there was no behavioral difference on
> the Ally X. However, that is arbitrary so it warrants a lot of testing.
> Current status is that my Ally X unit works perfectly other than a little quirk:
> with powersave on, if asus_hid or a userspace program talks to it within
> 2 seconds, it causes the RGB to softfade to off and then on. This is a cosmetic
> issue that can be dealt with by userspace (waiting 2s) or a firmware update
> or both. Windows did not seem to fare much better either in that regard, with
> RGB turning on and off randomly. Original Ally still needs to be verified.
>
> I am personally going to take a bit of a breather on this patch, test it, and
> revisit it next week. I send it today so I get comments on the revision.
I will get into this a little below, but this is not a full and accurate
accounting of what the ROG Ally devices are doing. Before that, I want to
summarize the premise of your proposal. A somewhat oversimplified version might
read as: two systems have an issue where some of the hardware is resumed in a
bad state, and you intend to resolve that issue by reorganizing the idle
process. The reason you want to do this is because you have an intuitive idea
of what those two devices are doing based on reading ACPI tables and guesswork.
Furthermore, this proposal will have additional knock-on effects for millions
of devices that aren't affected by the core issue you are trying to solve, and
you have no way to test all of them. Finally, the solution provided introduces
new issues (as reported by Denis and not yet resolved) for the hardware it is
meant to fix. And you want to do this in spite of knowing the manufacturer is
providing a firmware update that will resolve this issue without any changes
needed in the kernel.
I can confidently state that you don't fully understand the technical issue at
hand as you are missing information on how and why the MCU functions as it
does, so I will add a small amount of background containing information I am
permitted to share. While developing the Ally in Windows, ASUS noticed an issue
with spurious wakeups. To correct this, they added a flag as a workaround that
reports state information internally so they can time certain events during the
synchronous suspend and resume that Windows does to ensure the MCU is asleep or
awake when it needs to be. As Linux has an asynchronous suspend and resume,
this flag causes the extremely timing sensitive quirk to trigger at the wrong
time quite often. There is a group of us who have spent collectively thousands
of hours troubleshooting this issue, so I can confidently say your proposed
DMI quirk suffers from the same issue that all previous attempts to fix it in
kernel do; there is too much variability in how long each piece of this puzzle
takes to reach its full sleep state. Every possible kernel fix is trying to
manage the race condition in the firmware. This inevitably leads to situations
where the Windows quirk triggers a bad state because it doesn't know it is
already resuming, or it wakes early thinking it should have already resumed.
The quirk in 3/5 is masking this issue for your setup, but that delay will be
too long for some configurations and too short for others. Ironically, in an
attempt to fix the issues Denis found, 3/5 moves the delay you're removing in
5/5 outside the device driver. While I'm not allowed to divulge all the
technical details here, I can state your deeper dive into DRIPS is still
missing some key information. My point being this approach appears, at least
to me, to be a major paradigm shift in how suspend is handled so that a
bandaid can be applied that solves a problem in *some* cases while introducing
more problems in others. I do not think that is a valid approach.
The 50 suspend/resume cycles you tested are only valid for your system, with
your power profile settings, running your OS, using your kernel config,
running the same userspace software you have. I know this because we also
reached the same conclusion as you with dozens of patch iterations, many doing
effectively the same thing as this series, only to have 1 device fail on a
different OS or with a different kernel config. The details of this have been
known by ASUS and us for almost a month now and we have been working with them
directly to find a proper hardware level fix for it. They have provided two
testing MCU firmware for us and I can confirm that they resolve the issue in
every kernel and situation we previously tested on both devices. I would like
to gently remind you that manufacturers intent should predominate in these
situations, especially when their intent is to avoid making drastic changes to
the kernel to support a small number of devices. We are also working on a
fwupd module for the MCU so that it can be updated from within Linux directly.
In the meantime, until it is available, dual booting and Win2Go are valid
options for getting the MCU update while that interface is not completed. Luke
will be providing a patch series soon to address backwards compatibility for
the Ally, details to follow.
As I have already upgraded both of my units testing the manufacturer approved
fix, I cannot provide you with validation testing of the patches here. My
devices simply work with any kernel. Having said that, I am intrigued by the
proposal to add feature parity of Windows modern sleep into Linux. That topic
deserves a request for comment thread in its own right that brings in all the
major subsystem stakeholders and primary userspace components (systemd) so that
a conversation can be had on how best to approach the topic while maintaining
backwards compatibility, and not requiring the extreme level of validation this
patch would require. In that same vein I recommend you drop the unnecessary
adjustments to asus-wmi in 5/5 and the quirks in 3/5, and instead defer to the
manufacturer intentions. If adding general modern sleep support is your
intention then it would be best to focus on that issue.
Thank you for your time,
Derek
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-22 18:15 ` [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Derek J. Clark
@ 2024-09-22 19:40 ` Antheas Kapenekakis
2024-09-23 1:35 ` Derek J. Clark
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-22 19:40 UTC (permalink / raw)
To: Derek J. Clark; +Cc: Mario Limonciello, Luke Jones, me, Denis Benato, linux-pm
Hi Derek,
It is ok you have upgraded both of your units. We have plenty of
contributors and users that are not on the experimental firmware you
are on and will provide ample testing for probably the next half year.
As the update will need 1-2 months of validation and it is very
difficult and painful to update firmware from linux.
As I am tired, allow me to quickly jot down some thoughts so I can
enjoy my evening.
I only got my Ally X unit 5 days ago, so I cannot say I spent
thousands of hours collectively but right now, I am pretty certain of
the issue that was happening in the Ally units.
It is the combination of three cascading issues:
1) Display On/Off calls are not part of the suspend sequence in
neither WIndows or Linux currently. In Windows, they are called before
suspend, and in Linux after. This is a large deviation that completely
breaks the controller of the Ally. The previous solution was to insert
a second call to those functions in the middle of the Linux suspend
sequence, and then collectively spend months fighting with random
racing conditions. Patches 1, 2, 4 refactor s2idle to make sure that
never happens again, for any device, including the Ally.
2) The Ally Controller has a choreographed sequence with which it
fades its RGB during suspend. This happens during the Display Off
transition of Screen Off in Windows. In all of my testing in windows,
Screen Off lasts AT LEAST 10 seconds, if not more. I had to stand
around looking at that power light to turn off more than once. If
Linux cuts off its power supply before that, it gets confused and
restarts after suspend. If that restart happens during the resume
sequence, see (1,3). Patch 3 fixes this. This flourish is an important
part of user experience, so adding a delay here is required, even if
firmware updates.
3) Finally, the Ally Controller, when it boots up, is sensitive to
commands for 1-2 seconds, which will cause it to restart. Even with
this patch series, this remains an issue on my device with powersave
on, while the device initializes. asus_hid (or hhd) will instantly
send a command to the device on connection, which causes this issue
and then combines with 1 + 2. My patch series does not fix this.
1 and 2 will always be issues for the Ally, regardless of firmware
updates and probably for other devices too. N3 I will solve through
userspace + with distribution help, and it is not something that will
take that much time. Patch N5 adds too much delay unfortunately,
especially after resume. I would like to see it go, at least for my
users.
Your solution of making kernel changes for newer firmware + custom
firmware + kernel changes for newer firmware with quirks somehow seems
more convoluted to me than just cleaning up a bit of the s2idle
subsystem to benefit all devices, with a little, firmware agnostic
delay, for some flourish.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-22 19:40 ` Antheas Kapenekakis
@ 2024-09-23 1:35 ` Derek J. Clark
2024-09-23 5:57 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Derek J. Clark @ 2024-09-23 1:35 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: Mario Limonciello, Luke Jones, me, Denis Benato, linux-pm,
Derek J . Clark
> Hi Derek,
Hello
> It is ok you have upgraded both of your units. We have plenty of
> contributors and users that are not on the experimental firmware you
> are on and will provide ample testing for probably the next half year.
> As the update will need 1-2 months of validation and it is very
> difficult and painful to update firmware from linux.
> As I am tired, allow me to quickly jot down some thoughts so I can
> enjoy my evening.
> I only got my Ally X unit 5 days ago, so I cannot say I spent
> thousands of hours collectively but right now, I am pretty certain of
> the issue that was happening in the Ally units.
> It is the combination of three cascading issues:
> 1) Display On/Off calls are not part of the suspend sequence in
> neither WIndows or Linux currently. In Windows, they are called before
> suspend, and in Linux after. This is a large deviation that completely
> breaks the controller of the Ally. The previous solution was to insert
> a second call to those functions in the middle of the Linux suspend
> sequence, and then collectively spend months fighting with random
> racing conditions. Patches 1, 2, 4 refactor s2idle to make sure that
> never happens again, for any device, including the Ally.
> 2) The Ally Controller has a choreographed sequence with which it
> fades its RGB during suspend. This happens during the Display Off
> transition of Screen Off in Windows. In all of my testing in windows,
> Screen Off lasts AT LEAST 10 seconds, if not more. I had to stand
> around looking at that power light to turn off more than once. If
> Linux cuts off its power supply before that, it gets confused and
> restarts after suspend. If that restart happens during the resume
> sequence, see (1,3). Patch 3 fixes this. This flourish is an important
> part of user experience, so adding a delay here is required, even if
> firmware updates.
> 3) Finally, the Ally Controller, when it boots up, is sensitive to
> commands for 1-2 seconds, which will cause it to restart. Even with
> this patch series, this remains an issue on my device with powersave
> on, while the device initializes. asus_hid (or hhd) will instantly
> send a command to the device on connection, which causes this issue
> and then combines with 1 + 2. My patch series does not fix this.
> 1 and 2 will always be issues for the Ally, regardless of firmware
> updates and probably for other devices too. N3 I will solve through
> userspace + with distribution help, and it is not something that will
> take that much time. Patch N5 adds too much delay unfortunately,
> especially after resume. I would like to see it go, at least for my
> users.
I'm going to be somewhat brief here as I don't like repeating myself, you are
working from incomplete information and from that you are inferring incorrect
assertions. Due to NDA the full slate of information that would clarify this
cannot be released here, but I will be clear: I am not sharing my opinion, I
am stating facts. What you have described here is a missinterpretation of the
symptoms and is not correct. The _DSM call is not relevant to the proper fix,
the sequencing you observe is not applicable to Linux, and the sensitivity of
the controller is another symptom of the Windows quirk behaving badly in Linux.
Furthermore, the RGB "flourish" as you call it works as intended with the new
firmware and no kernel changes required.
What I can provide is information on a test we did that should hopefully
elucidate the issue more clearly for you. We included a patch that allowed us
to alter the delay in asus-wmi on the fly by writing to an attribute in sysfs.
In addition, we pushed the _DSM calls as early as possible in the suspend
sequence. We were unable to find a timing for this that would work consistently
on different configurations. The same issue exists in your patch set and the
testing bears this out with Denis still getting spurious wakes when using it.
The problem with your approach is that you aren't listening to us despite our
much broader understanding of the issue at hand. If this worked we would have
submitted it ourselves nearly three weeks ago.
> Your solution of making kernel changes for newer firmware + custom
> firmware + kernel changes for newer firmware with quirks somehow seems
> more convoluted to me than just cleaning up a bit of the s2idle
> subsystem to benefit all devices, with a little, firmware agnostic
> delay, for some flourish.
> Antheas
Our solution doesn't require any kernel changes for newer firmware, as I
already stated the solution from ASUS fixes the root cause of the problem. Your
attempt at a solution attempts to outrace the symptoms of it. Please don't
mischaracterize my statements.
Derek
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 1:35 ` Derek J. Clark
@ 2024-09-23 5:57 ` Antheas Kapenekakis
2024-09-23 11:09 ` Luke Jones
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 5:57 UTC (permalink / raw)
To: Derek J. Clark; +Cc: Mario Limonciello, Luke Jones, me, Denis Benato, linux-pm
Hi Derek,
> I'm going to be somewhat brief here as I don't like repeating myself, you are
> working from incomplete information and from that you are inferring incorrect
> assertions. Due to NDA the full slate of information that would clarify this
> cannot be released here, but I will be clear: I am not sharing my opinion, I
> am stating facts. What you have described here is a missinterpretation of the
> symptoms and is not correct. The _DSM call is not relevant to the proper fix,
> the sequencing you observe is not applicable to Linux, and the sensitivity of
> the controller is another symptom of the Windows quirk behaving badly in Linux.
> Furthermore, the RGB "flourish" as you call it works as intended with the new
> firmware and no kernel changes required.
Only thing I am seeing here is a bit of heresay. Please get in touch
with Asus, cross your t's and dot your i's, and share what you found
with the rest of the class.
I am in no rush to see this merge, and as all gaming distributions
carry custom kernels that update on a weekly cadence and 90% of users
are on those kernels with the rest being able to figure it out, there
is no practical reason for this to merge quickly. Ubuntu (+variant)
users will get this fix after they've thrown their ally away anyway.
> What I can provide is information on a test we did that should hopefully
> elucidate the issue more clearly for you. We included a patch that allowed us
> to alter the delay in asus-wmi on the fly by writing to an attribute in sysfs.
> In addition, we pushed the _DSM calls as early as possible in the suspend
> sequence. We were unable to find a timing for this that would work consistently
> on different configurations. The same issue exists in your patch set and the
> testing bears this out with Denis still getting spurious wakes when using it.
> The problem with your approach is that you aren't listening to us despite our
> much broader understanding of the issue at hand. If this worked we would have
> submitted it ourselves nearly three weeks ago.
Here you assume I did not do the same. Yes, I did. The asus-wmi quirk
is subject to racing conditions that mean it will never work correctly
(well; without newer firmware maybe perhaps). This series is not, as
the calls happen before suspend starts.
Yes, my V1 of the patch did not include a delay at all. Since the
original Ally is XInput, one of the MCUs was left on, which caused
what Denis experienced. It also triggered a restart of the MCU on the
Ally X after resume. V2 fixes this restart on the Ally X and makes it
behave completely properly.
In fact, after being included in bazzite-testing yesterday, my Ally X
unit went on a deep slumber tonight with powersave on, and then woke
up today beautifully, having only lost 1% of battery.
It is not clear if the issue still exists on the original Ally, I
could not get a clear signal from Denis but do not worry, one of our
contributors is on it.
> *Our solution doesn't require any kernel changes for newer firmware*, as I
> already stated the solution from ASUS fixes the root cause of the problem. Your
> attempt at a solution attempts to outrace the symptoms of it. Please don't
> mischaracterize my statements.
I would hope that is not true and your solution completely removes the
quirk from asus-wmi for newer firmware, as it prolongs the suspend and
resume sequences noticeably. However, given my and Denis' limited
testing, I doubt the controller is able to suspend quickly enough even
with new firmware (you can see the RGB cut before it's able to fade).
In any case, this patch series is not expected to conflict with Asus'
newer firmware and is in fact complementary with it. Both fix the same
issue in a different way, which means our users will have a great
experience, squared. This series also greatly improves resume behavior
on the Ally, which, let me tell you, is blazing fast now.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 5:57 ` Antheas Kapenekakis
@ 2024-09-23 11:09 ` Luke Jones
2024-09-23 13:11 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Luke Jones @ 2024-09-23 11:09 UTC (permalink / raw)
To: Antheas Kapenekakis, Derek J. Clark
Cc: Mario Limonciello, me, Denis Benato, linux-pm
On Mon, 23 Sep 2024, at 5:57 PM, Antheas Kapenekakis wrote:
> Hi Derek,
>
> > I'm going to be somewhat brief here as I don't like repeating myself, you are
> > working from incomplete information and from that you are inferring incorrect
> > assertions. Due to NDA the full slate of information that would clarify this
> > cannot be released here, but I will be clear: I am not sharing my opinion, I
> > am stating facts. What you have described here is a missinterpretation of the
> > symptoms and is not correct. The _DSM call is not relevant to the proper fix,
> > the sequencing you observe is not applicable to Linux, and the sensitivity of
> > the controller is another symptom of the Windows quirk behaving badly in Linux.
> > Furthermore, the RGB "flourish" as you call it works as intended with the new
> > firmware and no kernel changes required.
>
> Only thing I am seeing here is a bit of heresay. Please get in touch
What Derek is saying here is correct because it is based on data I have shared with him for testing to help ASUS engineers fix the root issue in firmware.
> with Asus, cross your t's and dot your i's, and share what you found
> with the rest of the class.
I have direct contact with engineers in ASUS and I am under NDA (extended to a few others) so certain information can not be shared. The proper fix for Linux is done in the MCU firmware - this is something ASUS engineers who work on the Ally devices have done with our aid in pinpointing the exact cause.
> I am in no rush to see this merge, and as all gaming distributions
> carry custom kernels that update on a weekly cadence and 90% of users
> are on those kernels with the rest being able to figure it out, there
> is no practical reason for this to merge quickly. Ubuntu (+variant)
> users will get this fix after they've thrown their ally away anyway.
Lucky for Ubuntu and variant users, they can simply update their firmware and have suspend/resume/reboot all work fine.
> > What I can provide is information on a test we did that should hopefully
> > elucidate the issue more clearly for you. We included a patch that allowed us
> > to alter the delay in asus-wmi on the fly by writing to an attribute in sysfs.
> > In addition, we pushed the _DSM calls as early as possible in the suspend
> > sequence. We were unable to find a timing for this that would work consistently
> > on different configurations. The same issue exists in your patch set and the
> > testing bears this out with Denis still getting spurious wakes when using it.
> > The problem with your approach is that you aren't listening to us despite our
> > much broader understanding of the issue at hand. If this worked we would have
> > submitted it ourselves nearly three weeks ago.
>
> Here you assume I did not do the same. Yes, I did. The asus-wmi quirk
> is subject to racing conditions that mean it will never work correctly
> (well; without newer firmware maybe perhaps). This series is not, as
> the calls happen before suspend starts.
>
> Yes, my V1 of the patch did not include a delay at all. Since the
> original Ally is XInput, one of the MCUs was left on, which caused
> what Denis experienced. It also triggered a restart of the MCU on the
> Ally X after resume. V2 fixes this restart on the Ally X and makes it
> behave completely properly.
With the fixed firmware no delay is required anywhere, no restructuring of suspend is required, no more trying to figure out an optimal ordering is required.
> In fact, after being included in bazzite-testing yesterday, my Ally X
> unit went on a deep slumber tonight with powersave on, and then woke
> up today beautifully, having only lost 1% of battery.
The Ally X is much less sensitive to things than the Ally 1 is and invariably when we thought we'd found a fix, someone with a different kernel config, distro, and compiler (maybe even with LTO) would change the timing of things just enough. We tried pretty much everything you seem to be trying.
> It is not clear if the issue still exists on the original Ally, I
> could not get a clear signal from Denis but do not worry, one of our
> contributors is on it.
Denis was very clear I thought, perhaps you misread? Thank you for your thorough testing Denis.
> > *Our solution doesn't require any kernel changes for newer firmware*, as I
> > already stated the solution from ASUS fixes the root cause of the problem. Your
> > attempt at a solution attempts to outrace the symptoms of it. Please don't
> > mischaracterize my statements.
>
> I would hope that is not true and your solution completely removes the
> quirk from asus-wmi for newer firmware, as it prolongs the suspend and
> resume sequences noticeably. However, given my and Denis' limited
Yes that excessive delay is awful. It was a misguided attempt to race or delay things much like you are doing right now.
With the release of the new fixed firmware for the MCU imminent, this will be removed and all users should be strongly encouraged to update their firmware to the fixed version from ASUS.
> testing, I doubt the controller is able to suspend quickly enough even
> with new firmware (you can see the RGB cut before it's able to fade).
It isn't about trying to suspend the thing "quickly enough". And I'm hoping that with your analysing of that statement it provides you some insight as to why it's a bad assumption to make and why things are always going to be at risk of breaking with async linux suspend when you get another device like the Ally. At best the issue may be masked only to have seemingly random fails that people can't reproduce easily.
> In any case, this patch series is not expected to conflict with Asus'
> newer firmware and is in fact complementary with it. Both fix the same
> issue in a different way, which means our users will have a great
> experience, squared. This series also greatly improves resume behavior
> on the Ally, which, let me tell you, is blazing fast now.
You are correct in that it won't conflict. That is because the firmware that ASUS engineers who worked with us to fix, fixes the root cause of the issue regardless of the USB state at suspend time - which fully removes the need to make changes to core suspend code, especially in regards to the Ally devices.
It is perhaps best if I refer to what Denis and Derek told you about modern standby: focus on background downloads and proceed in that direction - the firmware really does fix the suspend/resume/reboot issues. You can also drop the asus-wmi patch (in submissions) as that will get dealt with soon enough.
Sincerely,
Luke.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 11:09 ` Luke Jones
@ 2024-09-23 13:11 ` Antheas Kapenekakis
2024-09-23 15:56 ` Mario Limonciello
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 13:11 UTC (permalink / raw)
To: Luke Jones; +Cc: Derek J. Clark, Mario Limonciello, me, Denis Benato, linux-pm
Hi Luke,
> What Derek is saying here is correct because it is based on data I have shared with him for testing to help ASUS engineers fix the root issue in firmware.
Would be nice to back it up. Indeed, right now on the X i am facing an
issue that Windows also has (see below), so it is not like Asus do not
have their work cut out for them. I would have to test the new
firmware and get back to you.
> I have direct contact with engineers in ASUS and I am under NDA (extended to a few others) so certain information can not be shared. The proper fix for Linux is done in the MCU firmware - this is something ASUS engineers who work on the Ally devices have done with our aid in pinpointing the exact cause.
The proper fix for _Asus_. I need to support a lot of manufacturers
and a lot of them use these _DSMs, so we need to get a better ordering
here.
> Denis was very clear I thought, perhaps you misread? Thank you for your thorough testing Denis.
Uh, Denis is never very clear unfortunately. From what I understood he
hit the reboot bug once. But I could not get a feel for what his issue
was. I know on V1, the XInput controller did not suspend properly.
> It isn't about trying to suspend the thing "quickly enough". And I'm hoping that with your analysing of that statement it provides you some insight as to why it's a bad assumption to make and why things are always going to be at risk of breaking with async linux suspend when you get another device like the Ally. At best the issue may be masked only to have seemingly random fails that people can't reproduce easily.
If you look into my patch series, you will see that unlike the
variants you tried, there is no async within it. The calls and the
delay are inserted before beginning the suspend sequence (which is
async). I think this is a good compromise and can be blended with
userspace going down, so it will not increase suspend delay in the
long run for devices that need a quirk. For the Ally X, even without a
delay, the controller always comes back online. Seems that it is the
case on the original Ally as well. With a caveat.
That caveat has to do with the delay before suspend. If it is too
little, the OG Ally XInput MCU does not turn off and the Ally X MCU
loses its state. For the OG Ally, I guess Denis informed you about it.
For the Ally X, this manifests itself as the MCU rebooting once when
it receives the first command after suspend. When powersave is off,
this happens when there is no delay, when it is on, it always happens.
I can reproduce this bug on windows very easily right now too.
It is a small nit that can go unnoticed though. So hopefully you can
test the new firmware (with whatever kernel you want) and report back.
Hopefully you are right and it was a timing check and that timing
check also fixes this.
> It is perhaps best if I refer to what Denis and Derek told you about modern standby: focus on background downloads and proceed in that direction - the firmware really does fix the suspend/resume/reboot issues. You can also drop the asus-wmi patch (in submissions) as that will get dealt with soon enough.
Indeed. I am the one with the problem here. I have too many Ally users
and for me to require them to update their firmware right now (for
basic functionality) is a non-starter. Which is why I was happy kind
of keeping this patch to myself.
I personally do not have a problem with making the next variant of
this patch an RFC. But currently I think it has a good scope, so I
would like to ask what other kernel maintainers think.
Also, after we do a bit more testing on the OG Ally, if everything
goes well, my version of the quirk is expected to have better behavior
than the asus-wmi, so I do not see a reason in refactoring that,
unless that refactoring ends in it completely removed. There is still
the issue of old users in this case though.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 13:11 ` Antheas Kapenekakis
@ 2024-09-23 15:56 ` Mario Limonciello
2024-09-23 16:11 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Mario Limonciello @ 2024-09-23 15:56 UTC (permalink / raw)
To: Antheas Kapenekakis, Luke Jones
Cc: Derek J. Clark, me, Denis Benato, linux-pm
>> It is perhaps best if I refer to what Denis and Derek told you about modern standby: focus on background downloads and proceed in that direction - the firmware really does fix the suspend/resume/reboot issues. You can also drop the asus-wmi patch (in submissions) as that will get dealt with soon enough.
>
> I personally do not have a problem with making the next variant of
> this patch an RFC. But currently I think it has a good scope, so I
> would like to ask what other kernel maintainers think.
I don't think it's possible to have it both ways that the ordering is
wrong AND the timing is wrong.
To me; this series is looking like a lot of bandaids to work around what
behave as race conditions. I'll leave some comments on the particular
patches, but in the cover letter I want to see particularly patch 3
tells me that this series isn't doing it right and you're getting lucky
on the timing.
IoW if the call has been moved earlier but then you need to delay it to
later was it really right to move it earlier?
Rather than littering the kernel with quirks for a buggy firmware
that the manufacturer is fixing I personally think it's a LOT better to
invest the effort into getting the MCU updater ported to Linux and
encourage users to update.
I started to add some code to fwupd [1] to at least parse the MCU
version for Ally and Ally-X. As this eventually lands higher level
software could potentially communicate with the fwupd daemon over dbus
to get this information and notify users that they need to update.
If we manage to get the whole update flow ported it could even work with
the daemon to do the update.
https://github.com/fwupd/fwupd/tree/superm1/asus-hid
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 15:56 ` Mario Limonciello
@ 2024-09-23 16:11 ` Antheas Kapenekakis
2024-09-23 16:21 ` Mario Limonciello
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 16:11 UTC (permalink / raw)
To: Mario Limonciello; +Cc: Luke Jones, Derek J. Clark, me, Denis Benato, linux-pm
> I don't think it's possible to have it both ways that the ordering is
> wrong AND the timing is wrong.
I think the current firmware manifests 2 issues on the Ally X: Display
On needed to be called once the kernel is ready otherwise it NOOPed
and Display Off needed to be called 1 second before suspend. Then,
there is the issue in my previous email that also exists in Windows.
I tried from 0 - 4000ms for Display off and 0 - 1500ms for Display on.
Display Off, as long as the RGB fades 100-200ms before, made no
difference. Display On likewise made no difference.
I am currently playing a bit with the calls in the Legion Go. Seems
like one of them turns off its controller as well.
I did not know fwupd could do microprocessors, if it is simple that
would be great.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 16:11 ` Antheas Kapenekakis
@ 2024-09-23 16:21 ` Mario Limonciello
2024-09-23 16:54 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Mario Limonciello @ 2024-09-23 16:21 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: Luke Jones, Derek J. Clark, me, Denis Benato, linux-pm
On 9/23/2024 11:11, Antheas Kapenekakis wrote:
>> I don't think it's possible to have it both ways that the ordering is
>> wrong AND the timing is wrong.
>
> I think the current firmware manifests 2 issues on the Ally X: Display
> On needed to be called once the kernel is ready otherwise it NOOPed
> and Display Off needed to be called 1 second before suspend. Then,
> there is the issue in my previous email that also exists in Windows.
>
> I tried from 0 - 4000ms for Display off and 0 - 1500ms for Display on.
> Display Off, as long as the RGB fades 100-200ms before, made no
> difference. Display On likewise made no difference.
Part of the problem here is that we (Linux kernel developers) don't have
the details of how the MCU actually interacts with the OS vs how it
wants to interact with the OS. Only ASUS knows this and all we can do
is guess (or join an NDA).
"For example" (not saying this is the problem; I don't know) if we have
a Linux sequence that we put the XHCI host controller into D3 but don't
turn off the port first this might not matter to most devices because
you're saving power by the controller being in D3. But if the device is
looking for some sequence of packets there could be a failure if those
never happen or happen at the wrong time.
>
> I am currently playing a bit with the calls in the Legion Go. Seems
> like one of them turns off its controller as well.
>
> I did not know fwupd could do microprocessors, if it is simple that
> would be great.
Yeah in this case it interacts with the USB HID interface.
Unfortunately we don't have the HID spec from ASUS, so the entire thing
would need to be reverse engineered.
I hope that more of these handhelds with MCUs that get regular updates
make the same investment.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 16:21 ` Mario Limonciello
@ 2024-09-23 16:54 ` Antheas Kapenekakis
2024-09-23 17:15 ` Mario Limonciello
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 16:54 UTC (permalink / raw)
To: Mario Limonciello; +Cc: Luke Jones, Derek J. Clark, me, Denis Benato, linux-pm
> Part of the problem here is that we (Linux kernel developers) don't have
> the details of how the MCU actually interacts with the OS vs how it
> wants to interact with the OS. Only ASUS knows this and all we can do
> is guess (or join an NDA).
I want to say that that is beginning to change, at least for these
devices. Manufacturers want the experience that comes with owning the
whole OS. Mobile phones and TVs figured that out long ago.
> "For example" (not saying this is the problem; I don't know) if we have
> a Linux sequence that we put the XHCI host controller into D3 but don't
> turn off the port first this might not matter to most devices because
> you're saving power by the controller being in D3. But if the device is
> looking for some sequence of packets there could be a failure if those
> never happen or happen at the wrong time.
The problem here is probably that. Killing the power too early made it
not save its state, asking it to be brought up before it powers on
made it NOOP, and asking it to be brought up in prepare_early ended up
with a bunch of race conditions with the kernel.
Lenovo figured it out, so there is no reason Asus wouldn't. Although
the latest bios from lenovo made the controllers take 2 seconds longer
to wake up. We did not have any problem in Linux, that was in Windows.
Unplugging XInput devices is really a mess.
I am really happy with the patch on the Ally X. We will get to testing
it on the Ally. It does a little reboot with powersave on, but that is
a cosmetic failure and happens in Windows too. When Asus fixes that I
will point them to Windows to update if they care about powersave so
much.
> >
> > I am currently playing a bit with the calls in the Legion Go. Seems
> > like one of them turns off its controller as well.
> >
> > I did not know fwupd could do microprocessors, if it is simple that
> > would be great.
>
> Yeah in this case it interacts with the USB HID interface.
>
> Unfortunately we don't have the HID spec from ASUS, so the entire thing
> would need to be reverse engineered.
>
> I hope that more of these handhelds with MCUs that get regular updates
> make the same investment.
I do not think every manufacturer creates a HID spec for updates. They
use the same cookiecutter stuff, for which there is no linux
implementation at all, so kind of the same. Probably after we do 1-3,
we will have all of them.
I saw that HID interface though, and the Legion Go has one and I am
none the wiser.
I've been doing HID for over a year by now, so I am used to it.
Biggest problem is you cannot trigger an update in Windows whenever
you want to sniff it. At least on the Lenovo.
However, I think I have bigger fish to fry, so it is not high in my
priority list.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 16:54 ` Antheas Kapenekakis
@ 2024-09-23 17:15 ` Mario Limonciello
2024-09-23 17:48 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Mario Limonciello @ 2024-09-23 17:15 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: Luke Jones, Derek J. Clark, me, Denis Benato, linux-pm
On 9/23/2024 11:54, Antheas Kapenekakis wrote:
>> I hope that more of these handhelds with MCUs that get regular updates
>> make the same investment.
>
> I do not think every manufacturer creates a HID spec for updates. They
> use the same cookiecutter stuff, for which there is no linux
> implementation at all, so kind of the same. Probably after we do 1-3,
> we will have all of them.
Actually there are already about a dozen implementations for HID
protocols in fwupd. I've written a few myself, Richard has written way
more than me, and there are even have vendors contributing to their own
plugins now.
I do think that it's very likely that ASUS uses the same implementation
on a lot of their devices, so yes if we get firmware updating figured
out on one we'll probably have it for many more.
>
> I saw that HID interface though, and the Legion Go has one and I am
> none the wiser.
>
> I've been doing HID for over a year by now, so I am used to it.
Considering the previous experience, maybe you can help to document
their update protocol? I'm happy to collaborate on the fwupd side if
you can help push it along.
> Biggest problem is you cannot trigger an update in Windows whenever
> you want to sniff it. At least on the Lenovo.
>
> However, I think I have bigger fish to fry, so it is not high in my
> priority list.
>
AFAICT any need for this series goes away when you have an easy way to
update a handheld that is running Linux already. I would take a pile of
userspace code over kernel quirks any day.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-23 17:15 ` Mario Limonciello
@ 2024-09-23 17:48 ` Antheas Kapenekakis
0 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-09-23 17:48 UTC (permalink / raw)
To: Mario Limonciello; +Cc: Luke Jones, Derek J. Clark, me, Denis Benato, linux-pm
> Considering the previous experience, maybe you can help to document
> their update protocol? I'm happy to collaborate on the fwupd side if
> you can help push it along.
If I get the time I might try to, but if this patch works on the
original Ally, this fire is handled as far as I am concerned. Even if
it does not merge and even with the little reboot quirk.
So it moves back in the stack of priorities. I still think fw and bios
updates are important and I have been thinking about it too. More on
the GUI side though.
At least BIOS updates are kind of solved for both the Ally and the Go.
However, the Go half bricks itself in almost all updates from WIndows.
I had to power it off during a bios update twice and that's from
Windows. I am not going near it any time soon.
> AFAICT any need for this series goes away when you have an easy way to
> update a handheld that is running Linux already. I would take a pile of
> userspace code over kernel quirks any day.
I cannot agree with you more on skipping the kernel when possible. It
is why most of my work is in userspace.
"Oh custom TDP/Fan on the legion go? Custom TDP is mode 4 that does
not map cleanly to a platform_profile, half of the BIOSes of the
legion go forget the custom TDP when changing TDP profile, and there
is a hw combo to change the TDP. Sniff the kernel net socket for ACPI
events to capture TDP mode changes, then if in Custom mode, wait 2
seconds, set the custom TDP the user saved previously, and then the
fan curve. Want to disable the custom fan curve? Pick a random TDP
profile, switch to it, and repeat."
However, with the kernel firing the _DSMs Display Off and Screen Entry
calls after the suspend sequence back to back with no delay when
WIndows calls them seconds apart before the suspend sequence, I can
personally see the eventuality.
Three talented OSS devs + the Asus engineering team had to work for 1
month on avoiding this table and now they have to move on to
validating the firmware, shipping it, etc. I also spent too much work
on making this upstreamable, but I plan to use this patch so it is
kind of OK, unless the Sleep _DSM does not do much.
Also, did I mention that the Ally gets stuck on 5W TDP semi randomly
after suspend and has a really nice and complex _DSM sleep entry? And
I am the only one that knows it because Handheld Daemon is the only
TDP solution that does not use ryzenadj to set TDP on the Ally. So
next step for me is doing the same series for the Sleep Entry/Exit and
expanding the quirk table. I am not waiting for a BIOS update.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-09-22 17:22 [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Antheas Kapenekakis
` (5 preceding siblings ...)
2024-09-22 18:15 ` [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend) Derek J. Clark
@ 2024-10-05 14:21 ` Hans de Goede
2024-10-05 15:10 ` Antheas Kapenekakis
6 siblings, 1 reply; 38+ messages in thread
From: Hans de Goede @ 2024-10-05 14:21 UTC (permalink / raw)
To: Antheas Kapenekakis, linux-pm, Rafael J. Wysocki
Cc: platform-driver-x86, Mario Limonciello, luke, me, Denis Benato
Hi,
On 22-Sep-24 7:22 PM, Antheas Kapenekakis wrote:
> The following series moves the Display off/on calls outside of the suspend
> sequence, as they are performed in Windows. This fixes certain issues that appear
> in devices that use the calls and expect the kernel to be active during their
> call (especially in the case of the ROG Ally devices) and opens the possibility
> of implementing a "Screen Off" state in the future (which mirrors Windows).
> In addition, it adds a quirk table that will allow for adding delays between
> Modern Standby transitions, to help resolve racing conditions.
>
> This series requires a bit of background on how modern standby works in Windows.
> Fundamentally, it is composed of four states: "Active", "Screen Off", "Sleep",
> and "DRIPS". Here, I take the liberty of naming the state "Active", as it is
> implied in Windows documentation.
>
> When the user actively interacts with a device, it is in the "Active" state.
> The screen is on, all devices are connected, and desired software is running.
> The other 3 stages play a role once the user stops interacting with the device.
> This is triggered in two main ways: either by pressing the power button or by
> inactivity. Once either of those targets is met, the system enters Modern Standby.
>
> Modern Standby consists of an orchestration of the "Screen Off", "Sleep", and
> "DRIPS" states. Windows is free to move throughout these states until the user
> interacts with the device again, where the device will transition to being
> "Active". Moving between the states implies a transition, where Windows performs
> a set of actions. In addition, Windows can only move between adjacent states
> as follows:
>
> "Active" <-> "Screen Off" <-> "Sleep" <-> "DRIPS"
>
> "Screen Off" is the state where all active displays in the device (whether
> *virtual* or real; this means unrelated to DRM) are off. The user might still
> be interacting with the device or running programs (e.g., compiling a kernel).
>
> "Sleep" is a newer state, in which the device turns off its fan and pulses its
> power button, but still supports running software activities. As part of this,
> and to avoid overheating the device a lot of manufacturers lower the TDP (PLx)
> of the device [3; _DSM 9 description].
>
> Finally, DRIPS stands for Deepest Runtime Idle Power State, i.e. suspend.
>
> While Windows may transition from any state to any state, doing so implies
> performing all transitions to reach that state. All states other than DRIPS
> have a fully active kernel (Wifi, USB at least) and allow userspace activity.
> What changes is the extent of the activity, and whether some power consuming
> devices are turned off (this is done with Modern Standby aware drivers and
> firmware notifications). The Windows kernel suspends during the transition from
> the "Sleep" state to the "DRIPS" state. In all other states it is active.
>
> After finishing each transition, the kernel performs a firmware notification,
> described in the _DSM document [3]. Moving from left to right with function num.,
> we have Display Off (3; Active -> Screen Off), Sleep Entry (7; Screen Off -> Sleep),
> and Lowest Power State Entry Notification (5; LPSEN; Sleep -> DRIPS). Then, from
> right to left, we have Lowest Power State Exit Notification (6; DRIPS -> Sleep),
> Sleep Exit (8; Sleep -> Screen) and Display On (4; Screen Off -> Active).
>
> The Linux kernel is not currently Modern Standby aware but will still make these
> calls. Currently, the problem is that the kernel calls all of the firmware
> notifications at the point LPSEN (5, 6) should be called, which is when the
> kernel is mostly suspended. This is a clear deviation from Windows, and causes
> undesirable behavior in certain devices, the main one focused in this patch
> series being the ROG Ally. Although series patch is aimed at Modern Standby
> devices in general.
>
> The ROG Ally is a Modern Standby capable device (uses Secure Core too; really
> ticks all the MS boxes) and in it, there are issues with both Display 3,4
> calls and Sleep 7,8 calls cause issues (7,8 are suspected and todo).
>
> The Display 3,4 calls are responsible for the controller. The Display Off call
> disconnects (if powersave is off) or powers off (if powersave is on and on DC
> power) the MCU(s) responsible for the controller and deactivates the RGB of the
> device. Display On powers on or reconnects the controller respectively.
> This controller, in the Ally X, is composed of 6 HID devices that register to
> the kernel. From testing, it seems that the majority of the problem in the Ally
> comes from Display Off being called way too late timewise, and Display
>
> The Sleep 7,8 calls, in general, are responsible for setting a low power state
> that is safe to use while the device is sleeping, making the suspend light
> pulse, and turning off the fan. Due to a variety of race conditions, there is
> a rare occasion where the Ally EC can get stuck in its Sleep mode, where the
> TDP is 5W, and prevent increasing it until the device reboots. The sleep entries
> contain actions in the Ally, so there is a suspicion that calling them during
> DRIPS is causing issues. However, this is not the subject of this patch and
> has not been verified yet.
>
> This patch centers around moving the Display 3,4 calls outside the suspend
> sequence (which is the transition from Sleep to DRIPS in Modern Standby terms),
> and by implementing the proper locks necessary, opening up the possibility of
> making these calls as part of a more elaborate "Modern Standby"-like userspace
> suspend/wakelock implementation. As of this patch, they are only called before
> the suspend sequence, including with the possibility of adding a delay.
>
> This makes the intent of this patch primarily compatibility focused, as it aims
> to fix issues by the current implementation. And to that end it works.
> After moving the calls outside of the suspend sequence, my ROG Ally X test unit
> can suspend more than 50 times without rebooting, both with powersave on or off,
> regardless of whether it is plugged/unplugged during suspend, and still have the
> controller work with full reliability. In V1, there was an unsolved race condition
> that was dealt by (5) before Display Off triggers. Essentially, Linux suspends
> too fast for the current version of the firmware to deal with. After adding a
> quirk table, which delays suspend after the Display Off call, the controller
> of the original Ally should power off properly (a lot of testing will be done).
Thank you for your work on this and thank you for the comprehensive write-up
on how Windows does modern standby.
First of all may I suggest that you take the above write-up, minus the ROG
Ally specific bits and turn this into a new documentation file under
Documentation/power ? And also document at which point Linux currently
makes the various transitions.
And then in patches where you move the transitions, also update the docs
on what Linux does to match.
I have read the discussion about tying the display on/off calls to CRTC state
and/or exposing a userspace knob for that. I think that this needs more
discussion / design work.
OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
be good to have. 3 is a bit unfortunate and not necessary with the current
special ROG Ally handling in the asus-wmi driver. It might be better to
just keep the quirks there.
IMHO it would be good to submit a v2 of just patches 1 - 3 run through
checkpatch. Also the commit message of patch 3 should point to the existing
quirk code in asus-wmi.c and mention that then is no longer necessary after
patch 3, then we can discuss what is the best place for these quirks.
Rafael, what do you think about at least taking patches 1 - 3 upstream?
Reading through how Windows handles things making the display on/off
calls before suspending devices sounds like it is the right thing to do
to me.
Regards,
Hans
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 14:21 ` Hans de Goede
@ 2024-10-05 15:10 ` Antheas Kapenekakis
2024-10-05 16:24 ` Hans de Goede
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-10-05 15:10 UTC (permalink / raw)
To: Hans de Goede
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
Hi Hans,
> Thank you for your work on this and thank you for the comprehensive write-up
> on how Windows does modern standby.
>
> First of all may I suggest that you take the above write-up, minus the ROG
> Ally specific bits and turn this into a new documentation file under
> Documentation/power ? And also document at which point Linux currently
> makes the various transitions.
I will try to move some of that documentation there, this is a great idea.
> And then in patches where you move the transitions, also update the docs
> on what Linux does to match.
>
> I have read the discussion about tying the display on/off calls to CRTC state
> and/or exposing a userspace knob for that. I think that this needs more
> discussion / design work.
Yes, you are right. To become a knob this would require a much bigger
discussion. I would also like to move Sleep calls as part of that. The
Legion Go and OneXPlayer devices turn off their controllers as part of
that and other modern standby devices limit their power envelope
(perhaps the Legion Go too). I think the Sleep call is where most of
the userspace usability will come from. Display On/Off is a bit of a
NOOP on most devices.
As for the LSB0 enter and exit, I do not know where the correct place
for those would be, and perhaps someone from Microsoft needs to be
consulted on that. The documentation is very vague. However it is
clear to me that they should be close to where they are right now, so
they very likely do not need to move.
There is also the new _DSM intent to turn display on 9 call. Which
meshes with the sleep call. That call is made before Sleep Exit, if
the kernel knows that the wake-up will cause the display to turn on,
to boost the thermal envelope of the device and help it wake up
quicker. If the Sleep call is moved then we would also have to
introduce that somewhere to avoid wake-up time regressions on devices
that support it, which also raises the question of how would the
kernel decide if an interrupt will cause the display to turn on before
unfreezing userspace (bulk of resume) (or should it be done after
unfreezing?).
> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
> be good to have. 3 is a bit unfortunate and not necessary with the current
> special ROG Ally handling in the asus-wmi driver. It might be better to
> just keep the quirks there.
From what I know Luke plans to remove that quirk ASAP due to new
firmware. I would keep it around until this patch series merges
personally and remove it as part of that. As it will probably cause
regressions if both are in place and require manual intervention if
either is not. I will also note that the quirk in asus-wmi calls the
Display On/Off calls a second time and during the suspend sequence,
which is not in any way proper. So if future devices need this kind of
quirk, it really does not seem like a good idea to me to paper over
their problems by calling the notifications a second time in random
platform drivers. There is the question of where that quirk should be
placed, that is true, but I IMO it should be a pm problem.
Perhaps not in the location where I put it though and perhaps it
should be done with LSB0 callbacks instead. Although, being done this
way allows for it to blend with the suspend sequence. Ideally, the
Display Off delay would be blended with userspace going down such that
if e.g., there is heavy userspace activity that requires ~2s to
freeze, the quirk would add no delay. Instead, it would only add delay
if userspace freezes quickly (less than .5s). Same can be said with
Sleep Entry and beginning prepare_late, which blocks the EC interrupts
(that would need a lot of investigation though).
On that note, it seems to me that the Ally has 2 bugs related to the
_DSM calls 3 and 4. First bug is that Display On is gated on current
firmware and only works when the USB subsystem is powered on.
Allegedly, this is fixed on the upcoming firmware but it is not
something I have verified personally. I will verify it in 10 days or
so, if the new firmware does not fail QA I guess.
However, there is a second bug with Display Off in _DSM 4. The
controller of the Ally needs time to power off, around 500ms.
Otherwise it gets its power clipped and/or does not power off
correctly. This causes the issues mentioned in the discussion and I
have no indication that this is fixed with newer controller firmware.
It is also my understanding that most of the testing of the new
firmware happened with the asus-wmi quirk in place, which papers over
that issue, so removing the quirk might be premature in any case.
We have currently released this patch series in Bazzite and I am happy
to report that it completely fixes all controller related issues in
the Ally devices and makes them behave exactly as they do in Windows,
regardless of firmware and bug for bug.
So we will be keeping it around and extending it as appropriate to
include the Sleep calls. I am reminded multiple times per week that
the Ally has TDP suspend bugs, where if the user is playing a heavy
game, the EC of the device tends to get stuck at 6W and fail to
respond after waking the device. So moving calls 7, 8 is the natural
next step in this investigation. I already have a draft patch on
standby, that we plan to AB test soon.
> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
> checkpatch. Also the commit message of patch 3 should point to the existing
> quirk code in asus-wmi.c and mention that then is no longer necessary after
> patch 3, then we can discuss what is the best place for these quirks.
I did run it through before sending the patch. However, some of the
warnings were a bit cryptic to me... I will run it again.
I will add a note for asus-wmi on future patch series.
First 3 patches of the series are designed to NOOP before patch 4. Did
you mean patch 3 (which adds the delay) instead of 4?
> Rafael, what do you think about at least taking patches 1 - 3 upstream?
> Reading through how Windows handles things making the display on/off
> calls before suspending devices sounds like it is the right thing to do
> to me.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 15:10 ` Antheas Kapenekakis
@ 2024-10-05 16:24 ` Hans de Goede
2024-10-05 16:27 ` Hans de Goede
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: Hans de Goede @ 2024-10-05 16:24 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
Hi Antheas,
On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
> Hi Hans,
>
>> Thank you for your work on this and thank you for the comprehensive write-up
>> on how Windows does modern standby.
>>
>> First of all may I suggest that you take the above write-up, minus the ROG
>> Ally specific bits and turn this into a new documentation file under
>> Documentation/power ? And also document at which point Linux currently
>> makes the various transitions.
>
> I will try to move some of that documentation there, this is a great idea.
>
>> And then in patches where you move the transitions, also update the docs
>> on what Linux does to match.
>>
>> I have read the discussion about tying the display on/off calls to CRTC state
>> and/or exposing a userspace knob for that. I think that this needs more
>> discussion / design work.
>
> Yes, you are right. To become a knob this would require a much bigger
> discussion. I would also like to move Sleep calls as part of that. The
> Legion Go and OneXPlayer devices turn off their controllers as part of
> that and other modern standby devices limit their power envelope
> (perhaps the Legion Go too). I think the Sleep call is where most of
> the userspace usability will come from. Display On/Off is a bit of a
> NOOP on most devices.
>
> As for the LSB0 enter and exit, I do not know where the correct place
> for those would be, and perhaps someone from Microsoft needs to be
> consulted on that. The documentation is very vague. However it is
> clear to me that they should be close to where they are right now, so
> they very likely do not need to move.
>
> There is also the new _DSM intent to turn display on 9 call. Which
> meshes with the sleep call. That call is made before Sleep Exit, if
> the kernel knows that the wake-up will cause the display to turn on,
> to boost the thermal envelope of the device and help it wake up
> quicker. If the Sleep call is moved then we would also have to
> introduce that somewhere to avoid wake-up time regressions on devices
> that support it, which also raises the question of how would the
> kernel decide if an interrupt will cause the display to turn on before
> unfreezing userspace (bulk of resume) (or should it be done after
> unfreezing?).
>
>> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
>> be good to have. 3 is a bit unfortunate and not necessary with the current
>> special ROG Ally handling in the asus-wmi driver. It might be better to
>> just keep the quirks there.
>
> From what I know Luke plans to remove that quirk ASAP due to new
> firmware. I would keep it around until this patch series merges
> personally and remove it as part of that.
Ack I replied to Luke to say something like that.
> As it will probably cause regressions if both are in place
I don't see how having both this patch-sets + the asus-wmi
quirks will cause regressions? The suspend delay will be done
twice, but that is harmless.
> and require manual intervention if
> either is not. I will also note that the quirk in asus-wmi calls the
> Display On/Off calls a second time and during the suspend sequence,
> which is not in any way proper.
AFAICT asus-wmi does not call the display on / off callbacks instead
it directly calls "\\_SB.PCI0.SBRG.EC0.CSEE" to control the power ?
> So if future devices need this kind of
> quirk, it really does not seem like a good idea to me to paper over
> their problems by calling the notifications a second time in random
> platform drivers. There is the question of where that quirk should be
> placed, that is true, but I IMO it should be a pm problem.
>
> Perhaps not in the location where I put it though and perhaps it
> should be done with LSB0 callbacks instead. Although, being done this
> way allows for it to blend with the suspend sequence. Ideally, the
> Display Off delay would be blended with userspace going down such that
> if e.g., there is heavy userspace activity that requires ~2s to
> freeze, the quirk would add no delay. Instead, it would only add delay
> if userspace freezes quickly (less than .5s). Same can be said with
> Sleep Entry and beginning prepare_late, which blocks the EC interrupts
> (that would need a lot of investigation though).
>
> On that note, it seems to me that the Ally has 2 bugs related to the
> _DSM calls 3 and 4. First bug is that Display On is gated on current
> firmware and only works when the USB subsystem is powered on.
> Allegedly, this is fixed on the upcoming firmware but it is not
> something I have verified personally. I will verify it in 10 days or
> so, if the new firmware does not fail QA I guess.
>
> However, there is a second bug with Display Off in _DSM 4. The
> controller of the Ally needs time to power off, around 500ms.
> Otherwise it gets its power clipped and/or does not power off
> correctly. This causes the issues mentioned in the discussion and I
> have no indication that this is fixed with newer controller firmware.
> It is also my understanding that most of the testing of the new
> firmware happened with the asus-wmi quirk in place, which papers over
> that issue, so removing the quirk might be premature in any case.
Ok.
> We have currently released this patch series in Bazzite and I am happy
> to report that it completely fixes all controller related issues in
> the Ally devices and makes them behave exactly as they do in Windows,
> regardless of firmware and bug for bug.
>
> So we will be keeping it around and extending it as appropriate to
> include the Sleep calls. I am reminded multiple times per week that
> the Ally has TDP suspend bugs, where if the user is playing a heavy
> game, the EC of the device tends to get stuck at 6W and fail to
> respond after waking the device. So moving calls 7, 8 is the natural
> next step in this investigation. I already have a draft patch on
> standby, that we plan to AB test soon.
>
>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
>> checkpatch. Also the commit message of patch 3 should point to the existing
>> quirk code in asus-wmi.c and mention that then is no longer necessary after
>> patch 3, then we can discuss what is the best place for these quirks.
>
> I did run it through before sending the patch. However, some of the
> warnings were a bit cryptic to me... I will run it again.
>
> I will add a note for asus-wmi on future patch series.
>
> First 3 patches of the series are designed to NOOP before patch 4. Did
> you mean patch 3 (which adds the delay) instead of 4?
Ah I misread the series and failed to notice that patch 4 actually hooks
things up, I was under the impression that patch 4 hooks things up.
But I did mean that patch 3 might lead to discussion not patch 4.
Now that I have taken a better look some more detailed review comments:
* Patches 1 and 2 should be squashed into a single patch IMHO.
* Patch 3 adds the quirks to kernel/power/suspend.c but this
really should be added to drivers/acpi/x86/s2idle.c and then do
the sleep as part of the display_off callback.
* Given my comment on patch 3 I would swap the order of patch 3 and 4
and only add the quirks after moving the display off ACPI call to
the new callback
* Patch 4 does too much in a single patch, specifically besides
actually implementing the new callbacks it also does s/SCREEN/DISPLAY
on all the ACPI_LPS0_* defines. This renaming of the defines must
be done in a separate patch.
Regards,
Hans
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 16:24 ` Hans de Goede
@ 2024-10-05 16:27 ` Hans de Goede
2024-10-05 16:50 ` Antheas Kapenekakis
2024-10-05 16:57 ` Antheas Kapenekakis
2024-10-05 21:47 ` Hans de Goede
2 siblings, 1 reply; 38+ messages in thread
From: Hans de Goede @ 2024-10-05 16:27 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
p.s.
On 5-Oct-24 6:24 PM, Hans de Goede wrote:
> Hi Antheas,
>
> On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
>> Hi Hans,
>>
<snip>
>>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
>>> checkpatch. Also the commit message of patch 3 should point to the existing
>>> quirk code in asus-wmi.c and mention that then is no longer necessary after
>>> patch 3, then we can discuss what is the best place for these quirks.
>>
>> I did run it through before sending the patch. However, some of the
>> warnings were a bit cryptic to me... I will run it again.
>>
>> I will add a note for asus-wmi on future patch series.
>>
>> First 3 patches of the series are designed to NOOP before patch 4. Did
>> you mean patch 3 (which adds the delay) instead of 4?
>
> Ah I misread the series and failed to notice that patch 4 actually hooks
> things up, I was under the impression that patch 4 hooks things up.
>
> But I did mean that patch 3 might lead to discussion not patch 4.
Oh and upon re-reading the series I see that pathc 5 is just dropping
the quirks from asus-wmi.c, which is fine.
I somehow thought that the later patches where adding a way for userspace
to already enter the LPS0 display off state earlier. No idea how that
idea got in my head ...
Regards,
Hans
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 16:27 ` Hans de Goede
@ 2024-10-05 16:50 ` Antheas Kapenekakis
0 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-10-05 16:50 UTC (permalink / raw)
To: Hans de Goede
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
On Sat, 5 Oct 2024 at 18:27, Hans de Goede <hdegoede@redhat.com> wrote:
>
> p.s.
>
> On 5-Oct-24 6:24 PM, Hans de Goede wrote:
> > Hi Antheas,
> >
> > On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
> >> Hi Hans,
> >>
>
> <snip>
>
> >>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
> >>> checkpatch. Also the commit message of patch 3 should point to the existing
> >>> quirk code in asus-wmi.c and mention that then is no longer necessary after
> >>> patch 3, then we can discuss what is the best place for these quirks.
> >>
> >> I did run it through before sending the patch. However, some of the
> >> warnings were a bit cryptic to me... I will run it again.
> >>
> >> I will add a note for asus-wmi on future patch series.
> >>
> >> First 3 patches of the series are designed to NOOP before patch 4. Did
> >> you mean patch 3 (which adds the delay) instead of 4?
> >
> > Ah I misread the series and failed to notice that patch 4 actually hooks
> > things up, I was under the impression that patch 4 hooks things up.
> >
> > But I did mean that patch 3 might lead to discussion not patch 4.
>
> Oh and upon re-reading the series I see that pathc 5 is just dropping
> the quirks from asus-wmi.c, which is fine.
>
> I somehow thought that the later patches where adding a way for userspace
> to already enter the LPS0 display off state earlier. No idea how that
> idea got in my head ...
Done this way to hopefully be easier to upstream and get this fix out
sooner. The plan here would be 3 series: 1) move Display On/Off +
quirk, 2) move Sleep Entry/Exit + Quirk, 3) RFC for exposing to
userspace, in which case if the kernel starts to suspend while in
standby it would skip those calls.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 16:24 ` Hans de Goede
2024-10-05 16:27 ` Hans de Goede
@ 2024-10-05 16:57 ` Antheas Kapenekakis
2024-10-05 21:47 ` Hans de Goede
2 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-10-05 16:57 UTC (permalink / raw)
To: Hans de Goede
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
On Sat, 5 Oct 2024 at 18:24, Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi Antheas,
>
> On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
> > Hi Hans,
> >
> >> Thank you for your work on this and thank you for the comprehensive write-up
> >> on how Windows does modern standby.
> >>
> >> First of all may I suggest that you take the above write-up, minus the ROG
> >> Ally specific bits and turn this into a new documentation file under
> >> Documentation/power ? And also document at which point Linux currently
> >> makes the various transitions.
> >
> > I will try to move some of that documentation there, this is a great idea.
> >
> >> And then in patches where you move the transitions, also update the docs
> >> on what Linux does to match.
> >>
> >> I have read the discussion about tying the display on/off calls to CRTC state
> >> and/or exposing a userspace knob for that. I think that this needs more
> >> discussion / design work.
> >
> > Yes, you are right. To become a knob this would require a much bigger
> > discussion. I would also like to move Sleep calls as part of that. The
> > Legion Go and OneXPlayer devices turn off their controllers as part of
> > that and other modern standby devices limit their power envelope
> > (perhaps the Legion Go too). I think the Sleep call is where most of
> > the userspace usability will come from. Display On/Off is a bit of a
> > NOOP on most devices.
> >
> > As for the LSB0 enter and exit, I do not know where the correct place
> > for those would be, and perhaps someone from Microsoft needs to be
> > consulted on that. The documentation is very vague. However it is
> > clear to me that they should be close to where they are right now, so
> > they very likely do not need to move.
> >
> > There is also the new _DSM intent to turn display on 9 call. Which
> > meshes with the sleep call. That call is made before Sleep Exit, if
> > the kernel knows that the wake-up will cause the display to turn on,
> > to boost the thermal envelope of the device and help it wake up
> > quicker. If the Sleep call is moved then we would also have to
> > introduce that somewhere to avoid wake-up time regressions on devices
> > that support it, which also raises the question of how would the
> > kernel decide if an interrupt will cause the display to turn on before
> > unfreezing userspace (bulk of resume) (or should it be done after
> > unfreezing?).
> >
> >> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
> >> be good to have. 3 is a bit unfortunate and not necessary with the current
> >> special ROG Ally handling in the asus-wmi driver. It might be better to
> >> just keep the quirks there.
> >
> > From what I know Luke plans to remove that quirk ASAP due to new
> > firmware. I would keep it around until this patch series merges
> > personally and remove it as part of that.
>
> Ack I replied to Luke to say something like that.
>
> > As it will probably cause regressions if both are in place
>
> I don't see how having both this patch-sets + the asus-wmi
> quirks will cause regressions? The suspend delay will be done
> twice, but that is harmless.
Probably it will be harmless, but I think the Display On being done
twice, and one of the times being inside the suspend sequence might
result in sub-optimal behavior. Well, the behavior that exists now.
> > and require manual intervention if
> > either is not. I will also note that the quirk in asus-wmi calls the
> > Display On/Off calls a second time and during the suspend sequence,
> > which is not in any way proper.
>
> AFAICT asus-wmi does not call the display on / off callbacks instead
> it directly calls "\\_SB.PCI0.SBRG.EC0.CSEE" to control the power ?
Refer to [1]. CSEE with B7, B8 is in fact the _DSM 3,4 Display On/Off
internal call. There is also a spurious Lid switch call on Display On
that does not exist and causes dmesg errors.
Link: https://github.com/hhd-dev/hwinfo/blob/21b7442219922233c41c0568995214ba92873f69/devices/ally_x/decoded/ssdt28.dsl#L841-L855
[1]
> > So if future devices need this kind of
> > quirk, it really does not seem like a good idea to me to paper over
> > their problems by calling the notifications a second time in random
> > platform drivers. There is the question of where that quirk should be
> > placed, that is true, but I IMO it should be a pm problem.
> >
> > Perhaps not in the location where I put it though and perhaps it
> > should be done with LSB0 callbacks instead. Although, being done this
> > way allows for it to blend with the suspend sequence. Ideally, the
> > Display Off delay would be blended with userspace going down such that
> > if e.g., there is heavy userspace activity that requires ~2s to
> > freeze, the quirk would add no delay. Instead, it would only add delay
> > if userspace freezes quickly (less than .5s). Same can be said with
> > Sleep Entry and beginning prepare_late, which blocks the EC interrupts
> > (that would need a lot of investigation though).
> >
> > On that note, it seems to me that the Ally has 2 bugs related to the
> > _DSM calls 3 and 4. First bug is that Display On is gated on current
> > firmware and only works when the USB subsystem is powered on.
> > Allegedly, this is fixed on the upcoming firmware but it is not
> > something I have verified personally. I will verify it in 10 days or
> > so, if the new firmware does not fail QA I guess.
> >
> > However, there is a second bug with Display Off in _DSM 4. The
> > controller of the Ally needs time to power off, around 500ms.
> > Otherwise it gets its power clipped and/or does not power off
> > correctly. This causes the issues mentioned in the discussion and I
> > have no indication that this is fixed with newer controller firmware.
> > It is also my understanding that most of the testing of the new
> > firmware happened with the asus-wmi quirk in place, which papers over
> > that issue, so removing the quirk might be premature in any case.
>
> Ok.
>
> > We have currently released this patch series in Bazzite and I am happy
> > to report that it completely fixes all controller related issues in
> > the Ally devices and makes them behave exactly as they do in Windows,
> > regardless of firmware and bug for bug.
> >
> > So we will be keeping it around and extending it as appropriate to
> > include the Sleep calls. I am reminded multiple times per week that
> > the Ally has TDP suspend bugs, where if the user is playing a heavy
> > game, the EC of the device tends to get stuck at 6W and fail to
> > respond after waking the device. So moving calls 7, 8 is the natural
> > next step in this investigation. I already have a draft patch on
> > standby, that we plan to AB test soon.
> >
> >> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
> >> checkpatch. Also the commit message of patch 3 should point to the existing
> >> quirk code in asus-wmi.c and mention that then is no longer necessary after
> >> patch 3, then we can discuss what is the best place for these quirks.
> >
> > I did run it through before sending the patch. However, some of the
> > warnings were a bit cryptic to me... I will run it again.
> >
> > I will add a note for asus-wmi on future patch series.
> >
> > First 3 patches of the series are designed to NOOP before patch 4. Did
> > you mean patch 3 (which adds the delay) instead of 4?
>
> Ah I misread the series and failed to notice that patch 4 actually hooks
> things up, I was under the impression that patch 4 hooks things up.
>
> But I did mean that patch 3 might lead to discussion not patch 4.
>
> Now that I have taken a better look some more detailed review comments:
>
> * Patches 1 and 2 should be squashed into a single patch IMHO.
>
> * Patch 3 adds the quirks to kernel/power/suspend.c but this
> really should be added to drivers/acpi/x86/s2idle.c and then do
> the sleep as part of the display_off callback.
>
> * Given my comment on patch 3 I would swap the order of patch 3 and 4
> and only add the quirks after moving the display off ACPI call to
> the new callback
>
> * Patch 4 does too much in a single patch, specifically besides
> actually implementing the new callbacks it also does s/SCREEN/DISPLAY
> on all the ACPI_LPS0_* defines. This renaming of the defines must
> be done in a separate patch.
All are fair comments. I will fix them on the next revision.
On the Patch 3 comment, do you think there is merit with blending the
quirk with userspace freezing? Moving it inside LPS0 would make that
difficult, however at the same time 500ms for just the Ally (and
perhaps 2-3 other affected devices) is not something particularly
noticeable anyway.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 16:24 ` Hans de Goede
2024-10-05 16:27 ` Hans de Goede
2024-10-05 16:57 ` Antheas Kapenekakis
@ 2024-10-05 21:47 ` Hans de Goede
2024-10-05 22:15 ` Antheas Kapenekakis
2 siblings, 1 reply; 38+ messages in thread
From: Hans de Goede @ 2024-10-05 21:47 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
Hi Antheas,
On 5-Oct-24 6:24 PM, Hans de Goede wrote:
> Hi Antheas,
>
> On 5-Oct-24 5:10 PM, Antheas Kapenekakis wrote:
>> Hi Hans,
>>
>>> Thank you for your work on this and thank you for the comprehensive write-up
>>> on how Windows does modern standby.
>>>
>>> First of all may I suggest that you take the above write-up, minus the ROG
>>> Ally specific bits and turn this into a new documentation file under
>>> Documentation/power ? And also document at which point Linux currently
>>> makes the various transitions.
>>
>> I will try to move some of that documentation there, this is a great idea.
>>
>>> And then in patches where you move the transitions, also update the docs
>>> on what Linux does to match.
>>>
>>> I have read the discussion about tying the display on/off calls to CRTC state
>>> and/or exposing a userspace knob for that. I think that this needs more
>>> discussion / design work.
>>
>> Yes, you are right. To become a knob this would require a much bigger
>> discussion. I would also like to move Sleep calls as part of that. The
>> Legion Go and OneXPlayer devices turn off their controllers as part of
>> that and other modern standby devices limit their power envelope
>> (perhaps the Legion Go too). I think the Sleep call is where most of
>> the userspace usability will come from. Display On/Off is a bit of a
>> NOOP on most devices.
>>
>> As for the LSB0 enter and exit, I do not know where the correct place
>> for those would be, and perhaps someone from Microsoft needs to be
>> consulted on that. The documentation is very vague. However it is
>> clear to me that they should be close to where they are right now, so
>> they very likely do not need to move.
>>
>> There is also the new _DSM intent to turn display on 9 call. Which
>> meshes with the sleep call. That call is made before Sleep Exit, if
>> the kernel knows that the wake-up will cause the display to turn on,
>> to boost the thermal envelope of the device and help it wake up
>> quicker. If the Sleep call is moved then we would also have to
>> introduce that somewhere to avoid wake-up time regressions on devices
>> that support it, which also raises the question of how would the
>> kernel decide if an interrupt will cause the display to turn on before
>> unfreezing userspace (bulk of resume) (or should it be done after
>> unfreezing?).
>>
>>> OTOH IMHO it would be good to take patches 1 - 3 . Certainly 1 + 2 would
>>> be good to have. 3 is a bit unfortunate and not necessary with the current
>>> special ROG Ally handling in the asus-wmi driver. It might be better to
>>> just keep the quirks there.
>>
>> From what I know Luke plans to remove that quirk ASAP due to new
>> firmware. I would keep it around until this patch series merges
>> personally and remove it as part of that.
>
> Ack I replied to Luke to say something like that.
>
>> As it will probably cause regressions if both are in place
>
> I don't see how having both this patch-sets + the asus-wmi
> quirks will cause regressions? The suspend delay will be done
> twice, but that is harmless.
>
>> and require manual intervention if
>> either is not. I will also note that the quirk in asus-wmi calls the
>> Display On/Off calls a second time and during the suspend sequence,
>> which is not in any way proper.
>
> AFAICT asus-wmi does not call the display on / off callbacks instead
> it directly calls "\\_SB.PCI0.SBRG.EC0.CSEE" to control the power ?
>
>> So if future devices need this kind of
>> quirk, it really does not seem like a good idea to me to paper over
>> their problems by calling the notifications a second time in random
>> platform drivers. There is the question of where that quirk should be
>> placed, that is true, but I IMO it should be a pm problem.
>>
>> Perhaps not in the location where I put it though and perhaps it
>> should be done with LSB0 callbacks instead. Although, being done this
>> way allows for it to blend with the suspend sequence. Ideally, the
>> Display Off delay would be blended with userspace going down such that
>> if e.g., there is heavy userspace activity that requires ~2s to
>> freeze, the quirk would add no delay. Instead, it would only add delay
>> if userspace freezes quickly (less than .5s). Same can be said with
>> Sleep Entry and beginning prepare_late, which blocks the EC interrupts
>> (that would need a lot of investigation though).
>>
>> On that note, it seems to me that the Ally has 2 bugs related to the
>> _DSM calls 3 and 4. First bug is that Display On is gated on current
>> firmware and only works when the USB subsystem is powered on.
>> Allegedly, this is fixed on the upcoming firmware but it is not
>> something I have verified personally. I will verify it in 10 days or
>> so, if the new firmware does not fail QA I guess.
>>
>> However, there is a second bug with Display Off in _DSM 4. The
>> controller of the Ally needs time to power off, around 500ms.
>> Otherwise it gets its power clipped and/or does not power off
>> correctly. This causes the issues mentioned in the discussion and I
>> have no indication that this is fixed with newer controller firmware.
>> It is also my understanding that most of the testing of the new
>> firmware happened with the asus-wmi quirk in place, which papers over
>> that issue, so removing the quirk might be premature in any case.
>
> Ok.
>
>> We have currently released this patch series in Bazzite and I am happy
>> to report that it completely fixes all controller related issues in
>> the Ally devices and makes them behave exactly as they do in Windows,
>> regardless of firmware and bug for bug.
>>
>> So we will be keeping it around and extending it as appropriate to
>> include the Sleep calls. I am reminded multiple times per week that
>> the Ally has TDP suspend bugs, where if the user is playing a heavy
>> game, the EC of the device tends to get stuck at 6W and fail to
>> respond after waking the device. So moving calls 7, 8 is the natural
>> next step in this investigation. I already have a draft patch on
>> standby, that we plan to AB test soon.
>>
>>> IMHO it would be good to submit a v2 of just patches 1 - 3 run through
>>> checkpatch. Also the commit message of patch 3 should point to the existing
>>> quirk code in asus-wmi.c and mention that then is no longer necessary after
>>> patch 3, then we can discuss what is the best place for these quirks.
>>
>> I did run it through before sending the patch. However, some of the
>> warnings were a bit cryptic to me... I will run it again.
>>
>> I will add a note for asus-wmi on future patch series.
>>
>> First 3 patches of the series are designed to NOOP before patch 4. Did
>> you mean patch 3 (which adds the delay) instead of 4?
>
> Ah I misread the series and failed to notice that patch 4 actually hooks
> things up, I was under the impression that patch 4 hooks things up.
>
> But I did mean that patch 3 might lead to discussion not patch 4.
>
> Now that I have taken a better look some more detailed review comments:
>
> * Patches 1 and 2 should be squashed into a single patch IMHO.
>
> * Patch 3 adds the quirks to kernel/power/suspend.c but this
> really should be added to drivers/acpi/x86/s2idle.c and then do
> the sleep as part of the display_off callback.
>
> * Given my comment on patch 3 I would swap the order of patch 3 and 4
> and only add the quirks after moving the display off ACPI call to
> the new callback
>
> * Patch 4 does too much in a single patch, specifically besides
> actually implementing the new callbacks it also does s/SCREEN/DISPLAY
> on all the ACPI_LPS0_* defines. This renaming of the defines must
> be done in a separate patch.
Thinking some more about this I am having second doubts about
moving the LPS0 display power off call to before devices are suspended,
doing so would mean that the display might still be on when that call
is made and that call could disable power-resources which are necessary
for the display causing issues when the display driver's suspend method
runs.
So I think that we need something closer to Mario's POC from:
https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=superm1/dsm-screen-on-off
here where the call is made when the last display is turned off.
IOW have the drm modesetting core call this.
Maybe have something like a enabled_displays counter in the
drm-core which gets increased / decreased by helpers and
have the drm-core call platform_suspend_screen_off() /
platform_suspend_screen_on() when the counter goes from 1 -> 0
resp. 0 -> 1, ignoring the very first 0 -> 1 transition
which will be done when the first GPU with an enabled
output is found ?
The idea being that the first increase() call gets made when
a drm/kms driver probes a display and finds outputs which are
light up during probe() and then further increase / decrease
calls are made either when all displays go off; or maybe
per crtc when the crtc gets enabled / disabled.
Anyways how best to do this at display off time should be
discussed with the drm/kms community on the dri-devel list.
Regards,
Hans
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 21:47 ` Hans de Goede
@ 2024-10-05 22:15 ` Antheas Kapenekakis
2024-10-06 10:16 ` Hans de Goede
0 siblings, 1 reply; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-10-05 22:15 UTC (permalink / raw)
To: Hans de Goede
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
<skip>
> Thinking some more about this I am having second doubts about
> moving the LPS0 display power off call to before devices are suspended,
> doing so would mean that the display might still be on when that call
> is made and that call could disable power-resources which are necessary
> for the display causing issues when the display driver's suspend method
> runs.
Is there any device where that is used for display powersaving?
> So I think that we need something closer to Mario's POC from:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=superm1/dsm-screen-on-off
>
> here where the call is made when the last display is turned off.
>
> IOW have the drm modesetting core call this.
I can see two problems with this approach: 1) it would happen in a
random point in the suspend sequence, introducing race conditions with
sensitive modern standby devices (e.g., Ally). 2) It would not be
gated and debounced properly, so a drm driver could call it 5 times
when you e.g., plug in an HDMI cable.
And indeed that is the case, that PR horribly breaks the Ally even
while any asus-wmi quirk was active. Perhaps DRM can be consulted
though, see below.
> Maybe have something like a enabled_displays counter in the
> drm-core which gets increased / decreased by helpers and
> have the drm-core call platform_suspend_screen_off() /
> platform_suspend_screen_on() when the counter goes from 1 -> 0
> resp. 0 -> 1, ignoring the very first 0 -> 1 transition
> which will be done when the first GPU with an enabled
> output is found ?
To quote Microsoft [1], "This _DSM Function will be invoked when the
operating system has entered a state where all displays—local and
*remote*, if any—have been turned off."
Since it says remote, binding it to DRM could prove difficult. In
addition, the call in Windows is made 5 seconds after the displays
turn off due to inactivity. To mirror this behavior you would need
userspace.
If there is strong indication that the Display On/Off calls interfere
with the DRM subsystem, e.g., turn off a GPU in certain laptops, the
call could be gated with a counter similar to Mario's PR and error
out. In that way, it is still controllable by userspace while ensuring
the device display is off. Is there such an indication/do we know of
such a device?
The Ally and Legion Go which I tested happily had their display turn
on after I yanked their display on and sleep exit callbacks. The
Legion Go even had its suspend light blink while the screen was on.
And both had disabled controllers. This behavior was sticky even after
a reboot. I suppose this is due to the fact that the device might
hibernate, so the EC would have to remember the last state before
power off.
> The idea being that the first increase() call gets made when
> a drm/kms driver probes a display and finds outputs which are
> light up during probe() and then further increase / decrease
> calls are made either when all displays go off; or maybe
> per crtc when the crtc gets enabled / disabled.
>
> Anyways how best to do this at display off time should be
> discussed with the drm/kms community on the dri-devel list.
I can cc on the next version.
Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications#display-off-notification-function-3
[1]
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-05 22:15 ` Antheas Kapenekakis
@ 2024-10-06 10:16 ` Hans de Goede
2024-10-06 11:29 ` Antheas Kapenekakis
0 siblings, 1 reply; 38+ messages in thread
From: Hans de Goede @ 2024-10-06 10:16 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
Hi,
On 6-Oct-24 12:15 AM, Antheas Kapenekakis wrote:
> <skip>
>
>> Thinking some more about this I am having second doubts about
>> moving the LPS0 display power off call to before devices are suspended,
>> doing so would mean that the display might still be on when that call
>> is made and that call could disable power-resources which are necessary
>> for the display causing issues when the display driver's suspend method
>> runs.
>
> Is there any device where that is used for display powersaving?
The problem is that we cannot rule out that the LPS0 display off
call relies on the displays actually being off.
I have seen ACPI AML code do all sort of crazy stuff.
So IMHO we really need to make sure that all physical displays
are off before we make the LPS0 display off call.
I have read what you wrote about this also applying to virtual
displays, I guess that means that there should be no rendering done
(so also no GPU non display tasks) when this is called.
IOW it might be best to tie this to all VGA class PCI devices
being in D3 as Mario suggested.
Regards,
Hans
>> So I think that we need something closer to Mario's POC from:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=superm1/dsm-screen-on-off
>>
>> here where the call is made when the last display is turned off.
>>
>> IOW have the drm modesetting core call this.
>
> I can see two problems with this approach: 1) it would happen in a
> random point in the suspend sequence, introducing race conditions with
> sensitive modern standby devices (e.g., Ally). 2) It would not be
> gated and debounced properly, so a drm driver could call it 5 times
> when you e.g., plug in an HDMI cable.
>
> And indeed that is the case, that PR horribly breaks the Ally even
> while any asus-wmi quirk was active. Perhaps DRM can be consulted
> though, see below.
>
>> Maybe have something like a enabled_displays counter in the
>> drm-core which gets increased / decreased by helpers and
>> have the drm-core call platform_suspend_screen_off() /
>> platform_suspend_screen_on() when the counter goes from 1 -> 0
>> resp. 0 -> 1, ignoring the very first 0 -> 1 transition
>> which will be done when the first GPU with an enabled
>> output is found ?
>
> To quote Microsoft [1], "This _DSM Function will be invoked when the
> operating system has entered a state where all displays—local and
> *remote*, if any—have been turned off."
>
> Since it says remote, binding it to DRM could prove difficult. In
> addition, the call in Windows is made 5 seconds after the displays
> turn off due to inactivity. To mirror this behavior you would need
> userspace.
>
> If there is strong indication that the Display On/Off calls interfere
> with the DRM subsystem, e.g., turn off a GPU in certain laptops, the
> call could be gated with a counter similar to Mario's PR and error
> out. In that way, it is still controllable by userspace while ensuring
> the device display is off. Is there such an indication/do we know of
> such a device?
>
> The Ally and Legion Go which I tested happily had their display turn
> on after I yanked their display on and sleep exit callbacks. The
> Legion Go even had its suspend light blink while the screen was on.
> And both had disabled controllers. This behavior was sticky even after
> a reboot. I suppose this is due to the fact that the device might
> hibernate, so the EC would have to remember the last state before
> power off.
>
>> The idea being that the first increase() call gets made when
>> a drm/kms driver probes a display and finds outputs which are
>> light up during probe() and then further increase / decrease
>> calls are made either when all displays go off; or maybe
>> per crtc when the crtc gets enabled / disabled.
>>
>> Anyways how best to do this at display off time should be
>> discussed with the drm/kms community on the dri-devel list.
>
> I can cc on the next version.
>
> Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications#display-off-notification-function-3
> [1]
>
> Antheas
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 0/5] acpi/x86: s2idle: move Display off/on calls outside suspend (fixes ROG Ally suspend)
2024-10-06 10:16 ` Hans de Goede
@ 2024-10-06 11:29 ` Antheas Kapenekakis
0 siblings, 0 replies; 38+ messages in thread
From: Antheas Kapenekakis @ 2024-10-06 11:29 UTC (permalink / raw)
To: Hans de Goede
Cc: linux-pm, Rafael J. Wysocki, platform-driver-x86,
Mario Limonciello, luke, me, Denis Benato
Hi Hans,
there is no rush from my end to see this series merge. The current
asus-wmi quirk works well for the Ally, all firmwares. New firmware is
also supposed to fix powersaving with it.
Yes, that quirk is suboptimal as it adds a noticeable delay to suspend
and resume and blocks powersaving from working correctly in old
firmwares. However, as most Ally users are on custom kernels anyway,
and this patch series can be merged quite easily into them, broadly
speaking it is a non-issue if the mainline kernel has ideal behavior
on the Ally for the next few months. From user feedback, they did not
notice changing to this patch anyway, other than the powersaving
benefit ("did you change anything? my ally just uses less power now").
Next revision I will cc dri so we get feedback from there too.
> >> Thinking some more about this I am having second doubts about
> >> moving the LPS0 display power off call to before devices are suspended,
> >> doing so would mean that the display might still be on when that call
> >> is made and that call could disable power-resources which are necessary
> >> for the display causing issues when the display driver's suspend method
> >> runs.
> >
> > Is there any device where that is used for display powersaving?
>
> The problem is that we cannot rule out that the LPS0 display off
> call relies on the displays actually being off.
>
> I have seen ACPI AML code do all sort of crazy stuff.
Indeed, as this series shows (for other reasons).
> So IMHO we really need to make sure that all physical displays
> are off before we make the LPS0 display off call.
In my use-case I'd like to be able to fire the display off
notification prior to turning off the screen or turn on the screen
after both Display Off and Sleep Entry have fired to be able to show a
dim lockscreen if the user briefly interacts with the device. I will
be doing this downstream for a limited set of prevalidated devices
though.
Therefore, before we block this behavior in a non-certain manner (as
it will be up to each gpu driver to do it), there needs to be some
documentation showing there is an issue, certain manufacturers rely on
the behavior, or that Microsoft has guaranteed that this is the case
in Windows. Even with any of the former, a blacklist quirk for those
manufacturers can be put in place in their GPU driver that blocks the
Display Off _DSM from firing and automatically calls the Display On
_DSM if the GPU wants to wake up the display.
> I have read what you wrote about this also applying to virtual
> displays, I guess that means that there should be no rendering done
> (so also no GPU non display tasks) when this is called.
>
> IOW it might be best to tie this to all VGA class PCI devices
> being in D3 as Mario suggested.
GPU can do stuff while the screen is off: render videos, hold browser
videos in main memory, have a game in the background, etc. If by D3
you mean the whole GPU has powered off, this would be a deviation from
Windows.
Antheas
^ permalink raw reply [flat|nested] 38+ messages in thread