Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

linux-pm.vger.kernel.org archive mirror

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
       [not found]   ` <20160511122116.GA4329@intel.com>
@ 2016-05-11 13:36     ` Rafael J. Wysocki
  2016-05-11 15:25       ` Jim Bos
       [not found]     ` <20160511084445.00030b49@gandalf.local.home>
  1 sibling, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 13:36 UTC (permalink / raw)
  To: Ville Syrjälä, Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, linux-arch, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Steven Rostedt,
	Oleg Nesterov, Tejun Heo, Andrew Morton, Paul McKenney,
	Linus Torvalds, Paul Turner, linux-kernel, rui.zhang, len.brown,
	Linux PM

On 5/11/2016 2:21 PM, Ville Syrjälä wrote:
> On Wed, May 11, 2016 at 02:11:29PM +0200, Sebastian Andrzej Siewior wrote:
>> On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
>>> Hi,
>> Hi,
>>
>>> I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
>>> which fails to resume from S3 on 4.6-rc releases. I bisected it down to
>>>
>>> commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
>>> Author: Thomas Gleixner <tglx@linutronix.de>
>>> Date:   Fri Feb 26 18:43:39 2016 +0000
>>>
>>>      cpu/hotplug: Move online calls to hotplugged cpu
>>>
>>> Unfortunately that won't revert cleanly, and neither does the merge
>>> commit, so I was unable to see if that is the only problematic commit
>>> in 4.6.
>>>
>>> Any ideas?
>> do you have a backtrace or anything or is it just not working and you
>> end up with a blank screen?
> Yeah can't get anything from the machine at that point. netconsole
> didn't help either, and no serial on this machine. And IIRC I've
> tried ramoops on this thing in the past but unfortunately the memory
> got cleared on reboot.
>

Please try

# echo processors > /sys/power/pm_test

and then suspend (it should simulate a suspend, wait for approx. 5 sec
and then resume; see Documentation/power/basic-pm-debugging.txt for
details).  See if that works or if you can get any traces etc.
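
For anyone who prefers to script the test, here is a minimal Python
sketch of the same procedure.  It assumes the usual sysfs files
/sys/power/pm_test and /sys/power/state; the function name and the
power_dir parameter are made up for illustration and are not part of
any kernel tooling.

```python
from pathlib import Path

# Test levels accepted by /sys/power/pm_test, per basic-pm-debugging.txt.
PM_TEST_LEVELS = ("none", "core", "processors", "platform", "devices", "freezer")


def simulate_suspend(level: str, power_dir: Path = Path("/sys/power")) -> None:
    """Select a pm_test level, start a suspend-to-RAM cycle, then reset.

    power_dir is a hypothetical parameter so the function can be pointed
    at a scratch directory instead of the real sysfs tree.
    """
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level!r}")

    pm_test = power_dir / "pm_test"
    state = power_dir / "state"

    pm_test.write_text(level)
    try:
        # With a test level set, the kernel stops at that stage, waits a
        # few seconds and resumes instead of really entering S3.
        state.write_text("mem")
    finally:
        # Put things back so that the next suspend is a real one.
        pm_test.write_text("none")
```

Run as root, simulate_suspend("processors") should be equivalent to the
echo above followed by a suspend; check dmesg afterwards for traces.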

Thanks,
Rafael


* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-11 13:36     ` S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")] Rafael J. Wysocki
@ 2016-05-11 15:25       ` Jim Bos
  2016-05-11 16:19         ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Jim Bos @ 2016-05-11 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ville Syrjälä,
	Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, linux-arch, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Steven Rostedt,
	Oleg Nesterov, Tejun Heo, Andrew Morton, Paul McKenney,
	Linus Torvalds, Paul Turner, linux-kernel, rui.zhang, len.brown,
	Linux PM

[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]

On 05/11/2016 03:36 PM, Rafael J. Wysocki wrote:
> On 5/11/2016 2:21 PM, Ville Syrjälä wrote:
>> On Wed, May 11, 2016 at 02:11:29PM +0200, Sebastian Andrzej Siewior
>> wrote:
>>> On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
>>>> Hi,
>>> Hi,
>>>
>>>> I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
>>>> which fails to resume from S3 on 4.6-rc releases. I bisected it down to
>>>>
>>>> commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
>>>> Author: Thomas Gleixner <tglx@linutronix.de>
>>>> Date:   Fri Feb 26 18:43:39 2016 +0000
>>>>
>>>>      cpu/hotplug: Move online calls to hotplugged cpu
>>>>
>>>> Unfortunately that won't revert cleanly, and neither does the merge
>>>> commit, so I was unable to see if that is the only problematic commit
>>>> in 4.6.
>>>>
>>>> Any ideas?
>>> do you have a backtrace or anything or is it just not working and you
>>> end up with a blank screen?
>> Yeah can't get anything from the machine at that point. netconsole
>> didn't help either, and no serial on this machine. And IIRC I've
>> tried ramoops on this thing in the past but unfortunately the memory
>> got cleared on reboot.
>>
> 
> Please try
> 
> # echo processors > /sys/power/pm_test
> 
> and then suspend (it should simulate a suspend, wait for approx. 5 sec
> and then resume; see Documentation/power/basic-pm-debugging.txt for
> details).  See if that works or if you can get any traces etc.
> 
> Thanks,
> Rafael
> 


Hmm, I thought I had some resume issue but ignored that, so I just tried
again.
On 4.6.0-rc1 all is fine but on 4.6.0-rc7 on resume the machine locks up
totally. No response on ping or sysrq-B, only hard reset works.

Tried this 'echo processors > /sys/power/pm_test' followed by pm-suspend

and did find a lot of ACPI errors (attached) in the log which are
definitely not present after normal boot.

This is on an Intel(R) Pentium(R) CPU G3220 @ 3.00GHz

So not sure if this is same issue but just wanted to mention it.

_
Jim


[-- Attachment #2: dmesg.gz --]
[-- Type: application/gzip, Size: 3484 bytes --]


* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-11 15:25       ` Jim Bos
@ 2016-05-11 16:19         ` Rafael J. Wysocki
  2016-05-11 16:21           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 16:19 UTC (permalink / raw)
  To: Jim Bos
  Cc: Rafael J. Wysocki, Ville Syrjälä,
	Sebastian Andrzej Siewior, Thomas Gleixner, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Steven Rostedt, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui, Len Brown, Linux PM

On Wed, May 11, 2016 at 5:25 PM, Jim Bos <jim876@xs4all.nl> wrote:
> On 05/11/2016 03:36 PM, Rafael J. Wysocki wrote:
>> On 5/11/2016 2:21 PM, Ville Syrjälä wrote:
>>> On Wed, May 11, 2016 at 02:11:29PM +0200, Sebastian Andrzej Siewior
>>> wrote:
>>>> On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
>>>>> Hi,
>>>> Hi,
>>>>
>>>>> I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
>>>>> which fails to resume from S3 on 4.6-rc releases. I bisected it down to
>>>>>
>>>>> commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
>>>>> Author: Thomas Gleixner <tglx@linutronix.de>
>>>>> Date:   Fri Feb 26 18:43:39 2016 +0000
>>>>>
>>>>>      cpu/hotplug: Move online calls to hotplugged cpu
>>>>>
>>>>> Unfortunately that won't revert cleanly, and neither does the merge
>>>>> commit, so I was unable to see if that is the only problematic commit
>>>>> in 4.6.
>>>>>
>>>>> Any ideas?
>>>> do you have a backtrace or anything or is it just not working and you
>>>> end up with a blank screen?
>>> Yeah can't get anything from the machine at that point. netconsole
>>> didn't help either, and no serial on this machine. And IIRC I've
>>> tried ramoops on this thing in the past but unfortunately the memory
>>> got cleared on reboot.
>>>
>>
>> Please try
>>
>> # echo processors > /sys/power/pm_test
>>
>> and then suspend (it should simulate a suspend, wait for approx. 5 sec
>> and then resume; see Documentation/power/basic-pm-debugging.txt for
>> details).  See if that works or if you can get any traces etc.
>>
>> Thanks,
>> Rafael
>>
>
>
> Hmm, I thought I had some resume issue but ignored that, so I just tried
> again.
> On 4.6.0-rc1 all is fine but on 4.6.0-rc7 on resume the machine locks up
> totally. No response on ping or sysrq-B, only hard reset works.
>
> Tried this 'echo processors > /sys/power/pm_test' followed by pm-suspend
>
> and did find a lot of ACPI errors (attached) in the log which are
> definitely not present after normal boot.
>
> This is on an Intel(R) Pentium(R) CPU G3220 @ 3.00GHz
>
> So not sure if this is same issue but just wanted to mention it.

If the problem is reproducible, you should be able to identify the
commit that broke things for you.

Have you tried to check if this is the same commit reported in this thread?


* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-11 16:19         ` Rafael J. Wysocki
@ 2016-05-11 16:21           ` Sebastian Andrzej Siewior
  2016-05-11 16:24             ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-05-11 16:21 UTC (permalink / raw)
  To: Rafael J. Wysocki, Jim Bos
  Cc: Rafael J. Wysocki, Ville Syrjälä, Thomas Gleixner,
	linux-arch, Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra,
	Arjan van de Ven, Rusty Russell, Steven Rostedt, Oleg Nesterov,
	Tejun Heo, Andrew Morton, Paul McKenney, Linus Torvalds,
	Paul Turner, Linux Kernel Mailing List, Zhang, Rui, Len Brown,
	Linux PM

On 05/11/2016 06:19 PM, Rafael J. Wysocki wrote:
> Have you tried to check if this is the same commit reported in this thread?

The commit in this thread is part of v4.6-rc1 and he is saying that rc1
is working fine.

Sebastian


* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-11 16:21           ` Sebastian Andrzej Siewior
@ 2016-05-11 16:24             ` Rafael J. Wysocki
  0 siblings, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2016-05-11 16:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Jim Bos
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Ville Syrjälä,
	Thomas Gleixner, linux-arch, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Steven Rostedt,
	Oleg Nesterov, Tejun Heo, Andrew Morton, Paul McKenney,
	Linus Torvalds, Paul Turner, Linux Kernel Mailing List,
	Zhang, Rui, Len Brown, Linux PM

On Wed, May 11, 2016 at 6:21 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 05/11/2016 06:19 PM, Rafael J. Wysocki wrote:
>> Have you tried to check if this is the same commit reported in this thread?
>
> The commit in this thread is part of v4.6-rc1 and he is saying that rc1
> is working fine.

I see.  That is a different problem then.

Jim, can you please start a new thread (with a CC to linux-pm) or just
file a bug at bugzilla.kernel.org to avoid confusing things?


[parent not found: <20160511084445.00030b49@gandalf.local.home>]

[parent not found: <20160511133406.GC4329@intel.com>]

[parent not found: <20160516193910.GL4329@intel.com>]

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
       [not found]         ` <20160516193910.GL4329@intel.com>
@ 2016-05-17 23:14           ` Rafael J. Wysocki
  2016-05-18  7:24             ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2016-05-17 23:14 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, Thomas Gleixner,
	linux-arch, Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra,
	Arjan van de Ven, Rusty Russell, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Paul McKenney, Linus Torvalds, Paul Turner,
	linux-kernel, rui.zhang, len.brown, Linux PM, Linux ACPI

On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
> On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
>> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
>>> On Wed, 11 May 2016 15:21:16 +0300
>>> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
>>>
>>>> Yeah can't get anything from the machine at that point. netconsole
>>>> didn't help either, and no serial on this machine. And IIRC I've
>>>> tried ramoops on this thing in the past but unfortunately the memory
>>>> got cleared on reboot.
>>>>
>>> Can you look at the documentation in the kernel code at
>>>
>>> Documentation/power/basic-pm-debugging.txt And follow the procedures
>>> for testing suspend to RAM (although it requires mostly running the
>>> same tests as for hibernation suspending).
>>>
>>> You can also use the tool s2ram for this as well.
>>>
>>> See Documentation/power/s2ram.txt
>>>
>>> Perhaps this can give us a bit more light onto the problem.
>>>
>>> Basically the above does partial suspend and resume, and can pinpoint
>>> problem areas down to a more select location.
>> All the pm_test modes work fine. The only difference between them was
>> that 'platform' required me to manually wake up the machine (hitting a
>> key was sufficient), whereas the others woke up without help.
>>
>> pm_trace gave me
>> [    1.306633]   Magic number: 0:185:178
>> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
>> [    1.339270] acpi device:0e: hash matches
>> [    1.355414]  platform: hash matches
>>
>> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
>> there.
>>
>> I guess I could try to sprinkle more TRACE_RESUMEs around into some
>> early resume code. If anyone has good ideas where to put them it
>> might speed things up a bit.
> So I did a bunch of that and found that it gets stuck somewhere
> around executing the _WAK method:
> platform_resume_noirq
>   acpi_pm_finish
>    acpi_leave_sleep_state
>     acpi_hw_sleep_dispatch
>      acpi_hw_legacy_wake
>       acpi_hw_execute_sleep_method
>        acpi_evaluate_object
>         acpi_ns_evaluate
>          acpi_ps_execute_method
>           acpi_ps_parse_aml
>
> It also seems that adding a few TRACE_RESUME()s or an msleep() right
> after enable_nonboot_cpus() can avoid the hang, sometimes.
>
> I've attached the DSDT in case anyone is interested in looking at it.
>

What if you comment out the execution of _WAK (line 318 of 
drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?



* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-17 23:14           ` Rafael J. Wysocki
@ 2016-05-18  7:24             ` Ville Syrjälä
  2016-05-26 18:32               ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-05-18  7:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, Thomas Gleixner,
	linux-arch, Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra,
	Arjan van de Ven, Rusty Russell, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Paul McKenney, Linus Torvalds, Paul Turner,
	linux-kernel, rui.zhang, len.brown, Linux PM, Linux ACPI

On Wed, May 18, 2016 at 01:14:42AM +0200, Rafael J. Wysocki wrote:
> On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
> > On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
> >> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
> >>> On Wed, 11 May 2016 15:21:16 +0300
> >>> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> >>>
> >>>> Yeah can't get anything from the machine at that point. netconsole
> >>>> didn't help either, and no serial on this machine. And IIRC I've
> >>>> tried ramoops on this thing in the past but unfortunately the memory
> >>>> got cleared on reboot.
> >>>>
> >>> Can you look at the documentation in the kernel code at
> >>>
> >>> Documentation/power/basic-pm-debugging.txt And follow the procedures
> >>> for testing suspend to RAM (although it requires mostly running the
> >>> same tests as for hibernation suspending).
> >>>
> >>> You can also use the tool s2ram for this as well.
> >>>
> >>> See Documentation/power/s2ram.txt
> >>>
> >>> Perhaps this can give us a bit more light onto the problem.
> >>>
> >>> Basically the above does partial suspend and resume, and can pinpoint
> >>> problem areas down to a more select location.
> >> All the pm_test modes work fine. The only difference between them was
> >> that 'platform' required me to manually wake up the machine (hitting a
> >> key was sufficient), whereas the others woke up without help.
> >>
> >> pm_trace gave me
> >> [    1.306633]   Magic number: 0:185:178
> >> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
> >> [    1.339270] acpi device:0e: hash matches
> >> [    1.355414]  platform: hash matches
> >>
> >> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
> >> there.
> >>
> >> I guess I could try to sprinkle more TRACE_RESUMEs around into some
> >> early resume code. If anyone has good ideas where to put them it
> >> might speed things up a bit.
> > So I did a bunch of that and found that it gets stuck somewhere
> > around executing the _WAK method:
> > platform_resume_noirq
> >   acpi_pm_finish
> >    acpi_leave_sleep_state
> >     acpi_hw_sleep_dispatch
> >      acpi_hw_legacy_wake
> >       acpi_hw_execute_sleep_method
> >        acpi_evaluate_object
> >         acpi_ns_evaluate
> >          acpi_ps_execute_method
> >           acpi_ps_parse_aml
> >
> > It also seems that adding a few TRACE_RESUME()s or an msleep() right
> > after enable_nonboot_cpus() can avoid the hang, sometimes.
> >
> > I've attached the DSDT in case anyone is interested in looking at it.
> >
> 
> What if you comment out the execution of _WAK (line 318 of 
> drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?

Indeed it does. Tried with acpi_idle and intel_idle, and both appear to
resume just fine with that hack.

-       acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
+       //acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
+       printk(KERN_CRIT "skipping _WAK\n");

-- 
Ville Syrjälä
Intel OTC


* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-18  7:24             ` Ville Syrjälä
@ 2016-05-26 18:32               ` Ville Syrjälä
  2016-05-30 20:43                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-05-26 18:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, Thomas Gleixner,
	linux-arch, Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra,
	Arjan van de Ven, Rusty Russell, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Paul McKenney, Linus Torvalds, Paul Turner,
	linux-kernel, rui.zhang, len.brown, Linux PM, Linux ACPI

On Wed, May 18, 2016 at 10:24:24AM +0300, Ville Syrjälä wrote:
> On Wed, May 18, 2016 at 01:14:42AM +0200, Rafael J. Wysocki wrote:
> > On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
> > > On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
> > >> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
> > >>> On Wed, 11 May 2016 15:21:16 +0300
> > >>> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > >>>
> > >>>> Yeah can't get anything from the machine at that point. netconsole
> > >>>> didn't help either, and no serial on this machine. And IIRC I've
> > >>>> tried ramoops on this thing in the past but unfortunately the memory
> > >>>> got cleared on reboot.
> > >>>>
> > >>> Can you look at the documentation in the kernel code at
> > >>>
> > >>> Documentation/power/basic-pm-debugging.txt And follow the procedures
> > >>> for testing suspend to RAM (although it requires mostly running the
> > >>> same tests as for hibernation suspending).
> > >>>
> > >>> You can also use the tool s2ram for this as well.
> > >>>
> > >>> See Documentation/power/s2ram.txt
> > >>>
> > >>> Perhaps this can give us a bit more light onto the problem.
> > >>>
> > >>> Basically the above does partial suspend and resume, and can pinpoint
> > >>> problem areas down to a more select location.
> > >> All the pm_test modes work fine. The only difference between them was
> > >> that 'platform' required me to manually wake up the machine (hitting a
> > >> key was sufficient), whereas the others woke up without help.
> > >>
> > >> pm_trace gave me
> > >> [    1.306633]   Magic number: 0:185:178
> > >> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
> > >> [    1.339270] acpi device:0e: hash matches
> > >> [    1.355414]  platform: hash matches
> > >>
> > >> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
> > >> there.
> > >>
> > >> I guess I could try to sprinkle more TRACE_RESUMEs around into some
> > >> early resume code. If anyone has good ideas where to put them it
> > >> might speed things up a bit.
> > > So I did a bunch of that and found that it gets stuck somewhere
> > > around executing the _WAK method:
> > > platform_resume_noirq
> > >   acpi_pm_finish
> > >    acpi_leave_sleep_state
> > >     acpi_hw_sleep_dispatch
> > >      acpi_hw_legacy_wake
> > >       acpi_hw_execute_sleep_method
> > >        acpi_evaluate_object
> > >         acpi_ns_evaluate
> > >          acpi_ps_execute_method
> > >           acpi_ps_parse_aml
> > >
> > > It also seems that adding a few TRACE_RESUME()s or an msleep() right
> > > after enable_nonboot_cpus() can avoid the hang, sometimes.
> > >
> > > I've attached the DSDT in case anyone is interested in looking at it.
> > >
> > 
> > What if you comment out the execution of _WAK (line 318 of 
> > drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?
> 
> Indeed it does. Tried with acpi_idle and intel_idle, and both appear to
> resume just fine with that hack.
> 
> -       acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
> +       //acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
> +       printk(KERN_CRIT "skipping _WAK\n");

Continuing with my detective work a bit, I decided to hack the DSDT to
see if I could narrow it down further, and it looks like I found it on
the first guess. The following change stops it from hanging.

@@ -5056,7 +5056,7 @@
         If (LEqual (Arg0, 0x03))
         {
             Store (0x01, \SPNF)
-	    TRAP (0x46)
+	    //TRAP (0x46)
             P8XH (0x00, 0x03)
         }

So what does that do? Let's see:

    OperationRegion (IO_T, SystemIO, 0x0800, 0x10)
    Field (IO_T, ByteAcc, NoLock, Preserve)
    {
        Offset (0x08), 
        TRP0,   8
    }

    OperationRegion (GNVS, SystemMemory, 0x3F5E0C7C, 0x0200)
    Field (GNVS, AnyAcc, Lock, Preserve)
    {
        OSYS,   16, 
        SMIF,   8,
    ...

    Method (TRAP, 1, Serialized)
    {
        Store (Arg0, SMIF) /* \SMIF */
        Store (0x00, TRP0) /* \TRP0 */
        Return (SMIF) /* \SMIF */
    }

and a dump of the IOTR registers shows:

0x1e80: 0x0000fe01
0x1e84: 0x00020001
0x1e98: 0x000c0801
0x1e9c: 0x000200f0

which seems to be telling me that ports 0x800-0x80f and 
0xfe00-0xfe03 would trigger an SMI.
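
For anyone who wants to redo that decode, here is a minimal Python
sketch covering the low dword of each register.  The bit layout it uses
(bit 0 = trap/SMI# enable, bits 15:2 = dword-aligned I/O address,
bits 23:18 = address mask) is an assumption taken from my reading of the
ICH I/O trap register description, and decode_iotr_low/IoTrap are
made-up names.  The high dword (byte enables and read/write selection)
is not decoded here.

```python
from typing import NamedTuple


class IoTrap(NamedTuple):
    enabled: bool
    first_port: int
    last_port: int


def decode_iotr_low(low: int) -> IoTrap:
    """Decode the low 32 bits of one IOTR register (assumed layout)."""
    enabled = bool(low & 0x1)          # bit 0: trap and SMI# enable
    base = low & 0xFFFC                # bits 15:2: dword-aligned I/O address
    addr_mask = (low >> 18) & 0x3F     # bits 23:18: "don't care" address bits 7:2
    # Masked address bits plus the two byte-offset bits give the span.
    span = (addr_mask << 2) | 0x3
    first = base & ~span & 0xFFFF
    last = base | span
    return IoTrap(enabled, first, last)


for offset, low in ((0x1E80, 0x0000FE01), (0x1E98, 0x000C0801)):
    trap = decode_iotr_low(low)
    print(f"{offset:#06x}: enabled={trap.enabled} "
          f"ports {trap.first_port:#06x}-{trap.last_port:#06x}")
```

Under that assumed layout the two enabled traps come out as
0xfe00-0xfe03 and 0x0800-0x080f, the latter covering TRP0 at
IO_T + 0x08 = 0x808.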

So the next question is how do the idle drivers and cpu hotplug
fit into this picture. Do we need to force the second HT into
a specific C state before the SMI or something?

-- 
Ville Syrjälä
Intel OTC


* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-26 18:32               ` Ville Syrjälä
@ 2016-05-30 20:43                 ` Rafael J. Wysocki
  2016-05-31  7:26                   ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2016-05-30 20:43 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Rafael J. Wysocki, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-arch, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Oleg Nesterov,
	Tejun Heo, Andrew Morton, Paul McKenney, Linus Torvalds,
	Paul Turner, Linux Kernel Mailing List, Zhang, Rui, Len Brown,
	Linux PM, Linux ACPI

On Thu, May 26, 2016 at 8:32 PM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
> On Wed, May 18, 2016 at 10:24:24AM +0300, Ville Syrjälä wrote:
>> On Wed, May 18, 2016 at 01:14:42AM +0200, Rafael J. Wysocki wrote:
>> > On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
>> > > On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
>> > >> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
>> > >>> On Wed, 11 May 2016 15:21:16 +0300
>> > >>> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
>> > >>>
>> > >>>> Yeah can't get anything from the machine at that point. netconsole
>> > >>>> didn't help either, and no serial on this machine. And IIRC I've
>> > >>>> tried ramoops on this thing in the past but unfortunately the memory
>> > >>>> got cleared on reboot.
>> > >>>>
>> > >>> Can you look at the documentation in the kernel code at
>> > >>>
>> > >>> Documentation/power/basic-pm-debugging.txt And follow the procedures
>> > >>> for testing suspend to RAM (although it requires mostly running the
>> > >>> same tests as for hibernation suspending).
>> > >>>
>> > >>> You can also use the tool s2ram for this as well.
>> > >>>
>> > >>> See Documentation/power/s2ram.txt
>> > >>>
>> > >>> Perhaps this can give us a bit more light onto the problem.
>> > >>>
>> > >>> Basically the above does partial suspend and resume, and can pinpoint
>> > >>> problem areas down to a more select location.
>> > >> All the pm_test modes work fine. The only difference between them was
>> > >> that 'platform' required me to manually wake up the machine (hitting a
>> > >> key was sufficient), whereas the others woke up without help.
>> > >>
>> > >> pm_trace gave me
>> > >> [    1.306633]   Magic number: 0:185:178
>> > >> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
>> > >> [    1.339270] acpi device:0e: hash matches
>> > >> [    1.355414]  platform: hash matches
>> > >>
>> > >> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
>> > >> there.
>> > >>
>> > >> I guess I could try to sprinkle more TRACE_RESUMEs around into some
>> > >> early resume code. If anyone has good ideas where to put them it
>> > >> might speed things up a bit.
>> > > So I did a bunch of that and found that it gets stuck somewhere
>> > > around executing the _WAK method:
>> > > platform_resume_noirq
>> > >   acpi_pm_finish
>> > >    acpi_leave_sleep_state
>> > >     acpi_hw_sleep_dispatch
>> > >      acpi_hw_legacy_wake
>> > >       acpi_hw_execute_sleep_method
>> > >        acpi_evaluate_object
>> > >         acpi_ns_evaluate
>> > >          acpi_ps_execute_method
>> > >           acpi_ps_parse_aml
>> > >
>> > > It also seems that adding a few TRACE_RESUME()s or an msleep() right
>> > > after enable_nonboot_cpus() can avoid the hang, sometimes.
>> > >
>> > > I've attached the DSDT in case anyone is interested in looking at it.
>> > >
>> >
>> > What if you comment out the execution of _WAK (line 318 of
>> > drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?
>>
>> Indeed it does. Tried with acpi_idle and intel_idle, and both appear to
>> resume just fine with that hack.
>>
>> -       acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
>> +       //acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
>> +       printk(KERN_CRIT "skipping _WAK\n");
>
> Continuing with my detective work a bit, I decided to hack the DSDT to
> see if I could narrow it down further, and it looks like I found it on
> the first guess. The following change stops it from hanging.
>
> @@ -5056,7 +5056,7 @@
>          If (LEqual (Arg0, 0x03))
>          {
>              Store (0x01, \SPNF)
> -           TRAP (0x46)
> +           //TRAP (0x46)
>              P8XH (0x00, 0x03)
>          }
>
> So what does that do? Let's see:
>
>     OperationRegion (IO_T, SystemIO, 0x0800, 0x10)
>     Field (IO_T, ByteAcc, NoLock, Preserve)
>     {
>         Offset (0x08),
>         TRP0,   8
>     }
>
>     OperationRegion (GNVS, SystemMemory, 0x3F5E0C7C, 0x0200)
>     Field (GNVS, AnyAcc, Lock, Preserve)
>     {
>         OSYS,   16,
>         SMIF,   8,
>     ...
>
>     Method (TRAP, 1, Serialized)
>     {
>         Store (Arg0, SMIF) /* \SMIF */
>         Store (0x00, TRP0) /* \TRP0 */
>         Return (SMIF) /* \SMIF */
>     }
>
> and a dump of the IOTR registers shows:
>
> 0x1e80: 0x0000fe01
> 0x1e84: 0x00020001
> 0x1e98: 0x000c0801
> 0x1e9c: 0x000200f0
>
> which seems to be telling me that ports 0x800-0x80f and
> 0xfe00-0xfe03 would trigger an SMI.

Well, the name of the method kind of suggests that it triggers an SMM trap. :-)

> So the next question is how do the idle drivers and cpu hotplug
> fit into this picture. Do we need to force the second HT into
> a specific C state before the SMI or something?

Or you can ask why exactly someone put that SMM trap into _WAK.

Apparently, it was regarded as necessary or no one would have
bothered.  The only reason I can see why it might be regarded as
necessary was that Windows did something Linux doesn't do on that
platform, or, which to me is far more interesting, that Windows didn't
do something actually done by Linux.

My theory would be that Windows didn't reinitialize the second HT
properly during resume and the trap was added to let SMM do that.  If
that's the case, the trap may trigger when the second HT is already
executing code in Linux, and then it will interfere with that code and
crash.

Now, what do idle states have to do with that?  IIRC, Windows puts
nonboot CPUs into idle states before suspend, so the SMM code
triggered by the trap may make assumptions about the CPU being in such
a state or similar.


* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-30 20:43                 ` Rafael J. Wysocki
@ 2016-05-31  7:26                   ` Ville Syrjälä
  2016-07-13 14:54                     ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-05-31  7:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-arch, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Oleg Nesterov,
	Tejun Heo, Andrew Morton, Paul McKenney, Linus Torvalds,
	Paul Turner, Linux Kernel Mailing List, Zhang, Rui, Len Brown,
	Linux PM, Linux ACPI

On Mon, May 30, 2016 at 10:43:51PM +0200, Rafael J. Wysocki wrote:
> On Thu, May 26, 2016 at 8:32 PM, Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> > On Wed, May 18, 2016 at 10:24:24AM +0300, Ville Syrjälä wrote:
> >> On Wed, May 18, 2016 at 01:14:42AM +0200, Rafael J. Wysocki wrote:
> >> > On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
> >> > > On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
> >> > >> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
> >> > >>> On Wed, 11 May 2016 15:21:16 +0300
> >> > >>> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> >> > >>>
> >> > >>>> Yeah can't get anything from the machine at that point. netconsole
> >> > >>>> didn't help either, and no serial on this machine. And IIRC I've
> >> > >>>> tried ramoops on this thing in the past but unfortunately the memory
> >> > >>>> got cleared on reboot.
> >> > >>>>
> >> > >>> Can you look at the documentation in the kernel code at
> >> > >>>
> >> > >>> Documentation/power/basic-pm-debugging.txt And follow the procedures
> >> > >>> for testing suspend to RAM (although it requires mostly running the
> >> > >>> same tests as for hibernation suspending).
> >> > >>>
> >> > >>> You can also use the tool s2ram for this as well.
> >> > >>>
> >> > >>> See Documentation/power/s2ram.txt
> >> > >>>
> >> > >>> Perhaps this can give us a bit more light onto the problem.
> >> > >>>
> >> > >>> Basically the above does partial suspend and resume, and can pinpoint
> >> > >>> problem areas down to a more select location.
> >> > >> All the pm_test modes work fine. The only difference between them was
> >> > >> that 'platform' required me to manually wake up the machine (hitting a
> >> > >> key was sufficient), whereas the others woke up without help.
> >> > >>
> >> > >> pm_trace gave me
> >> > >> [    1.306633]   Magic number: 0:185:178
> >> > >> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
> >> > >> [    1.339270] acpi device:0e: hash matches
> >> > >> [    1.355414]  platform: hash matches
> >> > >>
> >> > >> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
> >> > >> there.
> >> > >>
> >> > >> I guess I could try to sprinkle more TRACE_RESUMEs around into some
> >> > >> early resume code. If anyone has good ideas where to put them it
> >> > >> might speed things up a bit.
> >> > > So I did a bunch of that and found that it gets stuck somewhere
> >> > > around executing the _WAK method:
> >> > > platform_resume_noirq
> >> > >   acpi_pm_finish
> >> > >    acpi_leave_sleep_state
> >> > >     acpi_hw_sleep_dispatch
> >> > >      acpi_hw_legacy_wake
> >> > >       acpi_hw_execute_sleep_method
> >> > >        acpi_evaluate_object
> >> > >         acpi_ns_evaluate
> >> > >          acpi_ps_execute_method
> >> > >           acpi_ps_parse_aml
> >> > >
> >> > > It also seems that adding a few TRACE_RESUME()s or an msleep() right
> >> > > after enable_nonboot_cpus() can avoid the hang, sometimes.
> >> > >
> >> > > I've attached the DSDT in case anyone is interested in looking at it.
> >> > >
> >> >
> >> > What if you comment out the execution of _WAK (line 318 of
> >> > drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?
> >>
> >> Indeed it does. Tried with acpi_idle and intel_idle, and both appear to
> >> resume just fine with that hack.
> >>
> >> -       acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
> >> +       //acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
> >> +       printk(KERN_CRIT "skipping _WAK\n");
> >
> > Continuing with my detective work a bit, I decided to hack the DSDT a
> > bit to see if I can narrow it down further, and looks like I found
> > it on the first guess. The following change stops it from hanging.
> >
> > @@ -5056,7 +5056,7 @@
> >          If (LEqual (Arg0, 0x03))
> >          {
> >              Store (0x01, \SPNF)
> > -           TRAP (0x46)
> > +           //TRAP (0x46)
> >              P8XH (0x00, 0x03)
> >          }
> >
> > So what does that do? Let's see:
> >
> >     OperationRegion (IO_T, SystemIO, 0x0800, 0x10)
> >     Field (IO_T, ByteAcc, NoLock, Preserve)
> >     {
> >         Offset (0x08),
> >         TRP0,   8
> >     }
> >
> >     OperationRegion (GNVS, SystemMemory, 0x3F5E0C7C, 0x0200)
> >     Field (GNVS, AnyAcc, Lock, Preserve)
> >     {
> >         OSYS,   16,
> >         SMIF,   8,
> >     ...
> >
> >     Method (TRAP, 1, Serialized)
> >     {
> >         Store (Arg0, SMIF) /* \SMIF */
> >         Store (0x00, TRP0) /* \TRP0 */
> >         Return (SMIF) /* \SMIF */
> >     }
> >
> > and a dump of the IOTR registers shows:
> >
> > 0x1e80: 0x0000fe01
> > 0x1e84: 0x00020001
> > 0x1e98: 0x000c0801
> > 0x1e9c: 0x000200f0
> >
> > which seems to be telling me that ports 0x800-0x80f and
> > 0xfe00-0xfe03 would trigger an SMI.
> 
> Well, the name of the method kind of suggests that it triggers an SMM trap. :-)

Which is why I wanted to confirm that by looking at the IOTR regs ;)
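
A minimal userspace sketch of that decoding, for anyone who wants to
check the port ranges quoted above. The bit layout is an assumption
based on the ICH7-family I/O trap register description (bit 0 = trap
and SMI# enable, bits 15:2 = dword-aligned I/O address, bits 23:18 =
"don't care" mask for address bits 7:2); the struct and function names
are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: one decoded I/O trap window. */
struct iotr_range {
	bool enabled;    /* bit 0: trap and SMI# enable (assumed layout) */
	uint16_t first;  /* first trapped I/O port */
	uint16_t last;   /* last trapped I/O port */
};

/*
 * Decode the low dword of one IOTR register.
 * Assumed layout: bits 15:2 hold the dword-aligned I/O address and
 * bits 23:18 mask out address bits 7:2 from the comparison.
 */
static struct iotr_range iotr_decode(uint32_t lo)
{
	struct iotr_range r;
	uint16_t addr = lo & 0xfffc;
	uint16_t mask = ((lo >> 18) & 0x3f) << 2;

	r.enabled = lo & 1;
	r.first = addr & ~mask;
	r.last = (addr | mask) + 3;   /* each match covers a full dword */
	return r;
}

/* True if an access to 'port' would hit an enabled trap window. */
static bool iotr_traps(uint32_t lo, uint16_t port)
{
	struct iotr_range r = iotr_decode(lo);

	return r.enabled && port >= r.first && port <= r.last;
}
```

Under those assumptions iotr_decode(0x0000fe01) gives 0xfe00-0xfe03 and
iotr_decode(0x000c0801) gives 0x0800-0x080f. TRP0 sits at Offset (0x08)
inside the IO_T region based at 0x0800, i.e. port 0x0808, so
iotr_traps(0x000c0801, 0x0808) is true and the Store to TRP0 in TRAP()
lands in a trapped window.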

> 
> > So the next question is how do the idle drivers and cpu hotplug
> > fit into this picture. Do we need to force the second HT into
> > a specific C state before the SMI or something?
> 
> Or you can ask why exactly someone put that SMM trap into _WAK.
> 
> Apparently, it was regarded as necessary or no one would have
> bothered.  The only reason I can see why it might be regarded as
> necessary was that Windows did something Linux doesn't do on that
> platform, or, which to me is far more interesting, that Windows didn't
> do something actually done by Linux.
> 
> My theory would be that Windows didn't reinitialize the second HT
> properly during resume and the trap was added to let SMM do that.  If
> that's the case, the trap may trigger by the time the second HT
> already executes code in Linux and then it will mess with it and
> crash.
> 
> Now, what do idle states have to do with that?  IIRC, Windows puts
> nonboot CPUs into idle states before suspend, so the SMM code
> triggered by the trap may make assumptions about the CPU being in such
> a state or similar.

BTW I also tried to move the enable_nonboot_cpus() after _WAK, and I
tried to boot with nosmp, but neither trick helped. If someone could
throw some patches my way to force things into a specific state
before suspend/_WAK I'd be happy to test them out.

-- 
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-05-31  7:26                   ` Ville Syrjälä
@ 2016-07-13 14:54                     ` Ville Syrjälä
  2016-07-14  8:29                       ` Feng Tang
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-07-13 14:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-arch, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Oleg Nesterov,
	Tejun Heo, Andrew Morton, Paul McKenney, Linus Torvalds,
	Paul Turner, Linux Kernel Mailing List, Zhang, Rui, Len Brown,
	Linux PM, Linux ACPI

On Tue, May 31, 2016 at 10:26:50AM +0300, Ville Syrjälä wrote:
> BTW I also tried to move the enable_nonboot_cpus() after _WAK, and I
> tried to boot with nosmp, but neither trick helped. If someone could
> throw some patches my way to force things into a specific state
> before suspend/_WAK I'd be happy to test them out.

Ping. Anyone have any ideas what to try here? Would be nice to get this
machine working again...

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-07-13 14:54                     ` Ville Syrjälä
@ 2016-07-14  8:29                       ` Feng Tang
  2016-08-09 17:20                         ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Feng Tang @ 2016-07-14  8:29 UTC (permalink / raw)
  To: Ville Syrjälä, feng.tang
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Steven Rostedt,
	Sebastian Andrzej Siewior, Thomas Gleixner, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui, Len Brown, Linux PM,
	Linux ACPI

if you only want it to work, you can try an old patch
https://bugzilla.kernel.org/attachment.cgi?id=76071 from a similar bug
https://bugzilla.kernel.org/show_bug.cgi?id=41932

Alistair Buxton confirmed it works for 3.18 at least
https://bugzilla.kernel.org/show_bug.cgi?id=107151#c16

Thanks,
Feng

On Wed, Jul 13, 2016 at 10:54 PM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
> Ping. Anyone have any ideas what to try here? Would be nice to get this
> machine working again...
>
> --
> Ville Syrjälä
> Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-07-14  8:29                       ` Feng Tang
@ 2016-08-09 17:20                         ` Ville Syrjälä
  2016-10-27 17:28                           ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-08-09 17:20 UTC (permalink / raw)
  To: Feng Tang
  Cc: feng.tang, Rafael J. Wysocki, Rafael J. Wysocki, Steven Rostedt,
	Sebastian Andrzej Siewior, Thomas Gleixner, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui, Len Brown, Linux PM,
	Linux ACPI

On Thu, Jul 14, 2016 at 04:29:42PM +0800, Feng Tang wrote:
> if you only want it to work, you can try an old patch
> https://bugzilla.kernel.org/attachment.cgi?id=76071 from a similar bug
> https://bugzilla.kernel.org/show_bug.cgi?id=41932
> 
> Alistair Buxton confirmed it works for 3.18 at least
> https://bugzilla.kernel.org/show_bug.cgi?id=107151#c16

That patch is a bit too ripe by now. Would need a fresh squeezed one.

> 
> Thanks,
> Feng
> 
> On Wed, Jul 13, 2016 at 10:54 PM, Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> > Ping. Anyone have any ideas what to try here? Would be nice to get this
> > machine working again...
> >
> > --
> > Ville Syrjälä
> > Intel OTC

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-08-09 17:20                         ` Ville Syrjälä
@ 2016-10-27 17:28                           ` Ville Syrjälä
  2016-10-27 18:48                             ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-10-27 17:28 UTC (permalink / raw)
  To: Feng Tang
  Cc: feng.tang, Rafael J. Wysocki, Rafael J. Wysocki, Steven Rostedt,
	Sebastian Andrzej Siewior, Thomas Gleixner, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Tue, Aug 09, 2016 at 08:20:57PM +0300, Ville Syrjälä wrote:
> On Thu, Jul 14, 2016 at 04:29:42PM +0800, Feng Tang wrote:
> > if you only want it to work, you can try an old patch
> > https://bugzilla.kernel.org/attachment.cgi?id=76071 from a similar bug
> > https://bugzilla.kernel.org/show_bug.cgi?id=41932
> > 
> > Alistair Buxton confirmed it works for 3.18 at least
> > https://bugzilla.kernel.org/show_bug.cgi?id=107151#c16
> 
> That patch is a bit too ripe by now. Would need a fresh squeezed one.

Since no one else bothered to refresh the patch I did it myself:

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f6aae7977824..d73d094a8972 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -657,8 +657,16 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 	 * - There are pending events on sleeping CPUs which were not
 	 * in the event mask
 	 */
-	if (next_event.tv64 != KTIME_MAX)
+	if (next_event.tv64 != KTIME_MAX) {
+		s64 delta = next_event.tv64 - now.tv64;
+
+		if (delta >= 10000000000) {
+			printk(KERN_CRIT "%s(): The delta is big: %lld\n", __func__, delta);
+			next_event.tv64 = now.tv64 + 3000000000;
+		}
+
 		tick_broadcast_set_event(dev, next_cpu, next_event);
+	}
 
 	raw_spin_unlock(&tick_broadcast_lock);

Unfortunately it doesn't do anything for me.

The fortunate thing is that acpi-idle has magically been fixed in the
meantime, so I can at least go back to using that one and have working
S3.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-27 17:28                           ` Ville Syrjälä
@ 2016-10-27 18:48                             ` Thomas Gleixner
  2016-10-27 19:20                               ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2016-10-27 18:48 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> On Tue, Aug 09, 2016 at 08:20:57PM +0300, Ville Syrjälä wrote:
> > On Thu, Jul 14, 2016 at 04:29:42PM +0800, Feng Tang wrote:
> > > if you only want it to work, you can try an old patch
> > > https://bugzilla.kernel.org/attachment.cgi?id=76071 from a similar bug
> > > https://bugzilla.kernel.org/show_bug.cgi?id=41932
> > > 
> > > Alistair Buxton confirmed it works for 3.18 at least
> > > https://bugzilla.kernel.org/show_bug.cgi?id=107151#c16
> > 
> > That patch is a bit too ripe by now. Would need a fresh squeezed one.
> 
> Since no one else bothered to refresh the patch I did it myself:
> 
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index f6aae7977824..d73d094a8972 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -657,8 +657,16 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
>  	 * - There are pending events on sleeping CPUs which were not
>  	 * in the event mask
>  	 */
> -	if (next_event.tv64 != KTIME_MAX)
> +	if (next_event.tv64 != KTIME_MAX) {
> +		s64 delta = next_event.tv64 - now.tv64;
> +
> +		if (delta >= 10000000000) {
> +			printk(KERN_CRIT "%s(): The delta is big: %lld\n", __func__, delta);
> +			next_event.tv64 = now.tv64 + 3000000000;
> +		}
> +
>  		tick_broadcast_set_event(dev, next_cpu, next_event);
> +	}
>  
>  	raw_spin_unlock(&tick_broadcast_lock);
> 
> Unfortunately it doesn't do anything for me.

And I'm not surprised, because the original patch forced a 5-second event
in the broadcast device on resume, aside from limiting the reprogramming.

What that old patch did, was:

1) Make sure that the broadcast device is actually armed at resume.

   That might cause the HPET to resume properly.

2) Force a rearm after at most 3 seconds when the targeted expiry is more than
   10 seconds away

   That might make sure that lower C-States are never entered.
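
A minimal user-space sketch of the clamp in point 2, for illustration only.
It is not the kernel code: clamp_broadcast_expiry() is a hypothetical helper,
plain int64_t nanoseconds stand in for ktime_t, and the 10 s / 3 s thresholds
are the ones from the hunk quoted above (which compares with >=).

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Thresholds taken from the quoted hunk: clamp at >= 10 s, rearm at 3 s. */
#define BIG_DELTA_NS    (10 * NSEC_PER_SEC)
#define FORCED_REARM_NS (3 * NSEC_PER_SEC)

/*
 * Hypothetical helper (not a kernel function): return the expiry that
 * would be programmed into the broadcast device for a requested event.
 */
static int64_t clamp_broadcast_expiry(int64_t now_ns, int64_t next_event_ns)
{
	int64_t delta = next_event_ns - now_ns;

	/* Far-away events are pulled in so the device fires within 3 s. */
	if (delta >= BIG_DELTA_NS)
		return now_ns + FORCED_REARM_NS;

	/* Anything closer is programmed unchanged. */
	return next_event_ns;
}
```

So an event requested 10 s or more in the future is replaced by one 3 s from
now, which bounds how long the broadcast device can stay idle.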
 
> The fortunate thing is that acpi-idle has magically been fixed in the
> meantime, so I can at least go back to using that one and have working
> S3.

What's the lowest C-State with acpi-idle and what's the lowest one with
intel_idle?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-27 18:48                             ` Thomas Gleixner
@ 2016-10-27 19:20                               ` Ville Syrjälä
  2016-10-27 19:25                                 ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-10-27 19:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Thu, Oct 27, 2016 at 08:48:57PM +0200, Thomas Gleixner wrote:
> On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > On Tue, Aug 09, 2016 at 08:20:57PM +0300, Ville Syrjälä wrote:
> > > On Thu, Jul 14, 2016 at 04:29:42PM +0800, Feng Tang wrote:
> > > > if you only want it to work, you can try an old patch
> > > > https://bugzilla.kernel.org/attachment.cgi?id=76071 from a similar bug
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=41932
> > > > 
> > > > Alistair Buxton confirmed it works for 3.18 at least
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=107151#c16
> > > 
> > > That patch is a bit too ripe by now. Would need a fresh squeezed one.
> > 
> > Since no one else bothered to refresh the patch I did it myself:
> > 
> > diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> > index f6aae7977824..d73d094a8972 100644
> > --- a/kernel/time/tick-broadcast.c
> > +++ b/kernel/time/tick-broadcast.c
> > @@ -657,8 +657,16 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
> >  	 * - There are pending events on sleeping CPUs which were not
> >  	 * in the event mask
> >  	 */
> > -	if (next_event.tv64 != KTIME_MAX)
> > +	if (next_event.tv64 != KTIME_MAX) {
> > +		s64 delta = next_event.tv64 - now.tv64;
> > +
> > +		if (delta >= 10000000000) {
> > +			printk(KERN_CRIT "%s(): The delta is big: %lld\n", __func__, delta);
> > +			next_event.tv64 = now.tv64 + 3000000000;
> > +		}
> > +
> >  		tick_broadcast_set_event(dev, next_cpu, next_event);
> > +	}
> >  
> >  	raw_spin_unlock(&tick_broadcast_lock);
> > 
> > Unfortunately it doesn't do anything for me.
> 
> And I'm not surprised, because the original patch forced a 5-second event
> in the broadcast device on resume, aside from limiting the reprogramming.
> 
> What that old patch did, was:
> 
> 1) Make sure that the broadcast device is actually armed at resume.
> 
>    That might cause the HPET to resume properly.
> 
> 2) Force a rearm after at most 3 seconds when the targeted expiry is more than
>    10 seconds away
> 
>    That might make sure that lower C-States are never entered.

Doh. I lost the other hunk somewhere. Let's try that again... And indeed
with the other hunk in tow the machine would appear to resume properly.

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f6aae7977824..e2173aeeb00c 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -507,8 +507,12 @@ void tick_resume_broadcast(void)
 				tick_broadcast_start_periodic(bc);
 			break;
 		case TICKDEV_MODE_ONESHOT:
-			if (!cpumask_empty(tick_broadcast_mask))
+			if (!cpumask_empty(tick_broadcast_mask)) {
 				tick_resume_broadcast_oneshot(bc);
+				clockevents_program_event(bc,
+							  ktime_add_ns(ktime_get(), 5 * NSEC_PER_SEC),
+							  1);
+			}
 			break;
 		}
 	}
@@ -657,8 +661,16 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 	 * - There are pending events on sleeping CPUs which were not
 	 * in the event mask
 	 */
-	if (next_event.tv64 != KTIME_MAX)
+	if (next_event.tv64 != KTIME_MAX) {
+		s64 delta = next_event.tv64 - now.tv64;
+
+		if (delta >= 10000000000) {
+			printk(KERN_CRIT "%s(): The delta is big: %lld\n", __func__, delta);
+			next_event.tv64 = now.tv64 + 3000000000;
+		}
+
 		tick_broadcast_set_event(dev, next_cpu, next_event);
+	}
 
 	raw_spin_unlock(&tick_broadcast_lock);

>  
> > The fortunate thing is that acpi-idle has magically been fixed in the
> > meantime, so I can at least go back to using that one and have working
> > S3.
> 
> What's the lowest C-State with acpi-idle and what's the lowest one with
> intel_idle?

acpi_idle
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x30
/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/residency:200
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:5677316
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:5920

intel_idle:
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x30
/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C4-ATM
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/residency:400
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:7146705
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:6826

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-27 19:20                               ` Ville Syrjälä
@ 2016-10-27 19:25                                 ` Thomas Gleixner
  2016-10-27 20:37                                   ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2016-10-27 19:25 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> On Thu, Oct 27, 2016 at 08:48:57PM +0200, Thomas Gleixner wrote:
> > What that old patch did, was:
> > 
> > 1) Make sure that the broadcast device is actually armed at resume.
> > 
> >    That might cause the HPET to resume properly.
> > 
> > 2) Force a rearm after at most 3 seconds when the targeted expiry is more than
> >    10 seconds away
> > 
> >    That might make sure that lower C-States are never entered.
> 
> Doh. I lost the other hunk somewhere. Let's try that again... And indeed
> with the other hunk in tow the machine would appear to resume properly.

So it would be interesting whether that hunk in resume_broadcast() is
sufficient.
 
> > What's the lowest C-State with acpi-idle and what's the lowest one with
> > intel_idle?
> 
> acpi_idle
> /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x30
> /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
> /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100
> /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
> /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
> /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:200
> /sys/devices/system/cpu/cpu0/cpuidle/state3/time:5677316
> /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:5920
> 
> intel_idle:
> /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x30
> /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
> /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100
> /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C4-ATM
> /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
> /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:400
> /sys/devices/system/cpu/cpu0/cpuidle/state3/time:7146705
> /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:6826

Does the machine work when you limit intel_idle to C3, which would then
match acpi-idle?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-27 19:25                                 ` Thomas Gleixner
@ 2016-10-27 20:37                                   ` Ville Syrjälä
  2016-10-27 20:41                                     ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-10-27 20:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > On Thu, Oct 27, 2016 at 08:48:57PM +0200, Thomas Gleixner wrote:
> > > What that old patch did, was:
> > > 
> > > 1) Make sure that the broadcast device is actually armed at resume.
> > > 
> > >    That might cause the HPET to resume properly.
> > > 
> > > 2) Force a rearm after at most 3 seconds when the targeted expiry is more than
> > >    10 seconds away
> > > 
> > >    That might make sure that lower C-States are never entered.
> > 
> > Doh. I lost the other hunk somewhere. Let's try that again... And indeed
> > with the other hunk in tow the machine would appear to resume properly.
> 
> So it would be interesting whether that hunk in resume_broadcast() is
> sufficient.

So far it looks like the answer is yes.

Looks to be about 5 seconds slower than acpi-idle in resuming, but
I suppose that's not all that surprising ;)

>  
> > > What's the lowest C-State with acpi-idle and what's the lowest one with
> > > intel_idle?
> > 
> > acpi_idle
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x30
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:200
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/time:5677316
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:5920
> > 
> > intel_idle:
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x30
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C4-ATM
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:400
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/time:7146705
> > /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:6826
> 
> Does the machine work when you limit intel_idle to C3, which would then
> match acpi-idle?

I'm pretty sure I had tested all of these, but I just double checked
to make sure. There's no C3 with intel_idle so I limited to C2, but
that did not help.

Isn't it possible that ACPI C3 is in fact C4? I thought ACPI C-states
are always numbered non-sparsely, and in this case ACPI C3 could be
anything from C3 to C11 (if the processor actually supported such
states obviously). Actually now that I look at the descriptions for
the states in sysfs, it says "MWAIT 0x30" for state3 on both drivers,
which I presume means it's in fact C4 for both.
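
For reference, a minimal sketch of how an MWAIT hint maps to a C-state
number, assuming the usual encoding where bits 7:4 of the hint select the
target C-state (field value 0 meaning C1) and bits 3:0 select the sub-state.
The helper names are hypothetical; the macro names are meant to follow the
kernel's mwait.h.

```c
#include <assert.h>

/* MWAIT hint layout: bits 7:4 = target C-state field, bits 3:0 = sub-state. */
#define MWAIT_SUBSTATE_SIZE 4
#define MWAIT_SUBSTATE_MASK 0xf
#define MWAIT_CSTATE_MASK   0xf

/* Hypothetical helper: hardware C-state number targeted by an MWAIT hint. */
static unsigned int mwait_hint_to_cstate(unsigned int hint)
{
	unsigned int field = (hint >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK;

	/* Field 0 is C1, 1 is C2, ...; field 0xf wraps around to C0. */
	return (field + 1) & MWAIT_CSTATE_MASK;
}

/* Hypothetical helper: sub-state part of the hint. */
static unsigned int mwait_hint_to_substate(unsigned int hint)
{
	return hint & MWAIT_SUBSTATE_MASK;
}
```

With that, 0x30 decodes to C4 with sub-state 0, which matches the C4-ATM
name intel_idle reports for state3.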

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-27 20:37                                   ` Ville Syrjälä
@ 2016-10-27 20:41                                     ` Thomas Gleixner
  2016-10-28 15:56                                       ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2016-10-27 20:41 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > So it would be interesting whether that hunk in resume_broadcast() is
> > sufficient.
> 
> So far it looks like the answer is yes.
> 
> Looks to be about 5 seconds slower than acpi-idle in resuming, but
> I suppose that's not all that surprising ;)

Well, set it to 1msec then. If that works reliably then we really can do
that unconditionally. There is no harm in firing a useless timer during
resume once.

> > Does the machine work when you limit intel_idle to C3, which would then
> > match acpi-idle?
> 
> I'm pretty sure I had tested all of these, but I just double checked
> to make sure. There's no C3 with intel_idle so I limited to C2, but
> that did not help.
> 
> Isn't it possible that ACPI C3 is in fact C4? I thought ACPI C-states
> are always numbered non-sparsely, and in this case ACPI C3 could be
> anything from C3 to C11 (if the processor actually supported such
> states obviously). Actually now that I look at the descriptions for
> the states in sysfs, it says "MWAIT 0x30" for state3 on both drivers,
> which I presume means it's in fact C4 for both.

Indeed.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-27 20:41                                     ` Thomas Gleixner
@ 2016-10-28 15:56                                       ` Ville Syrjälä
  2016-10-28 18:58                                         ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-10-28 15:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote:
> On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > > So it would be interesting whether that hunk in resume_broadcast() is
> > > sufficient.
> > 
> > So far it looks like the answer is yes.
> > 
> > Looks to be about 5 seconds slower than acpi-idle in resuming, but
> > I suppose that's not all that surprising ;)
> 
> Well, set it to 1msec then. If that works reliably then we really can do
> that unconditionally. There is no harm in firing a useless timer during
> resume once.

I narrowed down the required timeout, and looks like 25ms is the
minimum that works. With 24ms I already started to have failures. So
maybe just bump it up by an order of magnitude to 250ms for some
safety margin?

In any case I think I'll leave the machine running S3 cycles over the
weekend with the 25ms timeout just to see if it will eventually fail.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-28 15:56                                       ` Ville Syrjälä
@ 2016-10-28 18:58                                         ` Thomas Gleixner
  2016-11-01 20:47                                           ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2016-10-28 18:58 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Fri, 28 Oct 2016, Ville Syrjälä wrote:
> On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote:
> > On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > > > So it would be interesting whether that hunk in resume_broadcast() is
> > > > sufficient.
> > > 
> > > So far it looks like the answer is yes.
> > > 
> > > Looks to be about 5 seconds slower than acpi-idle in resuming, but
> > > I suppose that's not all that surprising ;)
> > 
> > Well, set it to 1msec then. If that works reliably then we really can do
> > that unconditionally. There is no harm in firing a useless timer during
> > resume once.
> 
> I narrowed down the required timeout, and looks like 25ms is the
> minimum that works. With 24ms I already started to have failures. So
> maybe just bump it up by an order of magnitude to 250ms for some
> safety margin?

Sure, but what puzzles me is that we need a timeout that big. What happens
between broadcast_resume() and broadcast_resume() + 25ms?

IOW, what is the event/resume function which we need to bridge? We should
really try to track that down.

You might try to enable function tracing and do a tracing_off() when that
25ms timeout fires.

Something like 

	stop_trace = true;

in broadcast_resume() and then in the broadcast timer function:

	if (stop_trace) {
		stop_trace = false;
		tracing_off();
	}

Then when the machine is up read the trace, compress and upload it
somewhere or send it in private mail if it's not that big.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-10-28 18:58                                         ` Thomas Gleixner
@ 2016-11-01 20:47                                           ` Ville Syrjälä
  2016-11-07 11:49                                             ` Ville Syrjälä
  2016-11-09  3:54                                             ` Feng Tang
  0 siblings, 2 replies; 28+ messages in thread
From: Ville Syrjälä @ 2016-11-01 20:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Fri, Oct 28, 2016 at 08:58:41PM +0200, Thomas Gleixner wrote:
> On Fri, 28 Oct 2016, Ville Syrjälä wrote:
> > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote:
> > > On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > > > > So it would be interesting whether that hunk in resume_broadcast() is
> > > > > sufficient.
> > > > 
> > > > So far it looks like the answer is yes.
> > > > 
> > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but
> > > > I suppose that's not all that surprising ;)
> > > 
> > > Well, set it to 1msec then. If that works reliably then we really can do
> > > that unconditionally. There is no harm in firing a useless timer during
> > > resume once.
> > 
> > I narrowed down the required timeout, and looks like 25ms is the
> > minimum that works. With 24ms I already started to have failures. So
> > maybe just bump it up by an order of magnitude to 250ms for some
> > safety margin?

I left the thing running for the weekend and it failed 26 out of 16057
times with the 25ms timeout. Looks like it takes ~5 minutes to resume
when it fails, but eventually it does come back.
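
In rough numbers that is a bit under 0.2% of cycles. A small sketch of the
arithmetic, using a hypothetical helper and integer math only:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: failures per 100,000 suspend/resume cycles,
 * rounded down.
 */
static uint64_t failures_per_100k(uint64_t failures, uint64_t cycles)
{
	assert(cycles > 0);
	return failures * 100000ULL / cycles;
}
```

For the weekend run, failures_per_100k(26, 16057) gives 161, i.e. roughly
0.16% of the resumes hit the slow path with the 25ms timeout.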

> 
> Sure, but what puzzles me is that we need a timeout that big. What happens
> between broadcast_resume() and broadcast_resume() + 25ms?
> 
> IOW, what is the event/resume function which we need to bridge? We should
> really try to track that down.

My hunch would be that SMM trap in the DSDT/SSDT since that's where
things ended up last time I was tracing these resume problems. Though I
can't recall if that was just with acpi-idle or if intel_idle landed in
the same spot as well.

I guess I can try to repeat that test tomorrow, or I'll try your function
tracer method if the other thing fails.

> 
> You might try to enable function tracing and do a tracing_off() when that
> 25ms timeout fires.
> 
> Something like 
> 
> 	stop_trace = true;
> 
> in broadcast_resume() and then in the broadcast timer function:
> 
> 	if (stop_trace) {
> 		stop_trace = false;
> 		tracing_off();
> 	}
> 
> Then when the machine is up read the trace, compress and upload it
> somewhere or send it in private mail if it's not that big.
> 
> Thanks,
> 
> 	tglx


-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-11-01 20:47                                           ` Ville Syrjälä
@ 2016-11-07 11:49                                             ` Ville Syrjälä
  2016-11-07 13:07                                               ` Thomas Gleixner
  2016-11-09  3:54                                             ` Feng Tang
  1 sibling, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2016-11-07 11:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Tue, Nov 01, 2016 at 10:47:37PM +0200, Ville Syrjälä wrote:
> On Fri, Oct 28, 2016 at 08:58:41PM +0200, Thomas Gleixner wrote:
> > On Fri, 28 Oct 2016, Ville Syrjälä wrote:
> > > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > > > > > So it would be interesting whether that hunk in resume_broadcast() is
> > > > > > sufficient.
> > > > > 
> > > > > So far it looks like the answer is yes.
> > > > > 
> > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but
> > > > > I suppose that's not all that surprising ;)
> > > > 
> > > > Well, set it to 1msec then. If that works reliably then we really can do
> > > > that unconditionally. There is no harm in firing a useless timer during
> > > > resume once.
> > > 
> > > I narrowed down the required timeout, and looks like 25ms is the
> > > minimum that works. With 24ms I already started to have failures. So
> > > maybe just bump it up by an order of magnitude to 250ms for some
> > > safety margin?
> 
> I left the thing running for the weekend and it failed 26 out of 16057
> times with the 25ms timeout. Looks like it takes ~5 minutes to resume
> when it fails, but eventually it does come back.
> 
> > 
> > Sure, but what puzzles me is that we need a timeout that big. What happens
> > between broadcast_resume() and broadcast_resume() + 25ms?
> > 
> > IOW, what is the event/resume function which we need to bridge? We should
> > really try to track that down.
> 
> My hunch would be that SMM trap in the DSDT/SSDT since that's where
> things ended up last time I was tracing these resume problems. Though I
> can't recall if that was just with acpi-idle or if intel_idle landed in
> the same spot as well.
> 
> I guess I can try to repeat that test tomorrow, or I'll try your function
> tracer method if the other thing fails.

I didn't manage to find a lot of time to play around with this, but it
definitely looks like the SMM trap is the problem here. I repeated my
pm_trace experiments and when it gets stuck it is trying to execute the
_WAK ACPI method which is where the SMM trap happens.

Maybe the SMM code was written with the expectation of a periodic tick
or something like that?

> 
> > 
> > You might try to enable function tracing and do a tracing_off() when that
> > 25ms timeout fires.
> > 
> > Something like 
> > 
> > 	stop_trace = true;
> > 
> > in broadcast_resume() and then in the broadcast timer function:
> > 
> > 	if (stop_trace) {
> > 		stop_trace = false;
> > 		tracing_off();
> > 	}
> > 
> > Then when the machine is up read the trace, compress and upload it
> > somewhere or send it in private mail if it's not that big.
> > 
> > Thanks,
> > 
> > 	tglx
> 
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-11-07 11:49                                             ` Ville Syrjälä
@ 2016-11-07 13:07                                               ` Thomas Gleixner
  2016-11-07 16:45                                                 ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2016-11-07 13:07 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Mon, 7 Nov 2016, Ville Syrjälä wrote:
> I didn't manage to find a lot of time to play around with this, but it
> definitely looks like the SMM trap is the problem here. I repeated my
> pm_trace experiments and when it gets stuck it is trying to execute the
> _WAK ACPI method which is where the SMM trap happens.
> 
> Maybe the SMM code was written with the expectation of a periodic tick
> or something like that?

Can you try the untested hack below, please? It should confirm that.

Thanks,

	tglx

8<---------------
--- a/drivers/acpi/acpica/hwsleep.c
+++ b/drivers/acpi/acpica/hwsleep.c
@@ -269,6 +269,17 @@ acpi_status acpi_hw_legacy_wake_prep(u8
 	return_ACPI_STATUS(status);
 }
 
+static const ktime_t time10ms = { .tv64 = 10 * NSEC_PER_MSEC };
+
+static enum hrtimer_restart acpi_hw_legacy_tmr(struct hrtimer *tmr)
+{
+	hrtimer_forward_now(tmr, time10ms);
+
+	return HRTIMER_RESTART;
+}
+
+
+
 /*******************************************************************************
  *
  * FUNCTION:    acpi_hw_legacy_wake
@@ -284,6 +295,7 @@ acpi_status acpi_hw_legacy_wake_prep(u8
 
 acpi_status acpi_hw_legacy_wake(u8 sleep_state)
 {
+	struct hrtimer timer;
 	acpi_status status;
 
 	ACPI_FUNCTION_TRACE(hw_legacy_wake);
@@ -311,12 +323,18 @@ acpi_status acpi_hw_legacy_wake(u8 sleep
 		return_ACPI_STATUS(status);
 	}
 
+	hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	timer.function = acpi_hw_legacy_tmr;
+	hrtimer_start(&timer, time10ms, HRTIMER_MODE_REL);
+
 	/*
 	 * Now we can execute _WAK, etc. Some machines require that the GPEs
 	 * are enabled before the wake methods are executed.
 	 */
 	acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
 
+	hrtimer_cancel(&timer);
+
 	/*
 	 * Some BIOS code assumes that WAK_STS will be cleared on resume
 	 * and use it to determine whether the system is rebooting or
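
To illustrate why the hack above yields a periodic tick while _WAK runs: the
callback pushes the expiry forward by 10ms and asks to be restarted, so the
timer keeps firing until it is cancelled. A minimal user-space sketch of that
re-arming arithmetic follows. It is only a simulation, not the hrtimer
implementation; struct sim_timer and sim_timer_forward() are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/* Hypothetical stand-in for a timer: only the absolute expiry matters here. */
struct sim_timer {
	int64_t expires_ns;
};

/*
 * Push the expiry forward by whole intervals until it lies after now_ns.
 * Returns the number of intervals added (0 if the timer has not expired yet).
 */
static uint64_t sim_timer_forward(struct sim_timer *t, int64_t now_ns,
				  int64_t interval_ns)
{
	int64_t delta = now_ns - t->expires_ns;
	uint64_t overruns;

	assert(interval_ns > 0);

	/* Not expired yet: leave the expiry alone. */
	if (delta < 0)
		return 0;

	/* Skip every missed period plus one, so the new expiry is in the future. */
	overruns = (uint64_t)(delta / interval_ns) + 1;
	t->expires_ns += (int64_t)overruns * interval_ns;
	return overruns;
}
```

A timer due at 10ms that is handled at 10ms is re-armed for 20ms; if the
handler only runs at 45ms, three periods are skipped and the next expiry
lands at 50ms.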

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-11-07 13:07                                               ` Thomas Gleixner
@ 2016-11-07 16:45                                                 ` Ville Syrjälä
  0 siblings, 0 replies; 28+ messages in thread
From: Ville Syrjälä @ 2016-11-07 16:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, feng.tang, Rafael J. Wysocki, Rafael J. Wysocki,
	Steven Rostedt, Sebastian Andrzej Siewior, linux-arch,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Linus Torvalds, Paul Turner,
	Linux Kernel Mailing List, Zhang, Rui

On Mon, Nov 07, 2016 at 02:07:43PM +0100, Thomas Gleixner wrote:
> On Mon, 7 Nov 2016, Ville Syrjälä wrote:
> > I didn't manage to find a lot of time to play around with this, but it
> > definitely looks like the SMM trap is the problem here. I repeated my
> > pm_trace experiments and when it gets stuck it is trying to execute the
> > _WAK ACPI method which is where the SMM trap happens.
> > 
> > Maybe the SMM code was written with the expectation of a periodic tick
> > or something like that?
> 
> Can you try the untested hack below, please? It should confirm that.
> 
> Thanks,
> 
> 	tglx
> 
> 8<---------------
> --- a/drivers/acpi/acpica/hwsleep.c
> +++ b/drivers/acpi/acpica/hwsleep.c
> @@ -269,6 +269,17 @@ acpi_status acpi_hw_legacy_wake_prep(u8
>  	return_ACPI_STATUS(status);
>  }
>  
> +static const ktime_t time10ms = { .tv64 = 10 * NSEC_PER_MSEC };
> +
> +static enum hrtimer_restart acpi_hw_legacy_tmr(struct hrtimer *tmr)
> +{
> +	hrtimer_forward_now(tmr, time10ms);
> +
> +	return HRTIMER_RESTART;
> +}
> +
> +
> +
>  /*******************************************************************************
>   *
>   * FUNCTION:    acpi_hw_legacy_wake
> @@ -284,6 +295,7 @@ acpi_status acpi_hw_legacy_wake_prep(u8
>  
>  acpi_status acpi_hw_legacy_wake(u8 sleep_state)
>  {
> +	struct hrtimer timer;
>  	acpi_status status;
>  
>  	ACPI_FUNCTION_TRACE(hw_legacy_wake);
> @@ -311,12 +323,18 @@ acpi_status acpi_hw_legacy_wake(u8 sleep
>  		return_ACPI_STATUS(status);
>  	}
>  
> +	hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	timer.function = acpi_hw_legacy_tmr;
> +	hrtimer_start(&timer, time10ms, HRTIMER_MODE_REL);
> +
>  	/*
>  	 * Now we can execute _WAK, etc. Some machines require that the GPEs
>  	 * are enabled before the wake methods are executed.
>  	 */
>  	acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
>  
> +	hrtimer_cancel(&timer);
> +
>  	/*
>  	 * Some BIOS code assumes that WAK_STS will be cleared on resume
>  	 * and use it to determine whether the system is rebooting or

Doesn't really seem to help. I did get a few random successes, but
mostly it just fails.

diff --git a/drivers/acpi/acpica/hwsleep.c b/drivers/acpi/acpica/hwsleep.c
index 3c9c10bd49e9..950319b619f1 100644
--- a/drivers/acpi/acpica/hwsleep.c
+++ b/drivers/acpi/acpica/hwsleep.c
@@ -44,6 +44,8 @@
 
 #include <acpi/acpi.h>
 #include <linux/acpi.h>
+#include <linux/pm-trace.h>
+#include <linux/delay.h>
 #include "accommon.h"
 
 #define _COMPONENT          ACPI_HARDWARE
@@ -275,6 +277,8 @@ static enum hrtimer_restart acpi_hw_legacy_tmr(struct hrtimer *tmr)
 {
 	hrtimer_forward_now(tmr, time10ms);
 
+	TRACE_RESUME(0);
+
 	return HRTIMER_RESTART;
 }
 
@@ -327,14 +331,17 @@ acpi_status acpi_hw_legacy_wake(u8 sleep_state)
 	timer.function = acpi_hw_legacy_tmr;
 	hrtimer_start(&timer, time10ms, HRTIMER_MODE_REL);
 
+	TRACE_RESUME(0);
 	/*
 	 * Now we can execute _WAK, etc. Some machines require that the GPEs
 	 * are enabled before the wake methods are executed.
 	 */
 	acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
+	TRACE_RESUME(0);
 
 	hrtimer_cancel(&timer);
 
+	TRACE_RESUME(0);
 	/*
 	 * Some BIOS code assumes that WAK_STS will be cleared on resume
 	 * and use it to determine whether the system is rebooting or

Tossing that on top shows the trace before the _WAK being the last one
executed. Adding an msleep() before the _WAK does make the trace from
the timer handler show up so at least the timer seems to be ticking
up until some point.

-- 
Ville Syrjälä
Intel OTC

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-11-01 20:47                                           ` Ville Syrjälä
  2016-11-07 11:49                                             ` Ville Syrjälä
@ 2016-11-09  3:54                                             ` Feng Tang
  2016-11-09  6:08                                               ` Linus Torvalds
  1 sibling, 1 reply; 28+ messages in thread
From: Feng Tang @ 2016-11-09  3:54 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Thomas Gleixner, Feng Tang, Rafael J. Wysocki, Wysocki, Rafael J,
	Steven Rostedt, Sebastian Andrzej Siewior,
	linux-arch@vger.kernel.org, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Oleg Nesterov,
	Tejun Heo, Andrew Morton, Paul McKenney, Linus Torvalds,
	Paul Turner

On Wed, Nov 02, 2016 at 04:47:37AM +0800, Ville Syrjälä wrote:
> On Fri, Oct 28, 2016 at 08:58:41PM +0200, Thomas Gleixner wrote:
> > On Fri, 28 Oct 2016, Ville Syrjälä wrote:
> > > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > > > > > So it would be interesting whether that hunk in resume_broadcast() is
> > > > > > sufficient.
> > > > > 
> > > > > So far it looks like the answer is yes.
> > > > > 
> > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but
> > > > > I suppose that's not all that surprising ;)
> > > > 
> > > > Well, set it to 1msec then. If that works reliably then we really can do
> > > > that unconditionally. There is no harm in firing a useless timer during
> > > > resume once.
> > > 
> > > I narrowed down the required timeout, and looks like 25ms is the
> > > minimum that works. With 24ms I already started to have failures. So
> > > maybe just bump it up by an order of magnitude to 250ms for some
> > > safety margin?
> 
> I left the thing running for the weekend and it failed 26 out of 16057
> times with the 25ms timeout. Looks like it takes ~5 minutes to resume
> when it fails, but eventually it does come back.
> 

Just came back from travel. Yes, the 5-minute delay may be due to the
expiration of the HPET timer: counting from 0 to 0xffffffff for a 13M
frequency HPET takes about 300 seconds. After resume, it seems nobody
arms it, so my old patch forces one event to be armed.
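
The wrap-around arithmetic can be checked with a short sketch. The helper
name counter_wrap_seconds() is hypothetical, and 14318180 Hz in the note
below is an assumption (a rate commonly seen on HPETs), not a value
measured on this machine; the thread itself only says "13M".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds for a free-running 32-bit counter to count from 0 through
 * 0xffffffff and wrap, when it ticks at freq_hz.
 */
double counter_wrap_seconds(uint64_t freq_hz)
{
	assert(freq_hz > 0);
	return 4294967296.0 / (double)freq_hz;	/* 2^32 ticks */
}
```

Under these assumptions, counter_wrap_seconds(14318180) comes out just
under 300 seconds, and counter_wrap_seconds(13000000) at roughly 330
seconds. Both are in the range of the ~5 minute resume delay reported above.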

Thanks,
Feng

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-11-09  3:54                                             ` Feng Tang
@ 2016-11-09  6:08                                               ` Linus Torvalds
  2016-11-17 17:14                                                 ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2016-11-09  6:08 UTC (permalink / raw)
  To: Feng Tang
  Cc: Ville Syrjälä, Thomas Gleixner, Feng Tang,
	Rafael J. Wysocki, Wysocki, Rafael J, Steven Rostedt,
	Sebastian Andrzej Siewior, linux-arch@vger.kernel.org,
	Rik van Riel, Srivatsa S. Bhat, Peter Zijlstra, Arjan van de Ven,
	Rusty Russell, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Paul McKenney, Paul Turner

On Tue, Nov 8, 2016 at 7:54 PM, Feng Tang <feng.tang@intel.com> wrote:
> On Wed, Nov 02, 2016 at 04:47:37AM +0800, Ville Syrjälä wrote:
>>
>> I left the thing running for the weekend and it failed 26 out of 16057
>> times with the 25ms timeout. Looks like it takes ~5 minutes to resume
>> when it fails, but eventually it does come back.
>
> Just came back from travel. Yes, the 5-minute delay may be due to the
> expiration of the HPET timer: counting from 0 to 0xffffffff for a 13M
> frequency HPET takes about 300 seconds. After resume, it seems nobody
> arms it, so my old patch forces one event to be armed.

Ville, what happens if you disable HPET? Can you force the TSC with
"clocksource=tsc" or "tsc=reliable"? Does resume work reliably then?

Or is this one of the CPUs where the TSC just doesn't work?

              Linus

* Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
  2016-11-09  6:08                                               ` Linus Torvalds
@ 2016-11-17 17:14                                                 ` Ville Syrjälä
  0 siblings, 0 replies; 28+ messages in thread
From: Ville Syrjälä @ 2016-11-17 17:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Feng Tang, Thomas Gleixner, Feng Tang, Rafael J. Wysocki,
	Wysocki, Rafael J, Steven Rostedt, Sebastian Andrzej Siewior,
	linux-arch@vger.kernel.org, Rik van Riel, Srivatsa S. Bhat,
	Peter Zijlstra, Arjan van de Ven, Rusty Russell, Oleg Nesterov,
	Tejun Heo, Andrew Morton, Paul McKenney, Paul Turner,
	Linux Kernel Mailing List

On Tue, Nov 08, 2016 at 10:08:37PM -0800, Linus Torvalds wrote:
> On Tue, Nov 8, 2016 at 7:54 PM, Feng Tang <feng.tang@intel.com> wrote:
> > On Wed, Nov 02, 2016 at 04:47:37AM +0800, Ville Syrjälä wrote:
> >>
> >> I left the thing running for the weekend and it failed 26 out of 16057
> >> times with the 25ms timeout. Looks like it takes ~5 minutes to resume
> >> when it fails, but eventually it does come back.
> >
> > Just came back from travel. Yes, the 5-minute delay may be due to the
> > expiration of the HPET timer: counting from 0 to 0xffffffff for a 13M
> > frequency HPET takes about 300 seconds. After resume, it seems nobody
> > arms it, so my old patch forces one event to be armed.
> 
> Ville, what happens if you disable HPET? Can you force the TSC with
> "clocksource=tsc" or "tsc=reliable"? Does resume work reliably then?
> 
> Or is this one of the CPUs where the TSC just doesn't work?

tsc=reliable allows use of the tsc it seems. Doesn't seem to help
with resuming though.
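
One way to confirm which clocksource is actually in use after booting with
such a parameter is to read the sysfs attribute named in CLOCKSOURCE_PATH
below. This is only a sketch: read_first_line() is a hypothetical helper,
and it assumes the kernel exposes the clocksource sysfs interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* sysfs attribute that reports the clocksource currently in use. */
#define CLOCKSOURCE_PATH \
	"/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of path into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Strip the newline that sysfs attributes end with. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling read_first_line(CLOCKSOURCE_PATH, buf, sizeof(buf)) should leave
"tsc" in buf if the parameter took effect, and "hpet" or another name
otherwise.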

-- 
Ville Syrjälä
Intel OTC

end of thread, other threads:[~2016-11-17 17:14 UTC | newest]

Thread overview: 28+ messages
     [not found] <20160511101920.GZ4329@intel.com>
     [not found] ` <57332171.8070403@linutronix.de>
     [not found]   ` <20160511122116.GA4329@intel.com>
2016-05-11 13:36     ` S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")] Rafael J. Wysocki
2016-05-11 15:25       ` Jim Bos
2016-05-11 16:19         ` Rafael J. Wysocki
2016-05-11 16:21           ` Sebastian Andrzej Siewior
2016-05-11 16:24             ` Rafael J. Wysocki
     [not found]     ` <20160511084445.00030b49@gandalf.local.home>
     [not found]       ` <20160511133406.GC4329@intel.com>
     [not found]         ` <20160516193910.GL4329@intel.com>
2016-05-17 23:14           ` Rafael J. Wysocki
2016-05-18  7:24             ` Ville Syrjälä
2016-05-26 18:32               ` Ville Syrjälä
2016-05-30 20:43                 ` Rafael J. Wysocki
2016-05-31  7:26                   ` Ville Syrjälä
2016-07-13 14:54                     ` Ville Syrjälä
2016-07-14  8:29                       ` Feng Tang
2016-08-09 17:20                         ` Ville Syrjälä
2016-10-27 17:28                           ` Ville Syrjälä
2016-10-27 18:48                             ` Thomas Gleixner
2016-10-27 19:20                               ` Ville Syrjälä
2016-10-27 19:25                                 ` Thomas Gleixner
2016-10-27 20:37                                   ` Ville Syrjälä
2016-10-27 20:41                                     ` Thomas Gleixner
2016-10-28 15:56                                       ` Ville Syrjälä
2016-10-28 18:58                                         ` Thomas Gleixner
2016-11-01 20:47                                           ` Ville Syrjälä
2016-11-07 11:49                                             ` Ville Syrjälä
2016-11-07 13:07                                               ` Thomas Gleixner
2016-11-07 16:45                                                 ` Ville Syrjälä
2016-11-09  3:54                                             ` Feng Tang
2016-11-09  6:08                                               ` Linus Torvalds
2016-11-17 17:14                                                 ` Ville Syrjälä
