public inbox for linux-kernel@vger.kernel.org
* [REGRESSION,STABLE,BISECTED] Hang on resume from standby in 3.1.[56], 3.2-rc*
@ 2011-12-23 17:31 Phil Miller
  2011-12-24  6:40 ` Phil Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Phil Miller @ 2011-12-23 17:31 UTC
  To: Thomas Gleixner, Greg Kroah-Hartman, stable, LKML

I've got a Dell Precision T1500 (lspci, dmidecode, and dmesg output at
http://charm.cs.uiuc.edu/~phil/linux-suspend-hang/ ) that I generally
suspend when I'm out of the house or asleep, and wake up when I want
to use it. Sadly, a recent change to the kernel has disrupted that
happy state of affairs. When I run the most recent stable or
pre-release versions, the kernel hangs on resume. I can still switch
virtual consoles, and get keyboard output echoed to the screen, but no
userspace code seems to be running (e.g. login doesn't give me a
password prompt after entering a username), nor does the system
respond to ping or SSH connections.

Bisection between v3.1 and v3.1.6 points to the following commit as the culprit:
=====
commit aeed6baa702a285cf03b7dc4182ffc1a7f4e4ed6
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Dec 2 16:02:45 2011 +0100

    clockevents: Set noop handler in clockevents_exchange_device()

    commit de28f25e8244c7353abed8de0c7792f5f883588c upstream.

    If a device is shutdown, then there might be a pending interrupt,
    which will be processed after we reenable interrupts, which causes the
    original handler to be run. If the old handler is the (broadcast)
    periodic handler the shutdown state might hang the kernel completely.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 1cfa477e4be68746d7ef2818a430d50424b06572
ccd4ef67437a19acba03df2debac6eb8c5957b30 M	kernel
=====
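
The mechanism the commit message describes can be pictured with a toy
model. This is only a sketch of the stated idea (a latched interrupt
runs whatever handler is installed once interrupts are re-enabled), not
the kernel code; every class and function name below is hypothetical.

```python
# Toy model only: all names here are hypothetical, none are kernel APIs.

class ToyClockEventDevice:
    def __init__(self, name, handler):
        self.name = name
        self.event_handler = handler
        self.shutdown = False
        self.pending_irq = False


def noop_handler(dev, log):
    # Does nothing useful; safe to run against a shut-down device.
    log.append(f"{dev.name}: noop")


def periodic_handler(dev, log):
    # Stands in for the "original" handler from the commit message.
    log.append(f"{dev.name}: periodic handler ran (shutdown={dev.shutdown})")


def exchange_device(old, set_noop):
    """Retire `old`; optionally park its handler on a no-op first."""
    if set_noop:
        old.event_handler = noop_handler
    old.shutdown = True


def deliver_pending(dev, log):
    """Model interrupts being re-enabled: a latched interrupt fires now."""
    if dev.pending_irq:
        dev.pending_irq = False
        dev.event_handler(dev, log)


def run(set_noop):
    log = []
    dev = ToyClockEventDevice("timer0", periodic_handler)
    dev.pending_irq = True  # interrupt latched before the exchange
    exchange_device(dev, set_noop)
    deliver_pending(dev, log)
    return log
```

With `run(False)` the original handler runs against a device that is
already shut down; with `run(True)` the late interrupt lands on the
no-op instead. The model says nothing about why the change hangs resume
on this hardware.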

I've tested that reverting this commit also restores my ability to
resume from suspend on 3.2-rc6.
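
The bisection above is a binary search over the commit history between a
known-good and a known-bad version. A minimal sketch of that search,
assuming a linear history in which every commit after the culprit is
also bad; the function name and the `is_bad` callback (standing in for
"build, boot, suspend, try to resume") are hypothetical:

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    that once a commit is bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid  # culprit is after mid
    return commits[hi]


# Illustrative data only; these are not real commit names.
history = [f"commit-{i}" for i in range(10)]
culprit = "commit-6"
found = first_bad(history, lambda c: history.index(c) >= history.index(culprit))
```

Each test halves the remaining range, which is why a bisection needs
only a handful of boot-and-resume cycles even across many commits.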

If there's anything else I can do to help diagnose this issue or get
it fixed, I'm happy to help.

Thanks.

Phil

* Re: [REGRESSION,STABLE,BISECTED] Hang on resume from standby in 3.1.[56], 3.2-rc*
  2011-12-23 17:31 [REGRESSION,STABLE,BISECTED] Hang on resume from standby in 3.1.[56], 3.2-rc* Phil Miller
@ 2011-12-24  6:40 ` Phil Miller
  2011-12-24  9:09   ` Michael Tokarev
       [not found]   ` <d71755af705048a4868348343417a4c4@CITESHT4.ad.uillinois.edu>
  0 siblings, 2 replies; 6+ messages in thread
From: Phil Miller @ 2011-12-24  6:40 UTC
  To: Thomas Gleixner, Greg Kroah-Hartman, stable, LKML,
	Venkatesh Pallipadi

I just went digging through the history, and it looks like the commit
I found to be problematic partially reverts
7c1e76897492d92b6a1c2d6892494d39ded9680c, from 2008.

On Fri, Dec 23, 2011 at 11:31, Phil Miller <mille121@illinois.edu> wrote:
> I've got a Dell Precision T1500 (lspci, dmidecode, and dmesg output at
> http://charm.cs.uiuc.edu/~phil/linux-suspend-hang/ ) that I generally
> suspend when I'm out of the house or asleep, and wake up when I want
> to use it. Sadly, a recent change to the kernel has disrupted that
> happy state of affairs. When I run the most recent stable or
> pre-release versions, the kernel hangs on resume. I can still switch
> virtual consoles, and get keyboard output echoed to the screen, but no
> userspace code seems to be running (e.g. login doesn't give me a
> password prompt after entering a username), nor does the system
> respond to ping or SSH connections.
>
> Bisection between v3.1 and v3.1.6 points to the following commit as the culprit:
> =====
> commit aeed6baa702a285cf03b7dc4182ffc1a7f4e4ed6
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Fri Dec 2 16:02:45 2011 +0100
>
>    clockevents: Set noop handler in clockevents_exchange_device()
>
>    commit de28f25e8244c7353abed8de0c7792f5f883588c upstream.
>
>    If a device is shutdown, then there might be a pending interrupt,
>    which will be processed after we reenable interrupts, which causes the
>    original handler to be run. If the old handler is the (broadcast)
>    periodic handler the shutdown state might hang the kernel completely.
>
>    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
>
> :040000 040000 1cfa477e4be68746d7ef2818a430d50424b06572
> ccd4ef67437a19acba03df2debac6eb8c5957b30 M      kernel
> =====
>
> I've tested that reverting this commit also restores my ability to
> resume from suspend on 3.2-rc6.
>
> If there's anything else I can do to help diagnose this issue or get
> it fixed, I'm happy to help.
>
> Thanks.
>
> Phil

* Re: [REGRESSION,STABLE,BISECTED] Hang on resume from standby in 3.1.[56], 3.2-rc*
  2011-12-24  6:40 ` Phil Miller
@ 2011-12-24  9:09   ` Michael Tokarev
       [not found]   ` <d71755af705048a4868348343417a4c4@CITESHT4.ad.uillinois.edu>
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Tokarev @ 2011-12-24  9:09 UTC
  To: Phil Miller
  Cc: Thomas Gleixner, Greg Kroah-Hartman, stable, LKML,
	Venkatesh Pallipadi

On 24.12.2011 10:40, Phil Miller wrote:
> I just went digging through the history, and it looks like the commit
> I found to be problematic partially reverts
> 7c1e76897492d92b6a1c2d6892494d39ded9680c, from 2008.
> 
> On Fri, Dec 23, 2011 at 11:31, Phil Miller <mille121@illinois.edu> wrote:
>> I've got a Dell Precision T1500 (lspci, dmidecode, and dmesg output at
>> http://charm.cs.uiuc.edu/~phil/linux-suspend-hang/ ) that I generally
>> suspend when I'm out of the house or asleep, and wake up when I want
>> to use it. Sadly, a recent change to the kernel has disrupted that
>> happy state of affairs. When I run the most recent stable or
>> pre-release versions, the kernel hangs on resume. I can still switch
>> virtual consoles, and get keyboard output echoed to the screen, but no
>> userspace code seems to be running (e.g. login doesn't give me a
>> password prompt after entering a username), nor does the system
>> respond to ping or SSH connections.
>>
>> Bisection between v3.1 and v3.1.6 points to the following commit as the culprit:
>> =====
>> commit aeed6baa702a285cf03b7dc4182ffc1a7f4e4ed6
>> Author: Thomas Gleixner <tglx@linutronix.de>
>> Date:   Fri Dec 2 16:02:45 2011 +0100
>>
>>    clockevents: Set noop handler in clockevents_exchange_device()

I noticed that my host also stopped resuming with 3.1, and noted that
with 3.1.3 it works ok.  I'm now trying to revert this commit too, to
see if that's the problem.

/mjt

* Re: [REGRESSION,STABLE,BISECTED] Hang on resume from standby in 3.1.[56], 3.2-rc*
       [not found]   ` <d71755af705048a4868348343417a4c4@CITESHT4.ad.uillinois.edu>
@ 2011-12-24 16:51     ` Phil Miller
  2011-12-24 19:47       ` Michael Tokarev
  0 siblings, 1 reply; 6+ messages in thread
From: Phil Miller @ 2011-12-24 16:51 UTC
  To: Michael Tokarev
  Cc: Thomas Gleixner, Greg Kroah-Hartman, stable@kernel.org, LKML,
	Venkatesh Pallipadi

On Sat, Dec 24, 2011 at 03:09, Michael Tokarev <mjt@tls.msk.ru> wrote:
> On 24.12.2011 10:40, Phil Miller wrote:
>> On Fri, Dec 23, 2011 at 11:31, Phil Miller <mille121@illinois.edu> wrote:
>>> I've got a Dell Precision T1500 (lspci, dmidecode, and dmesg output at
>>> http://charm.cs.uiuc.edu/~phil/linux-suspend-hang/ ) that I generally
>>> suspend when I'm out of the house or asleep, and wake up when I want
>>> to use it. Sadly, a recent change to the kernel has disrupted that
>>> happy state of affairs. When I run the most recent stable or
>>> pre-release versions, the kernel hangs on resume. I can still switch
>>> virtual consoles, and get keyboard output echoed to the screen, but no
>>> userspace code seems to be running (e.g. login doesn't give me a
>>> password prompt after entering a username), nor does the system
>>> respond to ping or SSH connections.
>>>
>>> Bisection between v3.1 and v3.1.6 points to the following commit as the culprit:
>>> =====
>>> commit aeed6baa702a285cf03b7dc4182ffc1a7f4e4ed6
>>> Author: Thomas Gleixner <tglx@linutronix.de>
>>> Date:   Fri Dec 2 16:02:45 2011 +0100
>>>
>>>    clockevents: Set noop handler in clockevents_exchange_device()
>>
>> I just went digging through the history, and it looks like the commit
>> I found to be problematic partially reverts
>> 7c1e76897492d92b6a1c2d6892494d39ded9680c, from 2008.
>
> I  noticed that my host also stopped resuming with 3.1, and noted that
> with 3.1.3 it works ok.  I'm now trying to revert this commit too, to
> see if that's the problem.

I first noticed this using Debian unstable's packaged kernels, which
call themselves 3.x.0, but actually get revved through the stable
versions 3.x.y (I'll probably complain about that misnaming to them).
The upgrade from 3.1.4 to 3.1.5 is where it broke, matching the
bisection's results.

Thanks for the confirmation.

Phil

* Re: [REGRESSION,STABLE,BISECTED] Hang on resume from standby in 3.1.[56], 3.2-rc*
  2011-12-24 16:51     ` Phil Miller
@ 2011-12-24 19:47       ` Michael Tokarev
       [not found]         ` <CAGHUO13PjU2tMj=oge3J5gYtMEiNMeCYas7PuV8djuSiOyXoFQ@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Tokarev @ 2011-12-24 19:47 UTC
  To: Phil Miller
  Cc: Thomas Gleixner, Greg Kroah-Hartman, stable@kernel.org, LKML,
	Venkatesh Pallipadi

On 24.12.2011 20:51, Phil Miller wrote:
> On Sat, Dec 24, 2011 at 03:09, Michael Tokarev <mjt@tls.msk.ru> wrote:
>> On 24.12.2011 10:40, Phil Miller wrote:
>>> On Fri, Dec 23, 2011 at 11:31, Phil Miller <mille121@illinois.edu> wrote:
>>>> I've got a Dell Precision T1500 (lspci, dmidecode, and dmesg output at
>>>> http://charm.cs.uiuc.edu/~phil/linux-suspend-hang/ ) that I generally
>>>> suspend when I'm out of the house or asleep, and wake up when I want
>>>> to use it. Sadly, a recent change to the kernel has disrupted that
>>>> happy state of affairs. When I run the most recent stable or
[]
>> I  noticed that my host also stopped resuming with 3.1, and noted that
>> with 3.1.3 it works ok.  I'm now trying to revert this commit too, to
>> see if that's the problem.

Actually that wasn't the issue.  After several iterations (which took
some time) I found that I can't reproduce the hangs which I was
able to trigger trivially just yesterday.  I can only guess these hangs
were due to some other software components (gnome, X stuff, whatever)
which happened to be upgraded today too, together with the kernel, and
the problem went away.  I booted into a kernel with which the system
definitely had the issue at hand (3.1.3), but it resumes from suspend
(both s2ram and s2disk) without any issue whatsoever.  I did several
resumes of each kind in a row, intermixing the two.

> I first noticed this using Debian unstable's packaged kernels, which
> call themselves 3.x.0, but actually get revved through the stable
> versions 3.x.y (I'll probably complain about that misnaming to them).
> The upgrade from 3.1.4 to 3.1.5 is where it broke, matching the
> bisection's results.

So it looks like not all systems suffer from this issue...

Please excuse me for the noise -- it really looked like the kernel
broke.

Thanks,

/mjt

* Re: [REGRESSION,STABLE,BISECTED] Hang on resume from standby in 3.1.[56], 3.2-rc*
       [not found]         ` <CAGHUO13PjU2tMj=oge3J5gYtMEiNMeCYas7PuV8djuSiOyXoFQ@mail.gmail.com>
@ 2011-12-27 18:12           ` Venki Pallipadi
  0 siblings, 0 replies; 6+ messages in thread
From: Venki Pallipadi @ 2011-12-27 18:12 UTC
  To: Joseph Salisbury
  Cc: Michael Tokarev, Phil Miller, Thomas Gleixner, Greg Kroah-Hartman,
	stable@kernel.org, LKML

On Sat, Dec 24, 2011 at 1:12 PM, Joseph Salisbury
<josephtsalisbury@gmail.com> wrote:
>
>
> On Sat, Dec 24, 2011 at 2:47 PM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>>
>> On 24.12.2011 20:51, Phil Miller wrote:
>> > On Sat, Dec 24, 2011 at 03:09, Michael Tokarev <mjt@tls.msk.ru> wrote:
>> >> On 24.12.2011 10:40, Phil Miller wrote:
>> >>> On Fri, Dec 23, 2011 at 11:31, Phil Miller <mille121@illinois.edu>
>> >>> wrote:
>> >>>> I've got a Dell Precision T1500 (lspci, dmidecode, and dmesg output
>> >>>> at
>> >>>> http://charm.cs.uiuc.edu/~phil/linux-suspend-hang/ ) that I generally
>> >>>> suspend when I'm out of the house or asleep, and wake up when I want
>> >>>> to use it. Sadly, a recent change to the kernel has disrupted that
>> >>>> happy state of affairs. When I run the most recent stable or
>> []
>> >> I  noticed that my host also stopped resuming with 3.1, and noted that
>> >> with 3.1.3 it works ok.  I'm now trying to revert this commit too, to
>> >> see if that's the problem.
>>
>> Actually that wasn't the issue.  After several iterations (which took
>> some time) I found that I can't reproduce the hangs which I was
>> able to trigger trivially just yesterday.  I can only guess these hangs
>> were due to some other software components (gnome, X stuff, whatever)
>> which happened to be upgraded today too, together with the kernel, and
>> the problem went away.  I booted into a kernel with which the system
>> definitely had the issue at hand (3.1.3), but it resumes from suspend
>> (both s2ram and s2disk) without any issue whatsoever.  I did several
>> resumes of each kind in a row, intermixing the two.
>>
>> > I first noticed this using Debian unstable's packaged kernels, which
>> > call themselves 3.x.0, but actually get revved through the stable
>> > versions 3.x.y (I'll probably complain about that misnaming to them).
>> > The upgrade from 3.1.4 to 3.1.5 is where it broke, matching the
>> > bisection's results.
>>
>> So it looks like not all systems suffer from this issue...
>>
>> Please excuse me for the noise -- it really looked like the kernel
>> broke.
>>
>> Thanks,
>>
>> /mjt
>
>
>
> This issue also happens in the 3.0 kernel.  A bisect and a revert were
> done for a bug report[1].  Testing indicated that the same commit caused
> the suspend/resume issue.
>

Both reports here, from Phil and from Joe, come from systems that have
HPET MSI support:
--
[    1.010213] HPET: 8 timers in total, 5 timers will be used for per-cpu timer
[    1.010221] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 40, 41, 42, 43, 44, 0
[    1.010226] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[    1.014637] hpet: hpet2 irq 40 for MSI
[    1.014771] hpet: hpet3 irq 41 for MSI
[    1.018748] hpet: hpet4 irq 42 for MSI
[    1.022810] hpet: hpet5 irq 43 for MSI
[    1.026712] hpet: hpet6 irq 44 for MSI
--

The ordering issue addressed in
7c1e76897492d92b6a1c2d6892494d39ded9680c was found with HPET MSI.
So, it seems like change de28f25e8244c7353abed8de0c7792f5f883588c
opened up that ordering problem again.

Thomas: Is de28f25e8244c7353abed8de0c7792f5f883588c for a specific bug
or regression on some platform?

Thanks,
Venki

>
> Thanks,
>
> Joe
>
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569
