* [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
@ 2008-01-27 21:29 Rafael J. Wysocki
2008-01-27 21:59 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-27 21:29 UTC
To: Ingo Molnar; +Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML
Hi,
2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation code paths
(probably related to the disabling of nonboot CPUs), which is !@#$%^&*()
annoying.
It's 100% reproducible on my HP nx6325 and bisection identified the following
commit as the first bad one:
commit 764a9d6fe4b52995c8aba277e3634385699354f4
Author: Steven Rostedt <srostedt@redhat.com>
Date: Fri Jan 25 21:08:04 2008 +0100
sched: track highest prio task queued
Greetings,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Ingo Molnar @ 2008-01-27 21:59 UTC
To: Rafael J. Wysocki; +Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko, Peter Zijlstra

* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Hi,
>
> 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> code paths (probably related to the disabling of nonboot CPUs), which
> is !@#$%^&*() annoying.
>
> It's 100% reproducible on my HP nx6325 and bisection identified the
> following commit as the first bad one:
>
> commit 764a9d6fe4b52995c8aba277e3634385699354f4
> Author: Steven Rostedt <srostedt@redhat.com>
> Date: Fri Jan 25 21:08:04 2008 +0100
>
>     sched: track highest prio task queued

hm, this patch is a NOP, so it's weird that it has an effect.

Do you have serial logging enabled perhaps? If the following WARN_ON()
triggers:

+ WARN_ON(p->prio < rq->rt.highest_prio);

then perhaps that can cause a 5-10 seconds delay. (that's how much time
it takes to printk a warning on the slowest serial settings)

but if you use suspend, then any such printks would be preserved in the
dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
get faster?

this does have the feel of being scheduling related, but are you
absolutely sure about the precise identity of the patch? It's not the
next patch or the preceding patch by any chance? (which would also be
scheduler patches)

Ingo
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-01-28 1:26 UTC
To: Ingo Molnar; +Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko, Peter Zijlstra

On Sunday, 27 of January 2008, Ingo Molnar wrote:
> hm, this patch is a NOP, so it's weird that it has an effect.
>
> Do you have serial logging enabled perhaps? If the following WARN_ON()
> triggers:
>
> + WARN_ON(p->prio < rq->rt.highest_prio);
>
> then perhaps that can cause a 5-10 seconds delay. (that's how much time
> it takes to printk a warning on the slowest serial settings)
>
> but if you use suspend, then any such printks would be preserved in the
> dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> get faster?

No, this isn't the WARN_ON().

> this does have the feel of being scheduling related, but are you
> absolutely sure about the precise identity of the patch?

Actually, not quite. That's why I have verified it and found that another
patch is really responsible for the issue, namely:

commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
Author: Ingo Molnar <mingo@elte.hu>
Date: Fri Jan 25 21:08:02 2008 +0100

    softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

Reverting this commit (it reverts with some minor modifications) fixes the
problem for me.

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Steven Rostedt @ 2008-01-28 1:40 UTC
To: Rafael J. Wysocki; +Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko, Peter Zijlstra

Rafael J. Wysocki wrote:

> No, this isn't the WARN_ON().
>
>> this does have the feel of being scheduling related, but are you
>> absolutely sure about the precise identity of the patch?
>
> Actually, not quite. That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Fri Jan 25 21:08:02 2008 +0100
>
>     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

Are you getting a bunch of prints from the softlockup detector in dmesg?

I wonder if the detector can detect a long timeout caused by suspend and
resume and if not is triggering false positives?

-- Steve

> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-01-28 11:31 UTC
To: Steven Rostedt; +Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko, Peter Zijlstra

On Monday, 28 of January 2008, Steven Rostedt wrote:
> Are you getting a bunch of prints from the softlockup detector in dmesg?

No, I don't. In fact, I don't get _any_ messages from it whatsoever.

> I wonder if the detector can detect a long timeout caused by suspend and
> resume and if not is triggering false positives?

I'm not sure, but the code is supposed to be suspend-aware, IIRC. However,
I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
more directly related to the CPU hotplug. I'll try to verify that.

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-01-28 16:31 UTC
To: Ingo Molnar; +Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko, Peter Zijlstra

On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> I'm not sure, but the code is supposed to be suspend-aware, IIRC. However,
> I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
> more directly related to the CPU hotplug. I'll try to verify that.

As I expected, the delay is also observable when I do:

echo 0 > /sys/devices/system/cpu/cpu1/online

(it's variable, between 3 and 30 seconds). Again, no messages appear in dmesg
when this happens.

I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
to try that later today).

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Steven Rostedt @ 2008-01-28 16:46 UTC
To: Rafael J. Wysocki; +Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko, Peter Zijlstra

Rafael J. Wysocki wrote:
> As I expected, the delay is also observable when I do:
>
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> (it's variable, between 3 and 30 seconds). Again, no messages appear in dmesg
> when this happens.
>
> I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
> to try that later today).

Could you download my tracing patches from here:

http://people.redhat.com/srostedt/tracing/mcount-tracing-patches-v6.tar.bz2

Apply the above patches and select all but the histogram tracers.

Then run this program

http://people.redhat.com/srostedt/tracing/trace-cmd.c

./trace-cmd -f echo 0 > /sys/devices/system/cpu/cpu1/online

then do this (mount debugfs):

bzip2 -c /debugfs/tracing/latency_trace > /tmp/trace.bz2

and send the result to myself and Ingo.

Thanks,

-- Steve
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-01-29 0:08 UTC
To: Ingo Molnar; +Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko, Peter Zijlstra

On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> As I expected, the delay is also observable when I do:
>
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> (it's variable, between 3 and 30 seconds). Again, no messages appear in dmesg
> when this happens.
>
> I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
> to try that later today).

FWIW, the problem is reproducible on the second machine, with the same
symptoms. Both machines are AMD-based.

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Dmitry Adamushko @ 2008-01-28 8:56 UTC
To: Rafael J. Wysocki; +Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Peter Zijlstra

On 28/01/2008, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> Actually, not quite. That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Fri Jan 25 21:08:02 2008 +0100
>
>     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.

What if you use the same kernel that triggers a problem and just disable
this new 'softlockup' functionality:

echo 0 > /proc/sys/kernel/hung_task_timeout_secs

does the problem disappear?

TIA,

--
Best regards,
Dmitry Adamushko
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-01-28 11:32 UTC
To: Dmitry Adamushko; +Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Peter Zijlstra

On Monday, 28 of January 2008, Dmitry Adamushko wrote:
> What if you use the same kernel that triggers a problem and just disable
> this new 'softlockup' functionality:
>
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> does the problem disappear?

I haven't checked that yet, will do it later today.

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-01-28 16:08 UTC
To: Dmitry Adamushko; +Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Peter Zijlstra

On Monday, 28 of January 2008, Dmitry Adamushko wrote:
> What if you use the same kernel that triggers a problem and just disable
> this new 'softlockup' functionality:
>
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> does the problem disappear?

No, it doesn't. The setting doesn't seem to have any effect on it.

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Peter Zijlstra @ 2008-01-31 15:58 UTC
To: Rafael J. Wysocki; +Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko

On Mon, 2008-01-28 at 02:26 +0100, Rafael J. Wysocki wrote:
> Actually, not quite. That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Fri Jan 25 21:08:02 2008 +0100
>
>     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.

I can seem to reproduce this:

[root@opteron cpu1]# time echo 0 > online

real    0m6.230s
user    0m0.000s
sys     0m0.010s
[root@opteron cpu1]# echo 1 > online
[root@opteron cpu1]# time echo 0 > online

real    0m7.966s
user    0m0.000s
sys     0m0.011s

I'll have a look at it.
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-01-31 20:54 UTC
To: Peter Zijlstra; +Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko

On Thursday, 31 of January 2008, Peter Zijlstra wrote:
> I can seem to reproduce this:
>
> [root@opteron cpu1]# time echo 0 > online
>
> real    0m6.230s
> user    0m0.000s
> sys     0m0.010s
> [root@opteron cpu1]# echo 1 > online
> [root@opteron cpu1]# time echo 0 > online
>
> real    0m7.966s
> user    0m0.000s
> sys     0m0.011s
>
> I'll have a look at it.

Much appreciated, thanks!
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Peter Zijlstra @ 2008-02-01 12:04 UTC
To: Rafael J. Wysocki; +Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko

On Thu, 2008-01-31 at 21:54 +0100, Rafael J. Wysocki wrote:
> On Thursday, 31 of January 2008, Peter Zijlstra wrote:
> > I can seem to reproduce this:
> >
> > [root@opteron cpu1]# time echo 0 > online
> >
> > real    0m6.230s
> > user    0m0.000s
> > sys     0m0.010s
> > [root@opteron cpu1]# echo 1 > online
> > [root@opteron cpu1]# time echo 0 > online
> >
> > real    0m7.966s
> > user    0m0.000s
> > sys     0m0.011s
> >
> > I'll have a look at it.
>
> Much appreciated, thanks!

The below fixes it for me..

---
 - restore the old wakeup mechanism
 - fix break usage in do_each_thread() { } while_each_thread().
 - fix the hotplug switch stmt, a fall-through case was broken.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index c1d7655..7c2da88 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -101,6 +101,10 @@ void softlockup_tick(void)
 
 	now = get_timestamp(this_cpu);
 
+	/* Wake up the high-prio watchdog task every second: */
+	if (now > (touch_timestamp + 1))
+		wake_up_process(per_cpu(watchdog_task, this_cpu));
+
 	/* Warn about unreasonable delays: */
 	if (now <= (touch_timestamp + softlockup_thresh))
 		return;
@@ -191,11 +195,11 @@ static void check_hung_uninterruptible_tasks(int this_cpu)
 	read_lock(&tasklist_lock);
 	do_each_thread(g, t) {
 		if (!--max_count)
-			break;
+			goto unlock;
 		if (t->state & TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, now);
 	} while_each_thread(g, t);
-
+ unlock:
 	read_unlock(&tasklist_lock);
 }
 
@@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
 	 * debug-printout triggers in softlockup_tick().
 	 */
 	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
-		msleep_interruptible(10000);
+		schedule();
+
+		if (kthread_should_stop())
+			break;
 
 		if (this_cpu != check_cpu)
 			continue;
 
 		if (sysctl_hung_task_timeout_secs)
 			check_hung_uninterruptible_tasks(this_cpu);
+
 	}
 
 	return 0;
@@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		wake_up_process(per_cpu(watchdog_task, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(watchdog_task, hotcpu))
-			break;
-		/* Unbind so it can run. Fall thru. */
-		kthread_bind(per_cpu(watchdog_task, hotcpu),
-			     any_online_cpu(cpu_online_map));
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
 		if (hotcpu == check_cpu) {
@@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 			check_cpu = any_online_cpu(temp_cpu_online_map);
 		}
 		break;
+
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		if (!per_cpu(watchdog_task, hotcpu))
+			break;
+		/* Unbind so it can run. Fall thru. */
+		kthread_bind(per_cpu(watchdog_task, hotcpu),
+			     any_online_cpu(cpu_online_map));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		p = per_cpu(watchdog_task, hotcpu);
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Ingo Molnar @ 2008-02-01 12:47 UTC
To: Peter Zijlstra; +Cc: Rafael J. Wysocki, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> The below fixes it for me..
>
> ---
> - restore the old wakeup mechanism
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.

thanks Peter, i've picked your fix up. I hope this solves the problem
for Rafael too.

Ingo
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-02-01 14:42 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko

On Friday, 1 of February 2008, Ingo Molnar wrote:
>
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > The below fixes it for me..
> >
> > ---
> > - restore the old wakeup mechanism
> > - fix break usage in do_each_thread() { } while_each_thread().
> > - fix the hotplug switch stmt, a fall-through case was broken.
>
> thanks Peter, i've picked your fix up. I hope this solves the problem
> for Rafael too.

No, it doesn't, unfortunately. Actually, it breaks things even more, as
suspend now hangs (probably deadlocks) while disabling the nonboot CPU.

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-02-01 23:19 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko

On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Ingo Molnar wrote:
> >
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > > The below fixes it for me..
> > >
> > > ---
> > > - restore the old wakeup mechanism
> > > - fix break usage in do_each_thread() { } while_each_thread().
> > > - fix the hotplug switch stmt, a fall-through case was broken.
> >
> > thanks Peter, i've picked your fix up. I hope this solves the problem
> > for Rafael too.
>
> No, it doesn't, unfortunately. Actually, it breaks things even more, as
> suspend now hangs (probably deadlocks) while disabling the nonboot CPU.

For some obscure reason, I'm no longer able to reproduce the problem
with Peter's patch and I think it's correct. You can add

Tested-by: Rafael J. Wysocki <rjw@sisk.pl>

to it.

Thanks,
Rafael
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Ingo Molnar @ 2008-02-01 23:24 UTC
To: Rafael J. Wysocki
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML, Dmitry Adamushko

* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > On Friday, 1 of February 2008, Ingo Molnar wrote:
> > >
> > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > >
> > > > The below fixes it for me..
> > > >
> > > > ---
> > > > - restore the old wakeup mechanism
> > > > - fix break usage in do_each_thread() { } while_each_thread().
> > > > - fix the hotplug switch stmt, a fall-through case was broken.
> > >
> > > thanks Peter, i've picked your fix up. I hope this solves the problem
> > > for Rafael too.
> >
> > No, it doesn't, unfortunately. Actually, it breaks things even more, as
> > suspend now hangs (probably deadlocks) while disabling the nonboot CPU.
>
> For some obscure reason, I'm no longer able to reproduce the problem
> with Peter's patch and I think it's correct. You can add
>
> Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
>
> to it.

Great! I sent the fix to Linus with your Tested-by line. Thanks Rafael
for all the testing!

	Ingo
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Dmitry Adamushko @ 2008-02-01 15:11 UTC
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

On 01/02/2008, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> ---
> - restore the old wakeup mechanism

and how does it change behavior, logically-wise?

do we somehow miss a 'wake-up' from kthread_stop() so that its caller
gets blocked until watchdog's msleep_interruptible(10000) times out? On
average, it would take +-5 sec. and might explain the first
observation of Rafael -- "...adds a 5 - 10 sec delay..." (although,
lately he reported up to +30 sec. delays).

(/me going to also try reproducing it later today)

> [ ... ]

--
Best regards,
Dmitry Adamushko
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Ingo Molnar @ 2008-02-01 17:10 UTC
To: Dmitry Adamushko
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> On 01/02/2008, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > ---
> > - restore the old wakeup mechanism
>
> and how does it change behavior, logically-wise?
>
> do we somehow miss a 'wake-up' from kthread_stop() so that its caller
> gets blocked until watchdog's msleep_interruptible(10000) times out? On
> average, it would take +-5 sec. and might explain the first
> observation of Rafael -- "...adds a 5 - 10 sec delay..." (although,
> lately he reported up to +30 sec. delays).
>
> (/me going to also try reproducing it later today)

thanks - i cannot reproduce it on my usual suspend/resume testbox
because e1000 broke on it, and this is a pretty annoying regression.
We'll have to undo the hung-tasks detection feature if it's not fixed
quickly. (there's no point in debugging features that _add_ bugs)

	Ingo
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Dmitry Adamushko @ 2008-02-01 21:54 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> thanks - i cannot reproduce it on my usual suspend/resume testbox
> because e1000 broke on it, and this is a pretty annoying regression.
> We'll have to undo the hung-tasks detection feature if it's not fixed
> quickly. (there's no point in debugging features that _add_ bugs)

I can reproduce it and Peter's patch does fix it for me.

e.g.
root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
real 0m6.743s
root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
real 0m7.770s

I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so the
10s. delay of msleep_interruptible() might be related but
I'm still looking for the reason why this fix helps (and what goes
wrong with the current code).

>
>	Ingo
>

--
Best regards,
Dmitry Adamushko
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Dmitry Adamushko @ 2008-02-01 22:44 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

On 01/02/2008, Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > thanks - i cannot reproduce it on my usual suspend/resume testbox
> > because e1000 broke on it, and this is a pretty annoying regression.
> > We'll have to undo the hung-tasks detection feature if it's not fixed
> > quickly. (there's no point in debugging features that _add_ bugs)
>
> I can reproduce it and Peter's patch does fix it for me.
>
> e.g.
> root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
> real 0m6.743s
> root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
> real 0m7.770s
>
> I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so the
> 10s. delay of msleep_interruptible() might be related but
> I'm still looking for the reason why this fix helps (and what goes
> wrong with the current code).
>

heh... it's pretty obvious indeed. What is msleep_interruptible()
all about? :-)

"sleep waiting for signals"

so the 'watchdog' thread gets woken up

[ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]

just to be immediately scheduled out again for as long as the
remaining timeout > 0.

So it should work if we substitute msleep_interruptible() with
schedule_timeout_interruptible().

--
Best regards,
Dmitry Adamushko
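
Dmitry's explanation can be sketched in userspace. msleep_interruptible() loops on schedule_timeout_interruptible() until the timeout is used up or a signal is pending, and kthread_stop() wakes the thread without raising a signal, so the watchdog simply goes back to sleep for the remainder. The model below is a toy under stated assumptions, not kernel code: struct toy_task and every toy_* function are invented for illustration, time is counted in abstract ticks, and the jiffies conversion of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy task: one wakeup is scheduled to arrive at wake_at. */
struct toy_task {
	long now;		/* current tick */
	long wake_at;		/* tick of the pending wakeup, -1 if none */
	bool signal_pending;	/* stays false: a plain wakeup is not a signal */
};

/*
 * Model of schedule_timeout_interruptible(): sleep until the timeout
 * expires or a wakeup arrives, and return the ticks left over.
 */
long toy_schedule_timeout(struct toy_task *t, long timeout)
{
	long deadline = t->now + timeout;

	assert(timeout >= 0);
	if (t->wake_at >= t->now && t->wake_at < deadline) {
		t->now = t->wake_at;
		t->wake_at = -1;	/* wakeup consumed */
		return deadline - t->now;
	}
	t->now = deadline;
	return 0;
}

/* Model of msleep_interruptible(): only a signal ends the sleep early. */
long toy_msleep_interruptible(struct toy_task *t, long timeout)
{
	while (timeout && !t->signal_pending)
		timeout = toy_schedule_timeout(t, timeout);
	return timeout;
}

/* Tick at which the sleeper runs again, for each sleeping primitive. */
long resume_tick_msleep(long timeout, long wake_at)
{
	struct toy_task t = { 0, wake_at, false };

	toy_msleep_interruptible(&t, timeout);
	return t.now;
}

long resume_tick_schedule_timeout(long timeout, long wake_at)
{
	struct toy_task t = { 0, wake_at, false };

	toy_schedule_timeout(&t, timeout);
	return t.now;
}
```

In this model a wakeup at tick 3000 of a 10000-tick sleep lets the schedule_timeout variant resume at 3000, while the msleep variant resumes only at 10000. That is in line with the 3 to 8 second delays measured above, which would correspond to whatever was left of the 10 second sleep.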
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Ingo Molnar @ 2008-02-01 22:48 UTC
To: Dmitry Adamushko
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > the 10s. delay of msleep_interruptible() might be related but I'm
> > still looking for the reason why this fix helps (and what goes wrong
> > with the current code).
>
> heh... it's pretty obvious indeed. What is msleep_interruptible()
> all about? :-)
>
> "sleep waiting for signals"
>
> so the 'watchdog' thread gets woken up
>
> [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
>
> just to be immediately scheduled out again for as long as the
> remaining timeout > 0.
>
> So it should work if we substitute msleep_interruptible() with
> schedule_timeout_interruptible().

Doh. Could someone with this problem please test the patch below, does it
do the trick?

	Ingo

---
 kernel/softlockup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -228,7 +228,7 @@ static int watchdog(void *__bind_cpu)
 	 */
 	while (!kthread_should_stop()) {
 		touch_softlockup_watchdog();
-		msleep_interruptible(10000);
+		schedule_timeout_interruptible(10*HZ);
 
 		if (this_cpu != check_cpu)
 			continue;
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Dmitry Adamushko @ 2008-02-01 23:22 UTC
To: Ingo Molnar, Rafael J. Wysocki
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
>
> > > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > > the 10s. delay of msleep_interruptible() might be related but I'm
> > > still looking for the reason why this fix helps (and what goes wrong
> > > with the current code).
> >
> > heh... it's pretty obvious indeed. What is msleep_interruptible()
> > all about? :-)
> >
> > "sleep waiting for signals"
> >
> > so the 'watchdog' thread gets woken up
> >
> > [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
> >
> > just to be immediately scheduled out again for as long as the
> > remaining timeout > 0.
> >
> > So it should work if we substitute msleep_interruptible() with
> > schedule_timeout_interruptible().
>
> Doh. Could someone with this problem please test the patch below, does it
> do the trick?

yeah, I was already on a half-way to check it out.

It does fix the problem for me.

Don't forget to take along these 2 fixes from Peter's patch:

- fix break usage in do_each_thread() { } while_each_thread().
- fix the hotplug switch stmt, a fall-through case was broken.

Rafael, does your system still hang? I'd expect, yes -- as
effectively this fix is not that different from Peter's one when it
comes to suspend-case.

Can you then send your config? Anything special about this machine
(say, some /proc tunables are altered)?

>
>	Ingo
>

--
Best regards,
Dmitry Adamushko
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Ingo Molnar @ 2008-02-01 23:25 UTC
To: Dmitry Adamushko
Cc: Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> yeah, I was already on a half-way to check it out.
>
> It does fix a problem for me.
>
> Don't forget to take along these 2 fixes from Peter's patch:
>
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.

Dmitry, i sent Peter's fix(es) below to Linus. Do you concur that it
fixes all the practical and theoretical problems you could see with the
code too?

	Ingo

--------------->
Subject: debug: softlockup looping fix
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Rafael J. Wysocki reported weird, multi-seconds delays during
suspend/resume and bisected it back to:

  commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
  Author: Ingo Molnar <mingo@elte.hu>
  Date:   Fri Jan 25 21:08:02 2008 +0100

      softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

fix it:

 - restore the old wakeup mechanism
 - fix break usage in do_each_thread() { } while_each_thread().
 - fix the hotplug switch stmt, a fall-through case was broken.

Bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/softlockup.c |   30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -101,6 +101,10 @@ void softlockup_tick(void)
 
 	now = get_timestamp(this_cpu);
 
+	/* Wake up the high-prio watchdog task every second: */
+	if (now > (touch_timestamp + 1))
+		wake_up_process(per_cpu(watchdog_task, this_cpu));
+
 	/* Warn about unreasonable delays: */
 	if (now <= (touch_timestamp + softlockup_thresh))
 		return;
@@ -191,11 +195,11 @@ static void check_hung_uninterruptible_t
 	read_lock(&tasklist_lock);
 	do_each_thread(g, t) {
 		if (!--max_count)
-			break;
+			goto unlock;
 		if (t->state & TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, now);
 	} while_each_thread(g, t);
-
+ unlock:
 	read_unlock(&tasklist_lock);
 }
 
@@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
 	 * debug-printout triggers in softlockup_tick().
 	 */
 	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
-		msleep_interruptible(10000);
+		schedule();
+
+		if (kthread_should_stop())
+			break;
 
 		if (this_cpu != check_cpu)
 			continue;
 
 		if (sysctl_hung_task_timeout_secs)
 			check_hung_uninterruptible_tasks(this_cpu);
+
 	}
 
 	return 0;
@@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb,
 			wake_up_process(per_cpu(watchdog_task, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(watchdog_task, hotcpu))
-			break;
-		/* Unbind so it can run.  Fall thru. */
-		kthread_bind(per_cpu(watchdog_task, hotcpu),
-			     any_online_cpu(cpu_online_map));
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
 		if (hotcpu == check_cpu) {
@@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb,
 			check_cpu = any_online_cpu(temp_cpu_online_map);
 		}
 		break;
+
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		if (!per_cpu(watchdog_task, hotcpu))
+			break;
+		/* Unbind so it can run.  Fall thru. */
+		kthread_bind(per_cpu(watchdog_task, hotcpu),
+			     any_online_cpu(cpu_online_map));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		p = per_cpu(watchdog_task, hotcpu);
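
The third item in the changelog, the broken fall-through, comes down to case ordering in cpu_callback(). Before the patch, CPU_UP_CANCELED fell into CPU_DOWN_PREPARE, which ends in a break, so the kthread_stop() under CPU_DEAD was never reached on that path. The sketch below reduces this to a userspace toy; the enum, the DID_* flags and both functions are made up for illustration, and the real handler's conditions (such as the hotcpu == check_cpu test) are dropped.

```c
#include <assert.h>

/* Hypothetical stand-ins for the hotplug notifier actions. */
enum toy_action { TOY_UP_CANCELED, TOY_DOWN_PREPARE, TOY_DEAD };

/* Hypothetical flags recording which steps a callback performed. */
#define DID_UNBIND		1u	/* kthread_bind() to an online CPU */
#define DID_MOVE_CHECK_CPU	2u	/* pick a new check_cpu */
#define DID_STOP_THREAD		4u	/* kthread_stop() on the watchdog */

/* Case order before the patch: UP_CANCELED lands in DOWN_PREPARE. */
unsigned int callback_broken(enum toy_action a)
{
	unsigned int done = 0;

	assert(a >= TOY_UP_CANCELED && a <= TOY_DEAD);
	switch (a) {
	case TOY_UP_CANCELED:
		done |= DID_UNBIND;
		/* fall through */
	case TOY_DOWN_PREPARE:
		done |= DID_MOVE_CHECK_CPU;
		break;		/* the thread is never stopped */
	case TOY_DEAD:
		done |= DID_STOP_THREAD;
		break;
	}
	return done;
}

/* Case order after the patch: UP_CANCELED lands in DEAD again. */
unsigned int callback_fixed(enum toy_action a)
{
	unsigned int done = 0;

	assert(a >= TOY_UP_CANCELED && a <= TOY_DEAD);
	switch (a) {
	case TOY_DOWN_PREPARE:
		done |= DID_MOVE_CHECK_CPU;
		break;
	case TOY_UP_CANCELED:
		done |= DID_UNBIND;
		/* fall through */
	case TOY_DEAD:
		done |= DID_STOP_THREAD;
		break;
	}
	return done;
}
```

In the toy, callback_broken(TOY_UP_CANCELED) unbinds the thread and moves check_cpu but never stops the thread, whereas callback_fixed(TOY_UP_CANCELED) unbinds and then stops it, which is what the "Fall thru" comment in the original code intends.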
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Dmitry Adamushko @ 2008-02-02 0:03 UTC
To: Ingo Molnar
Cc: Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

On 02/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
>
> > yeah, I was already on a half-way to check it out.
> >
> > It does fix a problem for me.
> >
> > Don't forget to take along these 2 fixes from Peter's patch:
> >
> > - fix break usage in do_each_thread() { } while_each_thread().
> > - fix the hotplug switch stmt, a fall-through case was broken.
>
> Dmitry, i sent Peter's fix(es) below to Linus. Do you concur that it
> fixes all the practical and theoretical problems you could see with the
> code too?
>

One comment: any argument for waking up the 'watchdog' thread from
softlockup_tick() wrt just having schedule_timeout_interruptible(HZ)
in watchdog() ? (although, it's not that important).

IMHO, it looks good.

(I'll look again tomorrow when I'm less sleepy :-/)

--
Best regards,
Dmitry Adamushko
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
From: Rafael J. Wysocki @ 2008-02-01 23:26 UTC
To: Dmitry Adamushko
Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

On Saturday, 2 of February 2008, Dmitry Adamushko wrote:
> On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> >
> > > > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > > > the 10s. delay of msleep_interruptible() might be related but I'm
> > > > still looking for the reason why this fix helps (and what goes wrong
> > > > with the current code).
> > >
> > > heh... it's pretty obvious indeed. What is msleep_interruptible()
> > > all about? :-)
> > >
> > > "sleep waiting for signals"
> > >
> > > so the 'watchdog' thread gets woken up
> > >
> > > [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
> > >
> > > just to be immediately scheduled out again for as long as the
> > > remaining timeout > 0.
> > >
> > > So it should work if we substitute msleep_interruptible() with
> > > schedule_timeout_interruptible().
> >
> > Doh. Could someone with this problem please test the patch below, does it
> > do the trick?
>
> yeah, I was already on a half-way to check it out.
>
> It does fix a problem for me.
>
> Don't forget to take along these 2 fixes from Peter's patch:
>
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.
>
>
> Rafael, does your system still hang? I'd expect, yes -- as
> effectively this fix is not that different from Peter's one when it
> comes to suspend-case.

I haven't tested it yet, but I don't expect it to hang, as Peter's patch
started to work for me. I'm going to test it in a while.

Thanks,
Rafael
end of thread, other threads:[~2008-02-02 0:03 UTC]

Thread overview: 27+ messages
2008-01-27 21:29 [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected) Rafael J. Wysocki
2008-01-27 21:59 ` Ingo Molnar
2008-01-28 1:26 ` Rafael J. Wysocki
2008-01-28 1:40 ` Steven Rostedt
2008-01-28 11:31 ` Rafael J. Wysocki
2008-01-28 16:31 ` Rafael J. Wysocki
2008-01-28 16:46 ` Steven Rostedt
2008-01-29 0:08 ` Rafael J. Wysocki
2008-01-28 8:56 ` Dmitry Adamushko
2008-01-28 11:32 ` Rafael J. Wysocki
2008-01-28 16:08 ` Rafael J. Wysocki
2008-01-31 15:58 ` Peter Zijlstra
2008-01-31 20:54 ` Rafael J. Wysocki
2008-02-01 12:04 ` Peter Zijlstra
2008-02-01 12:47 ` Ingo Molnar
2008-02-01 14:42 ` Rafael J. Wysocki
2008-02-01 23:19 ` Rafael J. Wysocki
2008-02-01 23:24 ` Ingo Molnar
2008-02-01 15:11 ` Dmitry Adamushko
2008-02-01 17:10 ` Ingo Molnar
2008-02-01 21:54 ` Dmitry Adamushko
2008-02-01 22:44 ` Dmitry Adamushko
2008-02-01 22:48 ` Ingo Molnar
2008-02-01 23:22 ` Dmitry Adamushko
2008-02-01 23:25 ` Ingo Molnar
2008-02-02 0:03 ` Dmitry Adamushko
2008-02-01 23:26 ` Rafael J. Wysocki