* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
@ 2010-09-01 15:51 Don Zickus
2010-09-01 16:36 ` Cyrill Gorcunov
2010-09-01 17:15 ` Ingo Molnar
0 siblings, 2 replies; 9+ messages in thread
From: Don Zickus @ 2010-09-01 15:51 UTC (permalink / raw)
To: Ingo Molnar, Cyrill Gorcunov; +Cc: peterz, fweisbec, linux-kernel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 1951 bytes --]
Top posting because droid won't let me bottom post
This patch was the result of a regression with acpi and preempt. Akpm asked that I not change the semantics of the old touch_nmi_watchdog. So I tried to revert to the old behaviour.
Sorry for not properly explaining that.
Cheers,
Don
Ingo Molnar <mingo@elte.hu> wrote:
>
>* Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
>> On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> > On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>> >>
>> >> * Don Zickus <dzickus@redhat.com> wrote:
>> >>
>> >>> void touch_nmi_watchdog(void)
>> >>> {
>> >>> - __get_cpu_var(watchdog_nmi_touch) = true;
>> >>> + if (watchdog_enabled) {
>> >>> + unsigned cpu;
>> >>> +
>> >>> + for_each_present_cpu(cpu) {
>> >>> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
>> >>> + per_cpu(watchdog_nmi_touch, cpu) = true;
>> >>> + }
>> >>
>> >> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
>> >> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
>> >> CPUs - a big scalability nono.
>> >>
>> >> Why do we need to do this? We never needed to touch other CPU's NMI
>> >> lockup accounting data areas - why has this changed? The changelog does
>> >> not explain this.
>> >>
>> >> Thanks,
>> >>
>> >> Ingo
>> >>
>> > I believe this came from old nmi watchdog code where it might be
>> > useful when nmi watchdog activated via io-apic. I'm trying to figure
>> > out if we really need it still.
>>
>> Well, we can't drop it or make per-cpu specific, for example we need
>> it in case of panic with watchdog enabled and panic timeout set, or
>> boot delay set and etc. Seems same applies to printk_delay. Hmm...
>
>Ok - can you cite the old watchdog code, did it really do a nr_cpus
>loop?
>
>Thanks,
>
> Ingo
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 15:51 [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
@ 2010-09-01 16:36 ` Cyrill Gorcunov
2010-09-01 17:15 ` Ingo Molnar
1 sibling, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01 16:36 UTC (permalink / raw)
To: Don Zickus; +Cc: Ingo Molnar, peterz, fweisbec, linux-kernel
On Wed, Sep 01, 2010 at 11:51:12AM -0400, Don Zickus wrote:
> Top posting because droid won't let me bottom post
>
> This patch was the result of a regression with acpi and preempt.
> Akpm asked that I not change the semantics of the old touch_nmi_watchdog.
> So I tried to revert to the old behaviour.
>
> Sorry for not properly explaining that.
>
yup, stareing at old behaviour I think there is a place where we could
get rid of traversing all cpus: native_cpu_up() -- I don't get the reason why
watchdog counter should be reset on every other cpu as well. Perhaps
I miss something. On the other hands I think changing behaviour of
touch_nmi_watchdog just for one entry might not be worth thing to do :)
> Cheers,
> Don
>
> Ingo Molnar <mingo@elte.hu> wrote:
>
...
> >Ok - can you cite the old watchdog code, did it really do a nr_cpus
> >loop?
> >
> >Thanks,
> >
> > Ingo
-- Cyrill
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 15:51 [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
2010-09-01 16:36 ` Cyrill Gorcunov
@ 2010-09-01 17:15 ` Ingo Molnar
1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2010-09-01 17:15 UTC (permalink / raw)
To: Don Zickus; +Cc: Cyrill Gorcunov, peterz, fweisbec, linux-kernel
* Don Zickus <dzickus@redhat.com> wrote:
> Top posting because droid won't let me bottom post
>
> This patch was the result of a regression with acpi and preempt. Akpm
> asked that I not change the semantics of the old touch_nmi_watchdog.
> So I tried to revert to the old behaviour.
that's ok - i extended the changelog a bit before applying the patch.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/3] lockup detector fixes
@ 2010-09-01 3:00 Don Zickus
2010-09-01 3:00 ` [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
0 siblings, 1 reply; 9+ messages in thread
From: Don Zickus @ 2010-09-01 3:00 UTC (permalink / raw)
To: mingo; +Cc: peterz, gorcunov, fweisbec, linux-kernel, Don Zickus
A bunch of fixes for the lockup detector.
Tested on intel/amd on top of ingo's tip branch.
Akinobu Mita (2):
lockup_detector: convert cpu notifier to return encapsulate errno
value
lockup_detector: remove unnecessary panic_notifier
Don Zickus (1):
[lockup detector] sync touch_*_watchdog back to old semantics
kernel/watchdog.c | 53 +++++++++++++++++++++++------------------------------
1 files changed, 23 insertions(+), 30 deletions(-)
--
1.7.2.2
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 3:00 [PATCH 0/3] lockup detector fixes Don Zickus
@ 2010-09-01 3:00 ` Don Zickus
2010-09-01 5:30 ` Ingo Molnar
0 siblings, 1 reply; 9+ messages in thread
From: Don Zickus @ 2010-09-01 3:00 UTC (permalink / raw)
To: mingo; +Cc: peterz, gorcunov, fweisbec, linux-kernel, Don Zickus
During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).
This change brings those touch_*_watchdog functions back in line
to how they used to work.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
kernel/watchdog.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 0d53c8e..7f9c3c5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -122,7 +122,7 @@ static void __touch_watchdog(void)
void touch_softlockup_watchdog(void)
{
- __get_cpu_var(watchdog_touch_ts) = 0;
+ __raw_get_cpu_var(watchdog_touch_ts) = 0;
}
EXPORT_SYMBOL(touch_softlockup_watchdog);
@@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
#ifdef CONFIG_HARDLOCKUP_DETECTOR
void touch_nmi_watchdog(void)
{
- __get_cpu_var(watchdog_nmi_touch) = true;
+ if (watchdog_enabled) {
+ unsigned cpu;
+
+ for_each_present_cpu(cpu) {
+ if (per_cpu(watchdog_nmi_touch, cpu) != true)
+ per_cpu(watchdog_nmi_touch, cpu) = true;
+ }
+ }
touch_softlockup_watchdog();
}
EXPORT_SYMBOL(touch_nmi_watchdog);
@@ -433,6 +440,9 @@ static int watchdog_enable(int cpu)
wake_up_process(p);
}
+ /* if any cpu succeeds, watchdog is considered enabled for the system */
+ watchdog_enabled = 1;
+
return 0;
}
@@ -455,9 +465,6 @@ static void watchdog_disable(int cpu)
per_cpu(softlockup_watchdog, cpu) = NULL;
kthread_stop(p);
}
-
- /* if any cpu succeeds, watchdog is considered enabled for the system */
- watchdog_enabled = 1;
}
static void watchdog_enable_all_cpus(void)
--
1.7.2.2
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 3:00 ` [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
@ 2010-09-01 5:30 ` Ingo Molnar
2010-09-01 6:00 ` Cyrill Gorcunov
0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2010-09-01 5:30 UTC (permalink / raw)
To: Don Zickus; +Cc: peterz, gorcunov, fweisbec, linux-kernel
* Don Zickus <dzickus@redhat.com> wrote:
> void touch_nmi_watchdog(void)
> {
> - __get_cpu_var(watchdog_nmi_touch) = true;
> + if (watchdog_enabled) {
> + unsigned cpu;
> +
> + for_each_present_cpu(cpu) {
> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
> + per_cpu(watchdog_nmi_touch, cpu) = true;
> + }
Hm, this is going to be a scalability nightmare with lots of CPUs. Not
only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
CPUs - a big scalability nono.
Why do we need to do this? We never needed to touch other CPU's NMI
lockup accounting data areas - why has this changed? The changelog does
not explain this.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 5:30 ` Ingo Molnar
@ 2010-09-01 6:00 ` Cyrill Gorcunov
2010-09-01 7:01 ` Cyrill Gorcunov
0 siblings, 1 reply; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01 6:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Don Zickus, peterz, fweisbec, linux-kernel
On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
>> void touch_nmi_watchdog(void)
>> {
>> - __get_cpu_var(watchdog_nmi_touch) = true;
>> + if (watchdog_enabled) {
>> + unsigned cpu;
>> +
>> + for_each_present_cpu(cpu) {
>> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
>> + per_cpu(watchdog_nmi_touch, cpu) = true;
>> + }
>
> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
> CPUs - a big scalability nono.
>
> Why do we need to do this? We never needed to touch other CPU's NMI
> lockup accounting data areas - why has this changed? The changelog does
> not explain this.
>
> Thanks,
>
> Ingo
>
I believe this came from old nmi watchdog code where it might be
useful when nmi watchdog activated via io-apic. I'm trying to figure
out if we really need it still.
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 6:00 ` Cyrill Gorcunov
@ 2010-09-01 7:01 ` Cyrill Gorcunov
2010-09-01 7:20 ` Ingo Molnar
0 siblings, 1 reply; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01 7:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Don Zickus, peterz, fweisbec, linux-kernel
On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> * Don Zickus <dzickus@redhat.com> wrote:
>>
>>> void touch_nmi_watchdog(void)
>>> {
>>> - __get_cpu_var(watchdog_nmi_touch) = true;
>>> + if (watchdog_enabled) {
>>> + unsigned cpu;
>>> +
>>> + for_each_present_cpu(cpu) {
>>> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
>>> + per_cpu(watchdog_nmi_touch, cpu) = true;
>>> + }
>>
>> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
>> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
>> CPUs - a big scalability nono.
>>
>> Why do we need to do this? We never needed to touch other CPU's NMI
>> lockup accounting data areas - why has this changed? The changelog does
>> not explain this.
>>
>> Thanks,
>>
>> Ingo
>>
> I believe this came from old nmi watchdog code where it might be
> useful when nmi watchdog activated via io-apic. I'm trying to figure
> out if we really need it still.
>
Well, we can't drop it or make per-cpu specific, for example we need
it in case of panic with watchdog enabled and panic timeout set, or
boot delay set and etc. Seems same applies to printk_delay. Hmm...
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 7:01 ` Cyrill Gorcunov
@ 2010-09-01 7:20 ` Ingo Molnar
2010-09-01 7:42 ` Cyrill Gorcunov
0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2010-09-01 7:20 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Don Zickus, peterz, fweisbec, linux-kernel
* Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> > On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
> >>
> >> * Don Zickus <dzickus@redhat.com> wrote:
> >>
> >>> void touch_nmi_watchdog(void)
> >>> {
> >>> - __get_cpu_var(watchdog_nmi_touch) = true;
> >>> + if (watchdog_enabled) {
> >>> + unsigned cpu;
> >>> +
> >>> + for_each_present_cpu(cpu) {
> >>> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
> >>> + per_cpu(watchdog_nmi_touch, cpu) = true;
> >>> + }
> >>
> >> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
> >> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
> >> CPUs - a big scalability nono.
> >>
> >> Why do we need to do this? We never needed to touch other CPU's NMI
> >> lockup accounting data areas - why has this changed? The changelog does
> >> not explain this.
> >>
> >> Thanks,
> >>
> >> Ingo
> >>
> > I believe this came from old nmi watchdog code where it might be
> > useful when nmi watchdog activated via io-apic. I'm trying to figure
> > out if we really need it still.
>
> Well, we can't drop it or make per-cpu specific, for example we need
> it in case of panic with watchdog enabled and panic timeout set, or
> boot delay set and etc. Seems same applies to printk_delay. Hmm...
Ok - can you cite the old watchdog code, did it really do a nr_cpus
loop?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
2010-09-01 7:20 ` Ingo Molnar
@ 2010-09-01 7:42 ` Cyrill Gorcunov
0 siblings, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01 7:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Don Zickus, peterz@infradead.org, fweisbec@gmail.com,
linux-kernel@vger.kernel.org
On Wednesday, September 1, 2010, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
>> On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> > On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>> >>
>> >> * Don Zickus <dzickus@redhat.com> wrote:
>> >>
>> >>> void touch_nmi_watchdog(void)
>> >>> {
>> >>> - __get_cpu_var(watchdog_nmi_touch) = true;
>> >>> + if (watchdog_enabled) {
>> >>> + unsigned cpu;
>> >>> +
>> >>> + for_each_present_cpu(cpu) {
>> >>> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
>> >>> + per_cpu(watchdog_nmi_touch, cpu) = true;
>> >>> + }
>> >>
>> >> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
>> >> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
>> >> CPUs - a big scalability nono.
>> >>
>> >> Why do we need to do this? We never needed to touch other CPU's NMI
>> >> lockup accounting data areas - why has this changed? The changelog does
>> >> not explain this.
>> >>
>> >> Thanks,
>> >>
>> >> Ingo
>> >>
>> > I believe this came from old nmi watchdog code where it might be
>> > useful when nmi watchdog activated via io-apic. I'm trying to figure
>> > out if we really need it still.
>>
>> Well, we can't drop it or make per-cpu specific, for example we need
>> it in case of panic with watchdog enabled and panic timeout set, or
>> boot delay set and etc. Seems same applies to printk_delay. Hmm...
>
> Ok - can you cite the old watchdog code, did it really do a nr_cpus
> loop?
>
> Thanks,
>
> Ingo
>
Yes, previous touch_nmi_watchdog really did a loop as
for_each_present_cpu and touching per-cpu variable
(bad format)
void touch_nmi_watchdog(void)
{
if (nmi_watchdog_active()) {
unsigned cpu;
for_each_present_cpu(cpu) {
if (per_cpu(nmi_touch,cpu) != 1)
per_cpu(nmi_touch, cpu) = 1;
}
}
touch_softlockup_watchdog();
}
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-09-01 17:15 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-01 15:51 [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
2010-09-01 16:36 ` Cyrill Gorcunov
2010-09-01 17:15 ` Ingo Molnar
-- strict thread matches above, loose matches on Subject: below --
2010-09-01 3:00 [PATCH 0/3] lockup detector fixes Don Zickus
2010-09-01 3:00 ` [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
2010-09-01 5:30 ` Ingo Molnar
2010-09-01 6:00 ` Cyrill Gorcunov
2010-09-01 7:01 ` Cyrill Gorcunov
2010-09-01 7:20 ` Ingo Molnar
2010-09-01 7:42 ` Cyrill Gorcunov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox