public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01  3:00 [PATCH 0/3] lockup detector fixes Don Zickus
@ 2010-09-01  3:00 ` Don Zickus
  2010-09-01  5:30   ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Don Zickus @ 2010-09-01  3:00 UTC (permalink / raw)
  To: mingo; +Cc: peterz, gorcunov, fweisbec, linux-kernel, Don Zickus

During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).

This change brings those touch_*_watchdog functions back in line
to how they used to work.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 0d53c8e..7f9c3c5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -122,7 +122,7 @@ static void __touch_watchdog(void)
 
 void touch_softlockup_watchdog(void)
 {
-	__get_cpu_var(watchdog_touch_ts) = 0;
+	__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	__get_cpu_var(watchdog_nmi_touch) = true;
+	if (watchdog_enabled) {
+		unsigned cpu;
+
+		for_each_present_cpu(cpu) {
+			if (per_cpu(watchdog_nmi_touch, cpu) != true)
+				per_cpu(watchdog_nmi_touch, cpu) = true;
+		}
+	}
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
@@ -433,6 +440,9 @@ static int watchdog_enable(int cpu)
 		wake_up_process(p);
 	}
 
+	/* if any cpu succeeds, watchdog is considered enabled for the system */
+	watchdog_enabled = 1;
+
 	return 0;
 }
 
@@ -455,9 +465,6 @@ static void watchdog_disable(int cpu)
 		per_cpu(softlockup_watchdog, cpu) = NULL;
 		kthread_stop(p);
 	}
-
-	/* if any cpu succeeds, watchdog is considered enabled for the system */
-	watchdog_enabled = 1;
 }
 
 static void watchdog_enable_all_cpus(void)
-- 
1.7.2.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01  3:00 ` [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
@ 2010-09-01  5:30   ` Ingo Molnar
  2010-09-01  6:00     ` Cyrill Gorcunov
  0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2010-09-01  5:30 UTC (permalink / raw)
  To: Don Zickus; +Cc: peterz, gorcunov, fweisbec, linux-kernel


* Don Zickus <dzickus@redhat.com> wrote:

>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	if (watchdog_enabled) {
> +		unsigned cpu;
> +
> +		for_each_present_cpu(cpu) {
> +			if (per_cpu(watchdog_nmi_touch, cpu) != true)
> +				per_cpu(watchdog_nmi_touch, cpu) = true;
> +		}

Hm, this is going to be a scalability nightmare with lots of CPUs. Not 
only do we have a nr_cpus loop, but we touch per-cpu areas of _other_ 
CPUs - a big scalability nono.

Why do we need to do this? We never needed to touch other CPU's NMI 
lockup accounting data areas - why has this changed? The changelog does 
not explain this.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01  5:30   ` Ingo Molnar
@ 2010-09-01  6:00     ` Cyrill Gorcunov
  2010-09-01  7:01       ` Cyrill Gorcunov
  0 siblings, 1 reply; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01  6:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Don Zickus, peterz, fweisbec, linux-kernel

On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
>>  void touch_nmi_watchdog(void)
>>  {
>> -	__get_cpu_var(watchdog_nmi_touch) = true;
>> +	if (watchdog_enabled) {
>> +		unsigned cpu;
>> +
>> +		for_each_present_cpu(cpu) {
>> +			if (per_cpu(watchdog_nmi_touch, cpu) != true)
>> +				per_cpu(watchdog_nmi_touch, cpu) = true;
>> +		}
>
> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
> CPUs - a big scalability nono.
>
> Why do we need to do this? We never needed to touch other CPU's NMI
> lockup accounting data areas - why has this changed? The changelog does
> not explain this.
>
> Thanks,
>
> 	Ingo
>
I believe this came from old nmi watchdog code where it might be
useful when nmi watchdog activated via io-apic. I'm trying to figure
out if we really need it still.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01  6:00     ` Cyrill Gorcunov
@ 2010-09-01  7:01       ` Cyrill Gorcunov
  2010-09-01  7:20         ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01  7:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Don Zickus, peterz, fweisbec, linux-kernel

On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> * Don Zickus <dzickus@redhat.com> wrote:
>>
>>>  void touch_nmi_watchdog(void)
>>>  {
>>> -	__get_cpu_var(watchdog_nmi_touch) = true;
>>> +	if (watchdog_enabled) {
>>> +		unsigned cpu;
>>> +
>>> +		for_each_present_cpu(cpu) {
>>> +			if (per_cpu(watchdog_nmi_touch, cpu) != true)
>>> +				per_cpu(watchdog_nmi_touch, cpu) = true;
>>> +		}
>>
>> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
>> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
>> CPUs - a big scalability nono.
>>
>> Why do we need to do this? We never needed to touch other CPU's NMI
>> lockup accounting data areas - why has this changed? The changelog does
>> not explain this.
>>
>> Thanks,
>>
>> 	Ingo
>>
> I believe this came from old nmi watchdog code where it might be
> useful when nmi watchdog activated via io-apic. I'm trying to figure
> out if we really need it still.
>
Well, we can't drop it or make per-cpu specific, for example we need
it in case of panic with watchdog enabled and panic timeout set, or
boot delay set and etc. Seems same applies to printk_delay. Hmm...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01  7:01       ` Cyrill Gorcunov
@ 2010-09-01  7:20         ` Ingo Molnar
  2010-09-01  7:42           ` Cyrill Gorcunov
  0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2010-09-01  7:20 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Don Zickus, peterz, fweisbec, linux-kernel


* Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> > On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
> >>
> >> * Don Zickus <dzickus@redhat.com> wrote:
> >>
> >>>  void touch_nmi_watchdog(void)
> >>>  {
> >>> -	__get_cpu_var(watchdog_nmi_touch) = true;
> >>> +	if (watchdog_enabled) {
> >>> +		unsigned cpu;
> >>> +
> >>> +		for_each_present_cpu(cpu) {
> >>> +			if (per_cpu(watchdog_nmi_touch, cpu) != true)
> >>> +				per_cpu(watchdog_nmi_touch, cpu) = true;
> >>> +		}
> >>
> >> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
> >> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
> >> CPUs - a big scalability nono.
> >>
> >> Why do we need to do this? We never needed to touch other CPU's NMI
> >> lockup accounting data areas - why has this changed? The changelog does
> >> not explain this.
> >>
> >> Thanks,
> >>
> >> 	Ingo
> >>
> > I believe this came from old nmi watchdog code where it might be
> > useful when nmi watchdog activated via io-apic. I'm trying to figure
> > out if we really need it still.
>
> Well, we can't drop it or make per-cpu specific, for example we need 
> it in case of panic with watchdog enabled and panic timeout set, or 
> boot delay set and etc. Seems same applies to printk_delay. Hmm...

Ok - can you cite the old watchdog code, did it really do a nr_cpus 
loop?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01  7:20         ` Ingo Molnar
@ 2010-09-01  7:42           ` Cyrill Gorcunov
  0 siblings, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01  7:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Don Zickus, peterz@infradead.org, fweisbec@gmail.com,
	linux-kernel@vger.kernel.org

On Wednesday, September 1, 2010, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
>> On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> > On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>> >>
>> >> * Don Zickus <dzickus@redhat.com> wrote:
>> >>
>> >>>  void touch_nmi_watchdog(void)
>> >>>  {
>> >>> - __get_cpu_var(watchdog_nmi_touch) = true;
>> >>> + if (watchdog_enabled) {
>> >>> +         unsigned cpu;
>> >>> +
>> >>> +         for_each_present_cpu(cpu) {
>> >>> +                 if (per_cpu(watchdog_nmi_touch, cpu) != true)
>> >>> +                         per_cpu(watchdog_nmi_touch, cpu) = true;
>> >>> +         }
>> >>
>> >> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
>> >> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
>> >> CPUs - a big scalability nono.
>> >>
>> >> Why do we need to do this? We never needed to touch other CPU's NMI
>> >> lockup accounting data areas - why has this changed? The changelog does
>> >> not explain this.
>> >>
>> >> Thanks,
>> >>
>> >>    Ingo
>> >>
>> > I believe this came from old nmi watchdog code where it might be
>> > useful when nmi watchdog activated via io-apic. I'm trying to figure
>> > out if we really need it still.
>>
>> Well, we can't drop it or make per-cpu specific, for example we need
>> it in case of panic with watchdog enabled and panic timeout set, or
>> boot delay set and etc. Seems same applies to printk_delay. Hmm...
>
> Ok - can you cite the old watchdog code, did it really do a nr_cpus
> loop?
>
> Thanks,
>
>         Ingo
>

Yes, previous touch_nmi_watchdog really did a loop as
for_each_present_cpu and touching per-cpu variable

(bad format)

void touch_nmi_watchdog(void)
{
if (nmi_watchdog_active()) {
  unsigned cpu;
  for_each_present_cpu(cpu) {
    if (per_cpu(nmi_touch,cpu) != 1)
       per_cpu(nmi_touch, cpu) = 1;
    }
  }
touch_softlockup_watchdog();
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
@ 2010-09-01 15:51 Don Zickus
  2010-09-01 16:36 ` Cyrill Gorcunov
  2010-09-01 17:15 ` Ingo Molnar
  0 siblings, 2 replies; 9+ messages in thread
From: Don Zickus @ 2010-09-01 15:51 UTC (permalink / raw)
  To: Ingo Molnar, Cyrill Gorcunov; +Cc: peterz, fweisbec, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 1951 bytes --]

Top posting because droid won't let me bottom post

This patch was the result of a regression with acpi and preempt.  Akpm asked that I not change the semantics of the old touch_nmi_watchdog.  So I tried to revert to the old behaviour.

Sorry for not properly explaining that.

Cheers,
Don

Ingo Molnar <mingo@elte.hu> wrote:

>
>* Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
>> On 9/1/10, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> > On 9/1/10, Ingo Molnar <mingo@elte.hu> wrote:
>> >>
>> >> * Don Zickus <dzickus@redhat.com> wrote:
>> >>
>> >>>  void touch_nmi_watchdog(void)
>> >>>  {
>> >>> -	__get_cpu_var(watchdog_nmi_touch) = true;
>> >>> +	if (watchdog_enabled) {
>> >>> +		unsigned cpu;
>> >>> +
>> >>> +		for_each_present_cpu(cpu) {
>> >>> +			if (per_cpu(watchdog_nmi_touch, cpu) != true)
>> >>> +				per_cpu(watchdog_nmi_touch, cpu) = true;
>> >>> +		}
>> >>
>> >> Hm, this is going to be a scalability nightmare with lots of CPUs. Not
>> >> only do we have a nr_cpus loop, but we touch per-cpu areas of _other_
>> >> CPUs - a big scalability nono.
>> >>
>> >> Why do we need to do this? We never needed to touch other CPU's NMI
>> >> lockup accounting data areas - why has this changed? The changelog does
>> >> not explain this.
>> >>
>> >> Thanks,
>> >>
>> >> 	Ingo
>> >>
>> > I believe this came from old nmi watchdog code where it might be
>> > useful when nmi watchdog activated via io-apic. I'm trying to figure
>> > out if we really need it still.
>>
>> Well, we can't drop it or make per-cpu specific, for example we need 
>> it in case of panic with watchdog enabled and panic timeout set, or 
>> boot delay set and etc. Seems same applies to printk_delay. Hmm...
>
>Ok - can you cite the old watchdog code, did it really do a nr_cpus 
>loop?
>
>Thanks,
>
>	Ingo
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01 15:51 [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
@ 2010-09-01 16:36 ` Cyrill Gorcunov
  2010-09-01 17:15 ` Ingo Molnar
  1 sibling, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2010-09-01 16:36 UTC (permalink / raw)
  To: Don Zickus; +Cc: Ingo Molnar, peterz, fweisbec, linux-kernel

On Wed, Sep 01, 2010 at 11:51:12AM -0400, Don Zickus wrote:
> Top posting because droid won't let me bottom post
> 
> This patch was the result of a regression with acpi and preempt.
> Akpm asked that I not change the semantics of the old touch_nmi_watchdog.
> So I tried to revert to the old behaviour.
> 
> Sorry for not properly explaining that.
> 

yup, stareing at old behaviour I think there is a place where we could
get rid of traversing all cpus: native_cpu_up() -- I don't get the reason why
watchdog counter should be reset on every other cpu as well. Perhaps
I miss something. On the other hands I think changing behaviour of
touch_nmi_watchdog just for one entry might not be worth thing to do :)

> Cheers,
> Don
> 
> Ingo Molnar <mingo@elte.hu> wrote:
> 
...
> >Ok - can you cite the old watchdog code, did it really do a nr_cpus 
> >loop?
> >
> >Thanks,
> >
> >	Ingo

	-- Cyrill

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics
  2010-09-01 15:51 [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
  2010-09-01 16:36 ` Cyrill Gorcunov
@ 2010-09-01 17:15 ` Ingo Molnar
  1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2010-09-01 17:15 UTC (permalink / raw)
  To: Don Zickus; +Cc: Cyrill Gorcunov, peterz, fweisbec, linux-kernel


* Don Zickus <dzickus@redhat.com> wrote:

> Top posting because droid won't let me bottom post
> 
> This patch was the result of a regression with acpi and preempt.  Akpm 
> asked that I not change the semantics of the old touch_nmi_watchdog.  
> So I tried to revert to the old behaviour.

that's ok - i extended the changelog a bit before applying the patch.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-09-01 17:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-01 15:51 [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
2010-09-01 16:36 ` Cyrill Gorcunov
2010-09-01 17:15 ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2010-09-01  3:00 [PATCH 0/3] lockup detector fixes Don Zickus
2010-09-01  3:00 ` [PATCH 1/3] [lockup detector] sync touch_*_watchdog back to old semantics Don Zickus
2010-09-01  5:30   ` Ingo Molnar
2010-09-01  6:00     ` Cyrill Gorcunov
2010-09-01  7:01       ` Cyrill Gorcunov
2010-09-01  7:20         ` Ingo Molnar
2010-09-01  7:42           ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox