* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 2:57 ` Don Zickus
@ 2010-08-20 3:42 ` Andrew Morton
2010-08-20 12:34 ` Don Zickus
2010-08-26 17:17 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
2010-08-20 15:02 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
2010-08-26 10:14 ` Maxim Levitsky
2 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2010-08-20 3:42 UTC (permalink / raw)
To: Don Zickus
Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover, H. Peter Anvin
On Thu, 19 Aug 2010 22:57:49 -0400 Don Zickus <dzickus@redhat.com> wrote:
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO. It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code. Except, of course, when we have a utility function which can be
> > called from either irqs-on or irqs-off: acpi_os_stall().
> >
> > That being said, it's not good to introduce new API requirements by
> > accident! An audit of all callers should first be performed, at least.
> >
> >
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
>
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics. Hopefully this will undo any harm I caused.
>
> ------------cut -->---------------------------
>
> >From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
> From: Don Zickus <dzickus@redhat.com>
> Date: Thu, 19 Aug 2010 22:48:26 -0400
> Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
>
> During my rewrite, the semantics of touch_nmi_watchdog and
> touch_softlockup_watchdog changed enough to break some drivers
> (mostly over preemptable regions).
>
> This change brings those touch_*_watchdog functions back in line
> to how they used to work.
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
> kernel/watchdog.c | 17 ++++++++++++-----
> 1 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..99e35a2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -122,7 +122,7 @@ static void __touch_watchdog(void)
>
> void touch_softlockup_watchdog(void)
> {
> - __get_cpu_var(watchdog_touch_ts) = 0;
> + __raw_get_cpu_var(watchdog_touch_ts) = 0;
> }
> EXPORT_SYMBOL(touch_softlockup_watchdog);
>
> @@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> void touch_nmi_watchdog(void)
> {
> - __get_cpu_var(watchdog_nmi_touch) = true;
> + if (watchdog_enabled) {
> + unsigned cpu;
> +
> + for_each_present_cpu(cpu) {
> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
> + per_cpu(watchdog_nmi_touch, cpu) = true;
> + }
> + }
> touch_softlockup_watchdog();
> }
> EXPORT_SYMBOL(touch_nmi_watchdog);
> @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
> wake_up_process(p);
> }
>
> + /* if any cpu succeeds, watchdog is considered enabled for the system */
> + watchdog_enabled = 1;
> +
> return 0;
> }
>
> @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
> per_cpu(softlockup_watchdog, cpu) = NULL;
> kthread_stop(p);
> }
> -
> - /* if any cpu succeeds, watchdog is considered enabled for the system */
> - watchdog_enabled = 1;
> }
>
> static void watchdog_enable_all_cpus(void)
hm, the code seems a bit screwy. Maybe it was always thus.
watchdog_enabled gets set in the per-cpu function but it gets cleared
in the all-cpus function. Asymmetric.
Also afaict the action of cpu-hotunplug+cpu-hotplug will reenable the
watchdog on a CPU which was supposed to have it disabled. Perhaps you
could recheck that and make sure it all makes sense - perhaps we need a
separate state variable which is purely "current setting of
/proc/sys/kernel/nmi_watchdog" and doesn't get altered internally.
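To make the "separate state variable" idea concrete, here is a minimal userspace sketch, not kernel code. All names in it (watchdog_user_setting, sysctl_write, cpu_up, cpu_down, the arrays, NR_CPUS = 4) are hypothetical stand-ins chosen for illustration. The only point is that the hotplug path reads the user's setting and never writes it.

```c
#include <stdbool.h>

#define NR_CPUS 4                          /* hypothetical CPU count for the sketch */

/* Hypothetical: written only by the sysctl path, never by internal code. */
static bool watchdog_user_setting = true;
static bool cpu_is_online[NR_CPUS];
static bool cpu_watchdog_running[NR_CPUS];

/* Models a write to /proc/sys/kernel/nmi_watchdog. */
static void sysctl_write(bool on)
{
	watchdog_user_setting = on;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_watchdog_running[cpu] = on && cpu_is_online[cpu];
}

/* Models CPU hotplug: start the watchdog only if the user asked for it. */
static void cpu_up(int cpu)
{
	cpu_is_online[cpu] = true;
	cpu_watchdog_running[cpu] = watchdog_user_setting;
}

/* Models CPU hot-unplug: the watchdog stops, the user setting is untouched. */
static void cpu_down(int cpu)
{
	cpu_is_online[cpu] = false;
	cpu_watchdog_running[cpu] = false;
}
```

With this split, an unplug followed by a replug of a CPU after the watchdog was switched off via sysctl leaves that CPU's watchdog off, which is the behaviour asked for above.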
Anyway, I'll be disappearing for a few days so perhaps Frederic or hpa
can help get this all fixed/merged up?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 3:42 ` Andrew Morton
@ 2010-08-20 12:34 ` Don Zickus
2010-08-26 17:17 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
1 sibling, 0 replies; 9+ messages in thread
From: Don Zickus @ 2010-08-20 12:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover, H. Peter Anvin
On Thu, Aug 19, 2010 at 08:42:56PM -0700, Andrew Morton wrote:
> On Thu, 19 Aug 2010 22:57:49 -0400 Don Zickus <dzickus@redhat.com> wrote:
>
> > On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
> > wake_up_process(p);
> > }
> >
> > + /* if any cpu succeeds, watchdog is considered enabled for the system */
> > + watchdog_enabled = 1;
> > +
> > return 0;
> > }
> >
> > @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
> > per_cpu(softlockup_watchdog, cpu) = NULL;
> > kthread_stop(p);
> > }
> > -
> > - /* if any cpu succeeds, watchdog is considered enabled for the system */
> > - watchdog_enabled = 1;
> > }
> >
> > static void watchdog_enable_all_cpus(void)
>
> hm, the code seems a bit screwy. Maybe it was always thus.
No, watchdog_enabled was something newly created for the lockup detector.
>
> watchdog_enabled gets set in the per-cpu function but it gets cleared
> in the all-cpus function. Asymmetric.
Yes it is by design. I was using watchdog_enabled as a global state
variable. As soon as one cpu was enabled, I would set the bit. But only
if all the cpus disabled the watchdog would I clear the bit.
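The asymmetry described above can be modeled in a few lines of userspace C. This is a sketch under assumptions, not the code in kernel/watchdog.c: the per-cpu data is replaced by a plain array, NR_CPUS = 4 is made up, and thread creation and perf events are left out entirely.

```c
#include <stdbool.h>

#define NR_CPUS 4                          /* hypothetical CPU count for the sketch */

static bool cpu_watchdog_on[NR_CPUS];      /* stand-in for the per-cpu watchdog state */
static int watchdog_enabled;               /* global "some cpu has a watchdog" flag */

/* Per-cpu enable: the first success marks the whole system as enabled. */
static int watchdog_enable(int cpu)
{
	cpu_watchdog_on[cpu] = true;
	watchdog_enabled = 1;
	return 0;
}

/* Per-cpu disable: deliberately leaves the global flag alone. */
static void watchdog_disable(int cpu)
{
	cpu_watchdog_on[cpu] = false;
}

/* All-cpus disable: the only place where the global flag is cleared. */
static void watchdog_disable_all_cpus(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		watchdog_disable(cpu);
	watchdog_enabled = 0;
}
```

In this model, disabling a single CPU leaves watchdog_enabled set even if that was the last CPU with a watchdog. That is the trade-off of treating the flag as a system-wide on/off switch rather than a count.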
>
> Also afaict the action of cpu-hotunplug+cpu-hotplug will reenable the
> watchdog on a CPU which was supposed to have it disabled. Perhaps you
> could recheck that and make sure it all makes sense - perhaps we need a
> separate state variable which is purely "current setting of
> /proc/sys/kernel/nmi_watchdog" and doesn't get altered internally.
I wasn't tracking it on a per cpu basis. I didn't see a need to. The
watchdog should globally be on/off across the system. If a system comes
up and one of the cpus could not bring the watchdog online for some
reason, then that is a problem. If a cpu-hotunplug+cpu-hotplug fixes it,
all the better. :-)
Also, if I wanted to track it per cpu, there is a bunch of status bits in
per-cpu variables that could let the code know whether a particular cpu
watchdog is on/off for either hardlockup or softlockup.
Cheers,
Don
^ permalink raw reply [flat|nested] 9+ messages in thread
* acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog)
2010-08-20 3:42 ` Andrew Morton
2010-08-20 12:34 ` Don Zickus
@ 2010-08-26 17:17 ` Len Brown
1 sibling, 0 replies; 9+ messages in thread
From: Len Brown @ 2010-08-26 17:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Don Zickus, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Yong Zhang, Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover, H. Peter Anvin
acpi_os_stall() is used in two ways.
The typical way is what triggered this e-mail thread.
It implements the AML "Stall()" operator, and is called
with interrupts enabled with durations <= 100 usec.
So one would expect it to be identical to udelay().
The exception case is when ACPICA calls it with interrupts off
and huge durations when we wrote the poweroff or sleep
register, yet we find ourselves still running...
Apparently akpm added touch_nmi_watchdog() to keep the
watchdog from firing in this exception case.
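For readers following along, the two cases can be modeled in userspace roughly as follows. This is a sketch under assumptions: udelay_stub() and touch_nmi_watchdog_stub() are hypothetical stand-ins that only count, stall_us() is a made-up name, and the 1000 usec chunk size is chosen for illustration.

```c
#include <stdint.h>

static unsigned long touches;              /* number of stub watchdog touches */
static uint64_t stalled_us;                /* total time "spent" in the stub delay */

/* Hypothetical stand-in for udelay(): records the time instead of spinning. */
static void udelay_stub(uint32_t us)
{
	stalled_us += us;
}

/* Hypothetical stand-in for touch_nmi_watchdog(). */
static void touch_nmi_watchdog_stub(void)
{
	touches++;
}

/* Stall in chunks of at most 1000 usec, touching the watchdog after each. */
static void stall_us(uint32_t us)
{
	while (us) {
		uint32_t delay = us < 1000 ? us : 1000;

		udelay_stub(delay);
		touch_nmi_watchdog_stub();
		us -= delay;
	}
}
```

A typical Stall() of 100 usec or less goes through the loop once, so it costs one delay plus one touch. Only the huge irqs-off durations loop many times, and that is the only case where the touch matters.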
Is it useful to have the watchdog running when
we are waiting for firmware to poweroff the machine?
If no, maybe we should turn it off as part of the shutdown
process rather than using yet another invocation
of touch_nmi_watchdog()?
Is calling delay() with IRQs disabled the best thing
we can do after we ask the firmware to cut power
and it takes a long time?
thanks,
Len Brown, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 2:57 ` Don Zickus
2010-08-20 3:42 ` Andrew Morton
@ 2010-08-20 15:02 ` Yong Zhang
2010-08-26 10:14 ` Maxim Levitsky
2 siblings, 0 replies; 9+ messages in thread
From: Yong Zhang @ 2010-08-20 15:02 UTC (permalink / raw)
To: Don Zickus
Cc: Andrew Morton, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Thu, Aug 19, 2010 at 10:57:49PM -0400, Don Zickus wrote:
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO. It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code. Except, of course, when we have a utility function which can be
> > called from either irqs-on or irqs-off: acpi_os_stall().
> >
> > That being said, it's not good to introduce new API requirements by
> > accident! An audit of all callers should first be performed, at least.
> >
> >
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
>
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics. Hopefully this will undo any harm I caused.
>
> ------------cut -->---------------------------
>
> >From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
> From: Don Zickus <dzickus@redhat.com>
> Date: Thu, 19 Aug 2010 22:48:26 -0400
> Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
>
> During my rewrite, the semantics of touch_nmi_watchdog and
> touch_softlockup_watchdog changed enough to break some drivers
> (mostly over preemptable regions).
>
> This change brings those touch_*_watchdog functions back in line
> to how they used to work.
This one looks good to me.
Thank you Don.
-Yong
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
> kernel/watchdog.c | 17 ++++++++++++-----
> 1 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..99e35a2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -122,7 +122,7 @@ static void __touch_watchdog(void)
>
> void touch_softlockup_watchdog(void)
> {
> - __get_cpu_var(watchdog_touch_ts) = 0;
> + __raw_get_cpu_var(watchdog_touch_ts) = 0;
> }
> EXPORT_SYMBOL(touch_softlockup_watchdog);
>
> @@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> void touch_nmi_watchdog(void)
> {
> - __get_cpu_var(watchdog_nmi_touch) = true;
> + if (watchdog_enabled) {
> + unsigned cpu;
> +
> + for_each_present_cpu(cpu) {
> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
> + per_cpu(watchdog_nmi_touch, cpu) = true;
> + }
> + }
> touch_softlockup_watchdog();
> }
> EXPORT_SYMBOL(touch_nmi_watchdog);
> @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
> wake_up_process(p);
> }
>
> + /* if any cpu succeeds, watchdog is considered enabled for the system */
> + watchdog_enabled = 1;
> +
> return 0;
> }
>
> @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
> per_cpu(softlockup_watchdog, cpu) = NULL;
> kthread_stop(p);
> }
> -
> - /* if any cpu succeeds, watchdog is considered enabled for the system */
> - watchdog_enabled = 1;
> }
>
> static void watchdog_enable_all_cpus(void)
> --
> 1.7.2.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 2:57 ` Don Zickus
2010-08-20 3:42 ` Andrew Morton
2010-08-20 15:02 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
@ 2010-08-26 10:14 ` Maxim Levitsky
2010-08-26 14:40 ` Don Zickus
2 siblings, 1 reply; 9+ messages in thread
From: Maxim Levitsky @ 2010-08-26 10:14 UTC (permalink / raw)
To: Don Zickus
Cc: Andrew Morton, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Yong Zhang, Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Thu, 2010-08-19 at 22:57 -0400, Don Zickus wrote:
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO. It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code. Except, of course, when we have a utility function which can be
> > called from either irqs-on or irqs-off: acpi_os_stall().
> >
> > That being said, it's not good to introduce new API requirements by
> > accident! An audit of all callers should first be performed, at least.
> >
> >
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
>
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics. Hopefully this will undo any harm I caused.
Was this patch forgotten?
Best regards,
Maxim Levitsky
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-26 10:14 ` Maxim Levitsky
@ 2010-08-26 14:40 ` Don Zickus
0 siblings, 0 replies; 9+ messages in thread
From: Don Zickus @ 2010-08-26 14:40 UTC (permalink / raw)
To: Maxim Levitsky
Cc: Andrew Morton, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Yong Zhang, Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Thu, Aug 26, 2010 at 01:14:31PM +0300, Maxim Levitsky wrote:
> On Thu, 2010-08-19 at 22:57 -0400, Don Zickus wrote:
> > On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > > The surprise new requirement that touch_nmi_watchdog() be called from
> > > non-preemptible code does seem to make sense IMO. It's hard to see why
> > > anyone would be touching the watchdog unless he's spinning in irqs-off
> > > code. Except, of course, when we have a utility function which can be
> > > called from either irqs-on or irqs-off: acpi_os_stall().
> > >
> > > That being said, it's not good to introduce new API requirements by
> > > accident! An audit of all callers should first be performed, at least.
> > >
> > >
> > > The surprise new requirement that touch_softlockup_watchdog() be called
> > > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > > for three minutes waiting for my egg to boil, I should be able to do
> > > that and I should be able to touch the softlockup detector without
> > > needing to go non-preemptible.
> >
> > Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> > the old semantics. Hopefully this will undo any harm I caused.
>
> Was this patch forgotten?
Hm, apparently it was separated out by the mail server. Here it is again
with some of the headers removed I guess.
Cheers,
Don
From: Don Zickus <dzickus@redhat.com>
Date: Thu, 19 Aug 2010 22:48:26 -0400
Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).
This change brings those touch_*_watchdog functions back in line
to how they used to work.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
kernel/watchdog.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..99e35a2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -122,7 +122,7 @@ static void __touch_watchdog(void)
void touch_softlockup_watchdog(void)
{
- __get_cpu_var(watchdog_touch_ts) = 0;
+ __raw_get_cpu_var(watchdog_touch_ts) = 0;
}
EXPORT_SYMBOL(touch_softlockup_watchdog);
@@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
#ifdef CONFIG_HARDLOCKUP_DETECTOR
void touch_nmi_watchdog(void)
{
- __get_cpu_var(watchdog_nmi_touch) = true;
+ if (watchdog_enabled) {
+ unsigned cpu;
+
+ for_each_present_cpu(cpu) {
+ if (per_cpu(watchdog_nmi_touch, cpu) != true)
+ per_cpu(watchdog_nmi_touch, cpu) = true;
+ }
+ }
touch_softlockup_watchdog();
}
EXPORT_SYMBOL(touch_nmi_watchdog);
@@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
wake_up_process(p);
}
+ /* if any cpu succeeds, watchdog is considered enabled for the system */
+ watchdog_enabled = 1;
+
return 0;
}
@@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
per_cpu(softlockup_watchdog, cpu) = NULL;
kthread_stop(p);
}
-
- /* if any cpu succeeds, watchdog is considered enabled for the system */
- watchdog_enabled = 1;
}
static void watchdog_enable_all_cpus(void)
--
1.7.2.1
^ permalink raw reply related [flat|nested] 9+ messages in thread