* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
[not found] ` <20100818024802.GA24748@nowhere>
@ 2010-08-18 20:01 ` Andrew Morton
2010-08-19 2:27 ` Don Zickus
2010-08-20 2:57 ` Don Zickus
0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2010-08-18 20:01 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Don Zickus, Len Brown, Sergey Senozhatsky, Yong Zhang,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Wed, 18 Aug 2010 04:48:05 +0200
Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Aug 17, 2010 at 09:13:20AM -0400, Don Zickus wrote:
> > On Tue, Aug 17, 2010 at 01:39:48PM +0300, Sergey Senozhatsky wrote:
> > > Please kindly review.
> >
> > I don't have a deep enough understanding of the subtleties between
> > per_cpu, __get_cpu_var, and __raw_get_cpu_var to really say which is
> > correct. To me, all three versions of your patch look they do the same
> > thing.
> >
> > Technically, it seems like preempt_disable/enable would be the correct
> > thing to do. But as someone pointed out earlier, if the code is preempted
> > and switches cpu, then the touch_*_watchdog effectively becomes a no-op
> > (which I guess it can do even with the preempt_disable/enable surrounding
> > it). So I have no idea. I am going to wait for smarter people than me to
> > provide an opinion. :-)
> >
> > Cheers,
> > Don
>
>
> (Adding Len Brown in Cc.
I'm not sure who looks after osl.c. I added linux-acpi to cc.
> Len, this is about acpi_os_stall() that touches the watchdog while
> running in a preemptable section, this triggers warnings because of
> the use of local cpu accessors. We are debating about the appropriate
> way to solve this).
>
> The more I think about it, the more I think that doesn't make sense
> to have touch_nmi_watchdog() callable from preemptable code.
>
> It is buggy by nature.
>
> If you run in a preemptable section, then interrupts can fire, and if
> they can, the nmi watchdog is fine and doesn't need to be touched.
>
> Here the problem is more in the softlockup watchdog, because even if you
> run in a preemptable section, if you run a !CONFIG_PREEMPT kernel, then
> you can't be preempted and the watchdog won't be scheduled until the
> udelay loop finishes. But to solve that you would need cond_resched()
> calls, not touching the watchdog.
>
> Because touching the softlockup watchdog doesn't make sense either
> if you can migrate: you can run the udelay on CPU 0, then migrate on
> CPU 1 and call touch_softlockup_watchdog() from there. Which makes
> definetely no sense. This is buggy.
>
> And because we want to avoid such buggy uses of the touch_whatever_watchdog()
> APIs, these function must continue to check they are called from non-preemptable
> code. Randomly touching the watchdog could hide real lockups to the user.
>
> The problem is on the caller. Considering such udelays loop:
>
> * if it's in a irq disabled section, call touch_nmi_watchdog(), because this
> could prevent the nmi watchdog irq from firing
> * if it's in a non-preemptable section, call touch_softlockup_watchdog(), because
> this could prevent the softlockup watchdog task from beeing scheduled
> * if it's from a preemptable task context, this should call cond_resched() to
> avoid huge latencies on !CONFIG_PREEMPT
>
>
> But acpi_os_stall() seem to be called from 4 different places, and these places
> may run in different context like the above described.
>
> The ACPI code should probably use more specific busy-loop APIs, depending on the
> context it runs.
The touch_nmi_watchdog() was added to acpi_os_stall() by little old me
in 2003. It was committed by Andy with the patch title "ACPI:
Correctly handle NMI watchdog during long stalls (Andrew Morton)". My
title was "ACPI poweroff trigers the NMI watchdog". My changelog was
ACPI poweroff trigers the NMI watchdog. Fix.
(My spelling has improved with age).
So. If we remove that touch, will poweroff still trigger the NMI?
Dunno.
The surprise new requirement that touch_nmi_watchdog() be called from
non-preemptible code does seem to make sense IMO. It's hard to see why
anyone would be touching the watchdog unless he's spinning in irqs-off
code. Except, of course, when we have a utility function which can be
called from wither irqs-on or irqs-off: acpi_os_stall().
That being said, it's not good to introduce new API requirements by
accident! An audit of all callers should first be performed, at least.
The surprise new requirement that touch_softlockup_watchdog() be called
from non-preemptible code doesn't make sense IMO. If I have a piece of
code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
for three minutes waiting for my egg to boil, I should be able to do
that and I should be able to touch the softlockup detector without
needing to go non-preemptible.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-18 20:01 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Andrew Morton
@ 2010-08-19 2:27 ` Don Zickus
2010-08-20 2:57 ` Don Zickus
1 sibling, 0 replies; 9+ messages in thread
From: Don Zickus @ 2010-08-19 2:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> The surprise new requirement that touch_nmi_watchdog() be called from
> non-preemptible code does seem to make sense IMO. It's hard to see why
> anyone would be touching the watchdog unless he's spinning in irqs-off
> code. Except, of course, when we have a utility function which can be
> called from wither irqs-on or irqs-off: acpi_os_stall().
>
> That being said, it's not good to introduce new API requirements by
> accident! An audit of all callers should first be performed, at least.
>
>
> The surprise new requirement that touch_softlockup_watchdog() be called
> from non-preemptible code doesn't make sense IMO. If I have a piece of
> code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> for three minutes waiting for my egg to boil, I should be able to do
> that and I should be able to touch the softlockup detector without
> needing to go non-preemptible.
Wow. So after re-reading what the original touch_*_watchdog code did and what I
copied to kernel/watchdog.c, I'm a little embarrassed on how I managed to
mangle the internals of both those functions.
While the idea is the same, the semantics are clearly different.
touch_nmi_watchdog had a for_each_cpu_present loop, which means it didn't
have to deal with the preempt issue.
touch_softlockup_watchdog used __raw_get_cpu_var to excuse itself from
dealing with the preempt issue.
I'll put together a patch that brings those functions back in line with
what they used to be. Sorry for the trouble.
Cheers,
Don
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-18 20:01 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Andrew Morton
2010-08-19 2:27 ` Don Zickus
@ 2010-08-20 2:57 ` Don Zickus
2010-08-20 3:42 ` Andrew Morton
` (2 more replies)
1 sibling, 3 replies; 9+ messages in thread
From: Don Zickus @ 2010-08-20 2:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> The surprise new requirement that touch_nmi_watchdog() be called from
> non-preemptible code does seem to make sense IMO. It's hard to see why
> anyone would be touching the watchdog unless he's spinning in irqs-off
> code. Except, of course, when we have a utility function which can be
> called from wither irqs-on or irqs-off: acpi_os_stall().
>
> That being said, it's not good to introduce new API requirements by
> accident! An audit of all callers should first be performed, at least.
>
>
> The surprise new requirement that touch_softlockup_watchdog() be called
> from non-preemptible code doesn't make sense IMO. If I have a piece of
> code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> for three minutes waiting for my egg to boil, I should be able to do
> that and I should be able to touch the softlockup detector without
> needing to go non-preemptible.
Ok, so here is my patch that syncs the touch_*_watchdog back in line with
the old semantics. Hopefully this will undo any harm I caused.
------------cut -->---------------------------
>From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
From: Don Zickus <dzickus@redhat.com>
Date: Thu, 19 Aug 2010 22:48:26 -0400
Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).
This change brings those touch_*_watchdog functions back in line
to how they used to work.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
kernel/watchdog.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..99e35a2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -122,7 +122,7 @@ static void __touch_watchdog(void)
void touch_softlockup_watchdog(void)
{
- __get_cpu_var(watchdog_touch_ts) = 0;
+ __raw_get_cpu_var(watchdog_touch_ts) = 0;
}
EXPORT_SYMBOL(touch_softlockup_watchdog);
@@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
#ifdef CONFIG_HARDLOCKUP_DETECTOR
void touch_nmi_watchdog(void)
{
- __get_cpu_var(watchdog_nmi_touch) = true;
+ if (watchdog_enabled) {
+ unsigned cpu;
+
+ for_each_present_cpu(cpu) {
+ if (per_cpu(watchdog_nmi_touch, cpu) != true)
+ per_cpu(watchdog_nmi_touch, cpu) = true;
+ }
+ }
touch_softlockup_watchdog();
}
EXPORT_SYMBOL(touch_nmi_watchdog);
@@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
wake_up_process(p);
}
+ /* if any cpu succeeds, watchdog is considered enabled for the system */
+ watchdog_enabled = 1;
+
return 0;
}
@@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
per_cpu(softlockup_watchdog, cpu) = NULL;
kthread_stop(p);
}
-
- /* if any cpu succeeds, watchdog is considered enabled for the system */
- watchdog_enabled = 1;
}
static void watchdog_enable_all_cpus(void)
--
1.7.2.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 2:57 ` Don Zickus
@ 2010-08-20 3:42 ` Andrew Morton
2010-08-20 12:34 ` Don Zickus
2010-08-26 17:17 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
2010-08-20 15:02 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
2010-08-26 10:14 ` Maxim Levitsky
2 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2010-08-20 3:42 UTC (permalink / raw)
To: Don Zickus
Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover, H. Peter Anvin
On Thu, 19 Aug 2010 22:57:49 -0400 Don Zickus <dzickus@redhat.com> wrote:
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO. It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code. Except, of course, when we have a utility function which can be
> > called from wither irqs-on or irqs-off: acpi_os_stall().
> >
> > That being said, it's not good to introduce new API requirements by
> > accident! An audit of all callers should first be performed, at least.
> >
> >
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
>
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics. Hopefully this will undo any harm I caused.
>
> ------------cut -->---------------------------
>
> >From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
> From: Don Zickus <dzickus@redhat.com>
> Date: Thu, 19 Aug 2010 22:48:26 -0400
> Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
>
> During my rewrite, the semantics of touch_nmi_watchdog and
> touch_softlockup_watchdog changed enough to break some drivers
> (mostly over preemptable regions).
>
> This change brings those touch_*_watchdog functions back in line
> to how they used to work.
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
> kernel/watchdog.c | 17 ++++++++++++-----
> 1 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..99e35a2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -122,7 +122,7 @@ static void __touch_watchdog(void)
>
> void touch_softlockup_watchdog(void)
> {
> - __get_cpu_var(watchdog_touch_ts) = 0;
> + __raw_get_cpu_var(watchdog_touch_ts) = 0;
> }
> EXPORT_SYMBOL(touch_softlockup_watchdog);
>
> @@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> void touch_nmi_watchdog(void)
> {
> - __get_cpu_var(watchdog_nmi_touch) = true;
> + if (watchdog_enabled) {
> + unsigned cpu;
> +
> + for_each_present_cpu(cpu) {
> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
> + per_cpu(watchdog_nmi_touch, cpu) = true;
> + }
> + }
> touch_softlockup_watchdog();
> }
> EXPORT_SYMBOL(touch_nmi_watchdog);
> @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
> wake_up_process(p);
> }
>
> + /* if any cpu succeeds, watchdog is considered enabled for the system */
> + watchdog_enabled = 1;
> +
> return 0;
> }
>
> @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
> per_cpu(softlockup_watchdog, cpu) = NULL;
> kthread_stop(p);
> }
> -
> - /* if any cpu succeeds, watchdog is considered enabled for the system */
> - watchdog_enabled = 1;
> }
>
> static void watchdog_enable_all_cpus(void)
hm, the code seems a bit screwy. Maybe it was always thus.
watchdog_enabled gets set in the per-cpu function but it gets cleared
in the all-cpus function. Asymmetric.
Also afacit the action of cpu-hotunplug+cpu-hotplug will reenable the
watchdog on a CPU which was supposed to have it disabled. Perhaps you
could recheck that and make sure it all makes sense - perhaps we need a
separate state variable which is purely "current setting of
/proc/sys/kernel/nmi_watchdog" and doesn't get altered internally.
Anyway, I'll be disappearing for a few days so perhaps Frederic or hpa
can help get this all fixed/merged up?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 3:42 ` Andrew Morton
@ 2010-08-20 12:34 ` Don Zickus
2010-08-26 17:17 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
1 sibling, 0 replies; 9+ messages in thread
From: Don Zickus @ 2010-08-20 12:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover, H. Peter Anvin
On Thu, Aug 19, 2010 at 08:42:56PM -0700, Andrew Morton wrote:
> On Thu, 19 Aug 2010 22:57:49 -0400 Don Zickus <dzickus@redhat.com> wrote:
>
> > On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
> > wake_up_process(p);
> > }
> >
> > + /* if any cpu succeeds, watchdog is considered enabled for the system */
> > + watchdog_enabled = 1;
> > +
> > return 0;
> > }
> >
> > @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
> > per_cpu(softlockup_watchdog, cpu) = NULL;
> > kthread_stop(p);
> > }
> > -
> > - /* if any cpu succeeds, watchdog is considered enabled for the system */
> > - watchdog_enabled = 1;
> > }
> >
> > static void watchdog_enable_all_cpus(void)
>
> hm, the code seems a bit screwy. Maybe it was always thus.
No, watchdog_enabled was something newly created for the lockup dectector.
>
> watchdog_enabled gets set in the per-cpu function but it gets cleared
> in the all-cpus function. Asymmetric.
Yes it is by design. I was using watchdog_enabled as a global state
variable. As soon as one cpu was enabled, I would set the bit. But only
if all the cpus disabled the watchdog would I clear the bit.
>
> Also afacit the action of cpu-hotunplug+cpu-hotplug will reenable the
> watchdog on a CPU which was supposed to have it disabled. Perhaps you
> could recheck that and make sure it all makes sense - perhaps we need a
> separate state variable which is purely "current setting of
> /proc/sys/kernel/nmi_watchdog" and doesn't get altered internally.
I wasn't tracking it on a per cpu basis. I didn't see a need to. The
watchdog should globally be on/off across the system. If a system comes
up and one of the cpus could not bring the watchdog online for some
reason, then that is a problem. If a cpu-hotunplug+cpu-hotplug fixes it,
all the better. :-)
Also, if I wanted to track it per cpu, there is a bunch of status bits in
per-cpu variables that could let the code know whether a particular cpu
watchdog is on/off for either hardlockup or softlockup.
Cheers,
Don
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 2:57 ` Don Zickus
2010-08-20 3:42 ` Andrew Morton
@ 2010-08-20 15:02 ` Yong Zhang
2010-08-26 10:14 ` Maxim Levitsky
2 siblings, 0 replies; 9+ messages in thread
From: Yong Zhang @ 2010-08-20 15:02 UTC (permalink / raw)
To: Don Zickus
Cc: Andrew Morton, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Thu, Aug 19, 2010 at 10:57:49PM -0400, Don Zickus wrote:
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO. It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code. Except, of course, when we have a utility function which can be
> > called from wither irqs-on or irqs-off: acpi_os_stall().
> >
> > That being said, it's not good to introduce new API requirements by
> > accident! An audit of all callers should first be performed, at least.
> >
> >
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
>
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics. Hopefully this will undo any harm I caused.
>
> ------------cut -->---------------------------
>
> >From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
> From: Don Zickus <dzickus@redhat.com>
> Date: Thu, 19 Aug 2010 22:48:26 -0400
> Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
>
> During my rewrite, the semantics of touch_nmi_watchdog and
> touch_softlockup_watchdog changed enough to break some drivers
> (mostly over preemptable regions).
>
> This change brings those touch_*_watchdog functions back in line
> to how they used to work.
This one looks good to me.
Thank you Don.
-Yong
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
> kernel/watchdog.c | 17 ++++++++++++-----
> 1 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..99e35a2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -122,7 +122,7 @@ static void __touch_watchdog(void)
>
> void touch_softlockup_watchdog(void)
> {
> - __get_cpu_var(watchdog_touch_ts) = 0;
> + __raw_get_cpu_var(watchdog_touch_ts) = 0;
> }
> EXPORT_SYMBOL(touch_softlockup_watchdog);
>
> @@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> void touch_nmi_watchdog(void)
> {
> - __get_cpu_var(watchdog_nmi_touch) = true;
> + if (watchdog_enabled) {
> + unsigned cpu;
> +
> + for_each_present_cpu(cpu) {
> + if (per_cpu(watchdog_nmi_touch, cpu) != true)
> + per_cpu(watchdog_nmi_touch, cpu) = true;
> + }
> + }
> touch_softlockup_watchdog();
> }
> EXPORT_SYMBOL(touch_nmi_watchdog);
> @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
> wake_up_process(p);
> }
>
> + /* if any cpu succeeds, watchdog is considered enabled for the system */
> + watchdog_enabled = 1;
> +
> return 0;
> }
>
> @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
> per_cpu(softlockup_watchdog, cpu) = NULL;
> kthread_stop(p);
> }
> -
> - /* if any cpu succeeds, watchdog is considered enabled for the system */
> - watchdog_enabled = 1;
> }
>
> static void watchdog_enable_all_cpus(void)
> --
> 1.7.2.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-20 2:57 ` Don Zickus
2010-08-20 3:42 ` Andrew Morton
2010-08-20 15:02 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
@ 2010-08-26 10:14 ` Maxim Levitsky
2010-08-26 14:40 ` Don Zickus
2 siblings, 1 reply; 9+ messages in thread
From: Maxim Levitsky @ 2010-08-26 10:14 UTC (permalink / raw)
To: Don Zickus
Cc: Andrew Morton, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Yong Zhang, Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Thu, 2010-08-19 at 22:57 -0400, Don Zickus wrote:
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO. It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code. Except, of course, when we have a utility function which can be
> > called from wither irqs-on or irqs-off: acpi_os_stall().
> >
> > That being said, it's not good to introduce new API requirements by
> > accident! An audit of all callers should first be performed, at least.
> >
> >
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
>
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics. Hopefully this will undo any harm I caused.
Was this patch forgotten?
Best regards,
Maxim Levitsky
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
2010-08-26 10:14 ` Maxim Levitsky
@ 2010-08-26 14:40 ` Don Zickus
0 siblings, 0 replies; 9+ messages in thread
From: Don Zickus @ 2010-08-26 14:40 UTC (permalink / raw)
To: Maxim Levitsky
Cc: Andrew Morton, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Yong Zhang, Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover
On Thu, Aug 26, 2010 at 01:14:31PM +0300, Maxim Levitsky wrote:
> On Thu, 2010-08-19 at 22:57 -0400, Don Zickus wrote:
> > On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > > The surprise new requirement that touch_nmi_watchdog() be called from
> > > non-preemptible code does seem to make sense IMO. It's hard to see why
> > > anyone would be touching the watchdog unless he's spinning in irqs-off
> > > code. Except, of course, when we have a utility function which can be
> > > called from wither irqs-on or irqs-off: acpi_os_stall().
> > >
> > > That being said, it's not good to introduce new API requirements by
> > > accident! An audit of all callers should first be performed, at least.
> > >
> > >
> > > The surprise new requirement that touch_softlockup_watchdog() be called
> > > from non-preemptible code doesn't make sense IMO. If I have a piece of
> > > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > > for three minutes waiting for my egg to boil, I should be able to do
> > > that and I should be able to touch the softlockup detector without
> > > needing to go non-preemptible.
> >
> > Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> > the old semantics. Hopefully this will undo any harm I caused.
>
> Was this patch forgotten?
Hm, apparently it was separated out by the mail server. Here it is again
with some of the headers removed I guess.
Cheers,
Don
From: Don Zickus <dzickus@redhat.com>
Date: Thu, 19 Aug 2010 22:48:26 -0400
Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).
This change brings those touch_*_watchdog functions back in line
to how they used to work.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
kernel/watchdog.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..99e35a2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -122,7 +122,7 @@ static void __touch_watchdog(void)
void touch_softlockup_watchdog(void)
{
- __get_cpu_var(watchdog_touch_ts) = 0;
+ __raw_get_cpu_var(watchdog_touch_ts) = 0;
}
EXPORT_SYMBOL(touch_softlockup_watchdog);
@@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
#ifdef CONFIG_HARDLOCKUP_DETECTOR
void touch_nmi_watchdog(void)
{
- __get_cpu_var(watchdog_nmi_touch) = true;
+ if (watchdog_enabled) {
+ unsigned cpu;
+
+ for_each_present_cpu(cpu) {
+ if (per_cpu(watchdog_nmi_touch, cpu) != true)
+ per_cpu(watchdog_nmi_touch, cpu) = true;
+ }
+ }
touch_softlockup_watchdog();
}
EXPORT_SYMBOL(touch_nmi_watchdog);
@@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
wake_up_process(p);
}
+ /* if any cpu succeeds, watchdog is considered enabled for the system */
+ watchdog_enabled = 1;
+
return 0;
}
@@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
per_cpu(softlockup_watchdog, cpu) = NULL;
kthread_stop(p);
}
-
- /* if any cpu succeeds, watchdog is considered enabled for the system */
- watchdog_enabled = 1;
}
static void watchdog_enable_all_cpus(void)
--
1.7.2.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog)
2010-08-20 3:42 ` Andrew Morton
2010-08-20 12:34 ` Don Zickus
@ 2010-08-26 17:17 ` Len Brown
1 sibling, 0 replies; 9+ messages in thread
From: Len Brown @ 2010-08-26 17:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Don Zickus, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
Yong Zhang, Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
Andy Grover, H. Peter Anvin
acpi_os_stall() is used in two ways.
The typical way is what triggered this e-mail thread.
It implements the AML "Stall()" operator, and is called
with interrupts enabled with durations <= 100 usec.
So one would expect it to be identical to udelay().
The exception case is when ACPICA calls it with interrupts off
and huge durations when we wrote the poweroff or sleep
register, yet we find outselves still running...
Apparently akpm added touch_nmi_watchdog() to keep the
watchdog from firing in this exception case.
Is it useful to have the watchdog running when
we are waiting for firmware to poweroff the machine?
If no, maybe we should turn it off as part of the shutdown
process rather than using yet another invocation
of touch_nmi_watchdog()?
Is calling delay() with IRQs disabled the best thing
we can do after we ask the firmware to cut power
and it takes a long time?
thanks,
Len Brown, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-08-26 17:17 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1281966418.1926.1421.camel@laptop>
[not found] ` <20100816140829.GA5225@swordfish.minsk.epam.com>
[not found] ` <20100817025954.GA12366@nowhere>
[not found] ` <AANLkTikdng247Fct0ANXiitC3ja98_ee3-fUVDGraEwo@mail.gmail.com>
[not found] ` <20100817083945.GA12022@swordfish.minsk.epam.com>
[not found] ` <AANLkTik-4S_=L5FLhMfommf5Umd0xO+h4QP9uoxwrgFE@mail.gmail.com>
[not found] ` <20100817092407.GB12022@swordfish.minsk.epam.com>
[not found] ` <AANLkTikjR1nC3DYdvSw=FEmAVZ98=ju2RNq-MF1L7U3u@mail.gmail.com>
[not found] ` <20100817103948.GA5352@swordfish.minsk.epam.com>
[not found] ` <20100817131320.GX4879@redhat.com>
[not found] ` <20100818024802.GA24748@nowhere>
2010-08-18 20:01 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Andrew Morton
2010-08-19 2:27 ` Don Zickus
2010-08-20 2:57 ` Don Zickus
2010-08-20 3:42 ` Andrew Morton
2010-08-20 12:34 ` Don Zickus
2010-08-26 17:17 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
2010-08-20 15:02 ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
2010-08-26 10:14 ` Maxim Levitsky
2010-08-26 14:40 ` Don Zickus
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).