* [REGRESSION] um: rcu_sched_state detected stall on CPU 0
@ 2010-10-14 18:27 richard -rw- weinberger
2010-10-14 19:50 ` Arjan van de Ven
0 siblings, 1 reply; 8+ messages in thread
From: richard -rw- weinberger @ 2010-10-14 18:27 UTC (permalink / raw)
To: arjan; +Cc: penberg, LKML, user-mode-linux-devel
Hi Arjan!
This commit causes some problems on UML.
The kernel freezes after a few seconds until it gets some input.
e.g., when I run top it stops refreshing the process list until I press a button.
Messages like this appear:
INFO: rcu_sched_state detected stall on CPU 0 (t=7348 jiffies)
After reverting it, UML works fine again.
commit 78b435368fcd615e695a06012cd963a556284e00
Author: Arjan van de Ven <arjan@linux.intel.com>
Date: Mon Jul 19 10:59:42 2010 -0700
slab: use deferable timers for its periodic housekeeping
slab has a "once every 2 second" timer for its housekeeping.
As the number of logical processors is growing, its more and more
common that this 2 second timer becomes the primary wakeup source.
This patch turns this housekeeping timer into a deferable timer,
which means that the timer does not interrupt idle, but just runs
at the next event that wakes the cpu up.
The impact is that the timer likely runs a bit later, but during the
delay no code is running so there's not all that much reason for
a difference in housekeeping to occur because of this delay.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
diff --git a/mm/slab.c b/mm/slab.c
index e49f8f4..29aad44 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -861,7 +861,7 @@ static void __cpuinit start_cpu_timer(int cpu)
 	 */
 	if (keventd_up() && reap_work->work.func == NULL) {
 		init_reap_node(cpu);
-		INIT_DELAYED_WORK(reap_work, cache_reap);
+		INIT_DELAYED_WORK_DEFERRABLE(reap_work, cache_reap);
 		schedule_delayed_work_on(cpu, reap_work,
 					__round_jiffies_relative(HZ, cpu));
 	}
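
To make the behavioural change described in the commit message concrete, here is a toy model of a normal versus a deferrable timer. This is only a sketch and not kernel code: the toy_* names are made up for illustration, times are plain numbers standing in for jiffies, and the CPU is assumed to be idle until some unrelated event wakes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_timer {
	long expires;     /* earliest time the callback may run */
	bool deferrable;  /* true: never wake an idle CPU just for this timer */
};

/*
 * Return the time at which the callback runs on an otherwise idle CPU.
 * next_awake is the first time at or after t->expires at which the CPU
 * is awake for some other reason (interrupt, input, another timer).
 */
static long toy_timer_run_time(const struct toy_timer *t, long next_awake)
{
	/* A normal timer wakes the CPU itself; a deferrable one waits. */
	return t->deferrable ? next_awake : t->expires;
}

/* Small usage example. */
static void toy_timer_demo(void)
{
	struct toy_timer normal = { .expires = 100, .deferrable = false };
	struct toy_timer lazy   = { .expires = 100, .deferrable = true };

	assert(toy_timer_run_time(&normal, 500) == 100);
	assert(toy_timer_run_time(&lazy, 500) == 500);
}
```

In this model the deferrable timer simply inherits the time of the next unrelated wakeup, so how late it runs depends entirely on when something else wakes the CPU.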
--
Thanks,
//richard
* Re: [REGRESSION] um: rcu_sched_state detected stall on CPU 0
2010-10-14 18:27 [REGRESSION] um: rcu_sched_state detected stall on CPU 0 richard -rw- weinberger
@ 2010-10-14 19:50 ` Arjan van de Ven
2010-10-14 20:06 ` richard -rw- weinberger
2010-10-14 23:44 ` richard -rw- weinberger
0 siblings, 2 replies; 8+ messages in thread
From: Arjan van de Ven @ 2010-10-14 19:50 UTC (permalink / raw)
To: richard -rw- weinberger; +Cc: penberg, LKML, user-mode-linux-devel
On 10/14/2010 11:27 AM, richard -rw- weinberger wrote:
> Hi Arjan!
>
> This commit causes some problems on UML.
>
that is extremely weird.
> The kernel freezes after a few seconds until it gets some input.
> e.g., when I run top it stops refreshing the process list until I press a button.
a slab timer change (to not be as critical) causing global timer
issues.... that's very obviously not a problem with this patch.
has this been seen anywhere except UML?
* Re: [REGRESSION] um: rcu_sched_state detected stall on CPU 0
2010-10-14 19:50 ` Arjan van de Ven
@ 2010-10-14 20:06 ` richard -rw- weinberger
2010-10-14 23:44 ` richard -rw- weinberger
1 sibling, 0 replies; 8+ messages in thread
From: richard -rw- weinberger @ 2010-10-14 20:06 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: penberg, LKML, user-mode-linux-devel, cdfrey
On Thu, Oct 14, 2010 at 9:50 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 10/14/2010 11:27 AM, richard -rw- weinberger wrote:
>>
>> Hi Arjan!
>>
>> This commit causes some problems on UML.
>>
> that is extremely weird.
>>
>> The kernel freezes after a few seconds until it gets some input.
>> e.g., when I run top it stops refreshing the process list until I press a
>> button.
>
> a slab timer change (to not be as critical) causing global timer issues....
> that's very obviously not a problem with this patch.
> has this been seen anywhere except UML?
>
So far I've seen this problem only on UML.
Chris saw it too:
http://marc.info/?l=linux-kernel&m=128622041625323&w=2
Maybe your patch triggers a general timer problem within UML?
--
Thanks,
//richard
* Re: [REGRESSION] um: rcu_sched_state detected stall on CPU 0
2010-10-14 19:50 ` Arjan van de Ven
2010-10-14 20:06 ` richard -rw- weinberger
@ 2010-10-14 23:44 ` richard -rw- weinberger
2010-10-15 7:02 ` Pekka Enberg
1 sibling, 1 reply; 8+ messages in thread
From: richard -rw- weinberger @ 2010-10-14 23:44 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: penberg, LKML, user-mode-linux-devel
On Thu, Oct 14, 2010 at 9:50 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 10/14/2010 11:27 AM, richard -rw- weinberger wrote:
>>
>> Hi Arjan!
>>
>> This commit causes some problems on UML.
>>
> that is extremely weird.
>>
>> The kernel freezes after a few seconds until it gets some input.
>> e.g., when I run top it stops refreshing the process list until I press a
>> button.
>
> a slab timer change (to not be as critical) causing global timer issues....
> that's very obviously not a problem with this patch.
> has this been seen anywhere except UML?
A small update:
It seems that CONFIG_NO_HZ is broken on UML. :-(
CONFIG_NO_HZ + CONFIG_SLAB:              works
CONFIG_NO_HZ + CONFIG_SLAB + your patch: broken
CONFIG_NO_HZ + CONFIG_SLUB:              broken

CONFIG_SLAB + your patch:                works
CONFIG_SLAB:                             works
CONFIG_SLUB:                             works
--
Thanks,
//richard
* Re: [REGRESSION] um: rcu_sched_state detected stall on CPU 0
2010-10-14 23:44 ` richard -rw- weinberger
@ 2010-10-15 7:02 ` Pekka Enberg
2010-10-15 7:48 ` Peter Zijlstra
0 siblings, 1 reply; 8+ messages in thread
From: Pekka Enberg @ 2010-10-15 7:02 UTC (permalink / raw)
To: richard -rw- weinberger
Cc: Arjan van de Ven, penberg, LKML, user-mode-linux-devel,
Thomas Gleixner, Peter Zijlstra, Ingo Molnar
On Fri, Oct 15, 2010 at 2:44 AM, richard -rw- weinberger
<richard.weinberger@gmail.com> wrote:
> A small update:
> It seems that CONFIG_NO_HZ is broken on UML. :-(
>
> CONFIG_NO_HZ + CONFIG_SLAB: works
> CONFIG_NO_HZ + CONFIG_SLAB + your patch: broken
> CONFIG_NO_HZ + CONFIG_SLUB: broken
>
> CONFIG_SLAB + your patch: works
> CONFIG_SLAB: works
> CONFIG_SLUB: works
Thanks for testing! Thomas, Ingo, Peter, I'm not sure who maintains
CONFIG_NO_HZ so I CC'd you. The problem here is that Arjan's
deferrable timers patch in SLAB triggered something that looks like a
latent bug with UML and NOHZ.
Pekka
* Re: [REGRESSION] um: rcu_sched_state detected stall on CPU 0
2010-10-15 7:02 ` Pekka Enberg
@ 2010-10-15 7:48 ` Peter Zijlstra
2010-10-15 9:24 ` richard -rw- weinberger
2010-10-16 15:27 ` richard -rw- weinberger
0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2010-10-15 7:48 UTC (permalink / raw)
To: Pekka Enberg
Cc: richard -rw- weinberger, Arjan van de Ven, penberg, LKML,
user-mode-linux-devel, Thomas Gleixner, Ingo Molnar, Jeff Dike
On Fri, 2010-10-15 at 10:02 +0300, Pekka Enberg wrote:
> Thanks for testing! Thomas, Ingo, Peter, I'm not sure who maintains
> CONFIG_NO_HZ so I CC'd you. The problem here is that Arjan's
> deferrable timers patch in SLAB triggered something that looks like a
> latent bug with UML and NOHZ.
Thomas does mostly, but if it's UML-specific, I guess it's Jeff Dike
you'll be wanting to talk to, since he's the arch maintainer.
* Re: [REGRESSION] um: rcu_sched_state detected stall on CPU 0
2010-10-15 7:48 ` Peter Zijlstra
@ 2010-10-15 9:24 ` richard -rw- weinberger
2010-10-16 15:27 ` richard -rw- weinberger
1 sibling, 0 replies; 8+ messages in thread
From: richard -rw- weinberger @ 2010-10-15 9:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, Arjan van de Ven, penberg, LKML,
user-mode-linux-devel, Thomas Gleixner, Ingo Molnar, Jeff Dike
On Fri, Oct 15, 2010 at 9:48 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> Thomas does mostly, but if it's UML-specific, I guess it's Jeff Dike
> you'll be wanting to talk to, since he's the arch maintainer.
>
Jeff is the UML maintainer only in theory.
He seems to be very busy and hasn't touched UML since 2008.
Very sad...
--
Thanks,
//richard
* Re: [REGRESSION] um: rcu_sched_state detected stall on CPU 0
2010-10-15 7:48 ` Peter Zijlstra
2010-10-15 9:24 ` richard -rw- weinberger
@ 2010-10-16 15:27 ` richard -rw- weinberger
1 sibling, 0 replies; 8+ messages in thread
From: richard -rw- weinberger @ 2010-10-16 15:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, Arjan van de Ven, penberg, LKML,
user-mode-linux-devel, Thomas Gleixner, Ingo Molnar, Jeff Dike
On Fri, Oct 15, 2010 at 9:48 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> Thomas does mostly, but if it's UML-specific, I guess it's Jeff Dike
> you'll be wanting to talk to, since he's the arch maintainer.
After reviewing the code for hours, I've found the bug.
It's an int/long long issue within arch/um/os-Linux/time.c.
A patch is on the way!
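
For readers unfamiliar with this class of bug, a minimal sketch follows. It is not the code from arch/um/os-Linux/time.c and not the pending patch: the toy_* names are made up, a 32-bit int is assumed, and the nanosecond interval is just one example of a value that is fine in a long long but no longer fits once it is stored in an int.

```c
#include <assert.h>
#include <limits.h>

#define TOY_NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: an interval in nanoseconds, kept in 64 bits. */
static long long toy_interval_ns(long long secs)
{
	return secs * TOY_NSEC_PER_SEC;
}

/* Nonzero if the value would be altered by storing it in an int. */
static int toy_overflows_int(long long ns)
{
	return ns > INT_MAX || ns < INT_MIN;
}

/* Small usage example (assumes a 32-bit int). */
static void toy_narrowing_demo(void)
{
	assert(!toy_overflows_int(toy_interval_ns(2)));  /* 2 s still fits */
	assert(toy_overflows_int(toy_interval_ns(3)));   /* 3 s does not */
}
```

With a 32-bit int the limit is INT_MAX = 2147483647, so a nanosecond count breaks at a little over 2.1 seconds. Short intervals pass through an int unharmed, which is how such a bug can stay latent until something makes the intervals longer.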
--
Thanks,
//richard
Thread overview: 8+ messages
2010-10-14 18:27 [REGRESSION] um: rcu_sched_state detected stall on CPU 0 richard -rw- weinberger
2010-10-14 19:50 ` Arjan van de Ven
2010-10-14 20:06 ` richard -rw- weinberger
2010-10-14 23:44 ` richard -rw- weinberger
2010-10-15 7:02 ` Pekka Enberg
2010-10-15 7:48 ` Peter Zijlstra
2010-10-15 9:24 ` richard -rw- weinberger
2010-10-16 15:27 ` richard -rw- weinberger