* preempt rcu bug on s390
From: Heiko Carstens @ 2008-02-09 11:34 UTC
To: Paul E. McKenney
Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
    Martin Schwidefsky, linux-kernel

Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
gets stuck when running with more than one cpu.
When booting with four cpus I get all four cpus caught within cpu_idle
and not advancing anymore. However there is the init process which is
waiting for synchronize_rcu() to complete (lcrash output):

STACK TRACE FOR TASK: 0xf84d968 (swapper)

 STACK:
 0 schedule+842 [0x36c956]
 1 schedule_timeout+172 [0x36d0e4]
 2 wait_for_common+204 [0x36c398]
 3 synchronize_rcu+76 [0x567bc]
 4 netlink_change_ngroups+150 [0x2b4302]
 5 genl_register_mc_group+256 [0x2b6174]
 6 genl_init+188 [0x534e44]
 7 kernel_init+444 [0x518334]
 8 kernel_thread_starter+6 [0x192a6]

If I change the code so that timer ticks won't be disabled everything
runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
thing for the rcu preemptible case.

Kernel version is git head of today.

Any ideas?

* Re: preempt rcu bug on s390
From: Paul E. McKenney @ 2008-02-09 14:07 UTC
To: Heiko Carstens
Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
    Martin Schwidefsky, linux-kernel

On Sat, Feb 09, 2008 at 12:34:35PM +0100, Heiko Carstens wrote:
> Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> gets stuck when running with more than one cpu.
> When booting with four cpus I get all four cpus caught within cpu_idle
> and not advancing anymore. However there is the init process which is
> waiting for synchronize_rcu() to complete (lcrash output):
>
> STACK TRACE FOR TASK: 0xf84d968 (swapper)
>
>  STACK:
>  0 schedule+842 [0x36c956]
>  1 schedule_timeout+172 [0x36d0e4]
>  2 wait_for_common+204 [0x36c398]
>  3 synchronize_rcu+76 [0x567bc]
>  4 netlink_change_ngroups+150 [0x2b4302]
>  5 genl_register_mc_group+256 [0x2b6174]
>  6 genl_init+188 [0x534e44]
>  7 kernel_init+444 [0x518334]
>  8 kernel_thread_starter+6 [0x192a6]
>
> If I change the code so that timer ticks won't be disabled everything
> runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> thing for the rcu preemptible case.
>
> Kernel version is git head of today.
>
> Any ideas?

Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?

If not, could you please check it out?

						Thanx, Paul

* Re: preempt rcu bug on s390
From: Heiko Carstens @ 2008-02-09 17:14 UTC
To: Paul E. McKenney
Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
    Martin Schwidefsky, linux-kernel

On Sat, Feb 09, 2008 at 06:07:11AM -0800, Paul E. McKenney wrote:
> On Sat, Feb 09, 2008 at 12:34:35PM +0100, Heiko Carstens wrote:
> > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > gets stuck when running with more than one cpu.
> > When booting with four cpus I get all four cpus caught within cpu_idle
> > and not advancing anymore. However there is the init process which is
> > waiting for synchronize_rcu() to complete (lcrash output):
> >
> > STACK TRACE FOR TASK: 0xf84d968 (swapper)
> >
> >  STACK:
> >  0 schedule+842 [0x36c956]
> >  1 schedule_timeout+172 [0x36d0e4]
> >  2 wait_for_common+204 [0x36c398]
> >  3 synchronize_rcu+76 [0x567bc]
> >  4 netlink_change_ngroups+150 [0x2b4302]
> >  5 genl_register_mc_group+256 [0x2b6174]
> >  6 genl_init+188 [0x534e44]
> >  7 kernel_init+444 [0x518334]
> >  8 kernel_thread_starter+6 [0x192a6]
> >
> > If I change the code so that timer ticks won't be disabled everything
> > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > thing for the rcu preemptible case.
> >
> > Kernel version is git head of today.
> >
> > Any ideas?
>
> Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
>
> If not, could you please check it out?

It's not applied, however it doesn't change anything. Also the patch
is tied to the dynticks implementation which is different from
s390's nohz implementation.
I had to add the patch below so it would make at least some sense.
But it doesn't fix the problem.

---
 arch/s390/kernel/time.c |    2 ++
 include/linux/hardirq.h |    2 +-
 kernel/rcupreempt.c     |    2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/rcupreempt.c
===================================================================
--- linux-2.6.orig/kernel/rcupreempt.c
+++ linux-2.6/kernel/rcupreempt.c
@@ -413,7 +413,7 @@ static void __rcu_advance_callbacks(stru
 	}
 }
 
-#ifdef CONFIG_NO_HZ
+#if defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ)
 
 DEFINE_PER_CPU(long, dynticks_progress_counter) = 1;
 static DEFINE_PER_CPU(long, rcu_dyntick_snapshot);
Index: linux-2.6/arch/s390/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/s390/kernel/time.c
+++ linux-2.6/arch/s390/kernel/time.c
@@ -200,6 +200,7 @@ static void stop_hz_timer(void)
 		if (timer >= jiffies_timer_cc)
 			todval = timer;
 	}
+	rcu_enter_nohz();
 	set_clock_comparator(todval);
 }
 
@@ -213,6 +214,7 @@ static void start_hz_timer(void)
 
 	if (!cpu_isset(smp_processor_id(), nohz_cpu_mask))
 		return;
+	rcu_exit_nohz();
 	account_ticks(get_clock());
 	set_clock_comparator(S390_lowcore.jiffy_timer + CPU_DEVIATION);
 	cpu_clear(smp_processor_id(), nohz_cpu_mask);
Index: linux-2.6/include/linux/hardirq.h
===================================================================
--- linux-2.6.orig/include/linux/hardirq.h
+++ linux-2.6/include/linux/hardirq.h
@@ -109,7 +109,7 @@ static inline void account_system_vtime(
 }
 #endif
 
-#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
+#if defined(CONFIG_PREEMPT_RCU) && (defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ))
 extern void rcu_irq_enter(void);
 extern void rcu_irq_exit(void);
 #else

* Re: preempt rcu bug on s390
From: Paul E. McKenney @ 2008-02-09 22:02 UTC
To: Heiko Carstens
Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
    Martin Schwidefsky, linux-kernel

On Sat, Feb 09, 2008 at 06:14:51PM +0100, Heiko Carstens wrote:
> On Sat, Feb 09, 2008 at 06:07:11AM -0800, Paul E. McKenney wrote:
> > On Sat, Feb 09, 2008 at 12:34:35PM +0100, Heiko Carstens wrote:
> > > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > > gets stuck when running with more than one cpu.
> > > When booting with four cpus I get all four cpus caught within cpu_idle
> > > and not advancing anymore. However there is the init process which is
> > > waiting for synchronize_rcu() to complete (lcrash output):
> > >
> > > STACK TRACE FOR TASK: 0xf84d968 (swapper)
> > >
> > >  STACK:
> > >  0 schedule+842 [0x36c956]
> > >  1 schedule_timeout+172 [0x36d0e4]
> > >  2 wait_for_common+204 [0x36c398]
> > >  3 synchronize_rcu+76 [0x567bc]
> > >  4 netlink_change_ngroups+150 [0x2b4302]
> > >  5 genl_register_mc_group+256 [0x2b6174]
> > >  6 genl_init+188 [0x534e44]
> > >  7 kernel_init+444 [0x518334]
> > >  8 kernel_thread_starter+6 [0x192a6]
> > >
> > > If I change the code so that timer ticks won't be disabled everything
> > > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > > thing for the rcu preemptible case.
> > >
> > > Kernel version is git head of today.
> > >
> > > Any ideas?
> >
> > Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> >
> > If not, could you please check it out?
>
> It's not applied, however it doesn't change anything. Also the patch
> is tied to the dynticks implementation which is different from
> s390's nohz implementation.
> I had to add the patch below so it would make at least some sense.
> But it doesn't fix the problem.

OK, I was afraid of that. ;-)

Does s390 start out in nohz mode? The reason I ask is that it feels like
an off-by-one error for the dynticks_progress_counter.

						Thanx, Paul

> ---
>  arch/s390/kernel/time.c |    2 ++
>  include/linux/hardirq.h |    2 +-
>  kernel/rcupreempt.c     |    2 +-
>  3 files changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/kernel/rcupreempt.c
> ===================================================================
> --- linux-2.6.orig/kernel/rcupreempt.c
> +++ linux-2.6/kernel/rcupreempt.c
> @@ -413,7 +413,7 @@ static void __rcu_advance_callbacks(stru
>  	}
>  }
>
> -#ifdef CONFIG_NO_HZ
> +#if defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ)
>
>  DEFINE_PER_CPU(long, dynticks_progress_counter) = 1;
>  static DEFINE_PER_CPU(long, rcu_dyntick_snapshot);
> Index: linux-2.6/arch/s390/kernel/time.c
> ===================================================================
> --- linux-2.6.orig/arch/s390/kernel/time.c
> +++ linux-2.6/arch/s390/kernel/time.c
> @@ -200,6 +200,7 @@ static void stop_hz_timer(void)
>  		if (timer >= jiffies_timer_cc)
>  			todval = timer;
>  	}
> +	rcu_enter_nohz();
>  	set_clock_comparator(todval);
>  }
>
> @@ -213,6 +214,7 @@ static void start_hz_timer(void)
>
>  	if (!cpu_isset(smp_processor_id(), nohz_cpu_mask))
>  		return;
> +	rcu_exit_nohz();
>  	account_ticks(get_clock());
>  	set_clock_comparator(S390_lowcore.jiffy_timer + CPU_DEVIATION);
>  	cpu_clear(smp_processor_id(), nohz_cpu_mask);
> Index: linux-2.6/include/linux/hardirq.h
> ===================================================================
> --- linux-2.6.orig/include/linux/hardirq.h
> +++ linux-2.6/include/linux/hardirq.h
> @@ -109,7 +109,7 @@ static inline void account_system_vtime(
>  }
>  #endif
>
> -#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
> +#if defined(CONFIG_PREEMPT_RCU) && (defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ))
>  extern void rcu_irq_enter(void);
>  extern void rcu_irq_exit(void);
>  #else

* Re: preempt rcu bug on s390
From: Heiko Carstens @ 2008-02-10 13:01 UTC
To: Paul E. McKenney
Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
    Martin Schwidefsky, linux-kernel

> > > > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > > > gets stuck when running with more than one cpu.
> > > > When booting with four cpus I get all four cpus caught within cpu_idle
> > > > and not advancing anymore. However there is the init process which is
> > > > waiting for synchronize_rcu() to complete (lcrash output):
> > > >
> > > > If I change the code so that timer ticks won't be disabled everything
> > > > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > > > thing for the rcu preemptible case.
> > > >
> > > > Kernel version is git head of today.
> > > >
> > > > Any ideas?
> > >
> > > Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> > >
> > > If not, could you please check it out?
> >
> > It's not applied, however it doesn't change anything. Also the patch
> > is tied to the dynticks implementation which is different from
> > s390's nohz implementation.
> > I had to add the patch below so it would make at least some sense.
> > But it doesn't fix the problem.
>
> OK, I was afraid of that. ;-)
>
> Does s390 start out in nohz mode? The reason I ask is that it feels like
> an off-by-one error for the dynticks_progress_counter.

Actually I forgot to add a few ifdefs to make the code do something :)
That just reveals that we have a conflict between the dynticks
implementation and s390's nohz that shows up in what rcu_irq_enter/exit
assume.
I haven't patched s390 and common code so that it works, but I think the
patch you mentioned will fix the problem I reported.
So I guess we should either convert s390 to use the generic dynticks
implementation or disable preemptible rcu on s390 until we have
converted our code.

Thanks for helping to debug this!

* Re: preempt rcu bug on s390
From: Paul E. McKenney @ 2008-02-10 17:43 UTC
To: Heiko Carstens
Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
    Martin Schwidefsky, linux-kernel

On Sun, Feb 10, 2008 at 02:01:50PM +0100, Heiko Carstens wrote:
> > > > > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > > > > gets stuck when running with more than one cpu.
> > > > > When booting with four cpus I get all four cpus caught within cpu_idle
> > > > > and not advancing anymore. However there is the init process which is
> > > > > waiting for synchronize_rcu() to complete (lcrash output):
> > > > >
> > > > > If I change the code so that timer ticks won't be disabled everything
> > > > > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > > > > thing for the rcu preemptible case.
> > > > >
> > > > > Kernel version is git head of today.
> > > > >
> > > > > Any ideas?
> > > >
> > > > Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> > > >
> > > > If not, could you please check it out?
> > >
> > > It's not applied, however it doesn't change anything. Also the patch
> > > is tied to the dynticks implementation which is different from
> > > s390's nohz implementation.
> > > I had to add the patch below so it would make at least some sense.
> > > But it doesn't fix the problem.
> >
> > OK, I was afraid of that. ;-)
> >
> > Does s390 start out in nohz mode? The reason I ask is that it feels like
> > an off-by-one error for the dynticks_progress_counter.
>
> Actually I forgot to add a few ifdefs to make the code do something :)
> That just reveals that we have a conflict between the dynticks
> implementation and s390's nohz that shows up in what rcu_irq_enter/exit
> assume.
> I haven't patched s390 and common code so that it works, but I think the
> patch you mentioned will fix the problem I reported.
> So I guess we should either convert s390 to use the generic dynticks
> implementation or disable preemptible rcu on s390 until we have
> converted our code.

Sounds good to me!!! (Especially converting s390 to the generic algorithm.)

I believe that the generic implementation will do what you need, but
I am sure you will let me know of any problems that arise.

> Thanks for helping to debug this!

Thank you for tracking it down!

						Thanx, Paul

* Re: preempt rcu bug on s390
From: Steven Rostedt @ 2008-02-11 15:37 UTC
To: Heiko Carstens
Cc: Paul E. McKenney, Gautham R Shenoy, Dipankar Sarma, Ingo Molnar,
    Martin Schwidefsky, linux-kernel

Heiko Carstens wrote:
>> Does s390 start out in nohz mode? The reason I ask is that it feels like
>> an off-by-one error for the dynticks_progress_counter.
>
> Actually I forgot to add a few ifdefs to make the code do something :)
> That just reveals that we have a conflict between the dynticks
> implementation and s390's nohz that shows up in what rcu_irq_enter/exit
> assume.
> I haven't patched s390 and common code so that it works, but I think the
> patch you mentioned will fix the problem I reported.
> So I guess we should either convert s390 to use the generic dynticks
> implementation or disable preemptible rcu on s390 until we have
> converted our code.
>
> Thanks for helping to debug this!

Heiko, thanks for reporting this.

This patch still didn't make it into -rc1, and it really should, because
without it PREEMPT_RCU and NO_HZ together are broken on all boxes.

The patch is in Ingo's sched-devel git tree, as
9460545f81ea48b07dbb20456a8ede776d8ebc1b (last I checked) and titled:

  rcu: add support for dynamic ticks and preempt rcu

-- Steve