public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* BUG: tracer_alloc_buffers returned with preemption imbalance
@ 2012-06-23 10:12 Fengguang Wu
  2012-06-23 17:01 ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: Fengguang Wu @ 2012-06-23 10:12 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel@vger.kernel.org

Hi Paul,

I got some boot hangs in the below branches and linux-next:

        rcu/dev.2012.06.21c-4f4a59a
        rcu/fixes.2012.06.20a-b43145c
        rcu/rcu.next-3f26dd3

They all start with an error

[    0.131005] initcall tracer_alloc_buffers+0x0/0x22e returned with preemption imbalance

Then followed by lots of scheduling while atomic bugs:

[    0.143206] Brought up 2 CPUs
[    0.144996] ----------------
[    0.145989] | NMI testsuite:
[    0.147988] --------------------
[    0.149238]   remote IPI:
[    0.149239] BUG: scheduling while atomic: swapper/0/1/0x00000002
[    0.151082] no locks held by swapper/0/1.
[    0.152991] Modules linked in:
[    0.153992] Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc3+ #24
[    0.155990] Call Trace:
[    0.157056]  [<ffffffff81a21ee1>] __schedule_bug+0x67/0x75
[    0.158999]  [<ffffffff81a3717c>] __schedule+0xb0/0x8b0
[    0.160995]  [<ffffffff810b8fe2>] ? save_trace+0x3d/0xae
[    0.162993]  [<ffffffff8105ad19>] ? pvclock_clocksource_read+0x4d/0xc3
[    0.164992]  [<ffffffff810bbdff>] ? mark_lock+0x23c/0x502
[    0.166993]  [<ffffffff8105a10b>] ? kvm_clock_read+0x2e/0x36
[    0.168991]  [<ffffffff81a37c61>] schedule+0x65/0x67
[    0.170992]  [<ffffffff81a35542>] schedule_timeout+0x2e/0x22a
[    0.172991]  [<ffffffff810b8c1c>] ? trace_hardirqs_off+0xd/0xf
[    0.174992]  [<ffffffff8109c890>] ? local_clock+0x41/0x5a
[    0.176995]  [<ffffffff81a392f0>] ? _raw_spin_unlock_irq+0x2e/0x48
[    0.178990]  [<ffffffff810bd647>] ? trace_hardirqs_on_caller+0x125/0x181
[    0.180995]  [<ffffffff810bd6b0>] ? trace_hardirqs_on+0xd/0xf
[    0.182989]  [<ffffffff81a37b06>] wait_for_common+0x10a/0x140
[    0.184989]  [<ffffffff8109a2f2>] ? try_to_wake_up+0x2cb/0x2cb
[    0.186990]  [<ffffffff810ea4c6>] ? call_rcu_bh+0x19/0x19
[    0.188988]  [<ffffffff81a37bfa>] wait_for_completion+0x1d/0x1f
[    0.190989]  [<ffffffff8108752d>] wait_rcu_gp+0x4d/0x54
[    0.192988]  [<ffffffff81087534>] ? wait_rcu_gp+0x54/0x54
[    0.194987]  [<ffffffff81a37a40>] ? wait_for_common+0x44/0x140
[    0.196987]  [<ffffffff810e8713>] synchronize_sched+0x50/0x52
[    0.198987]  [<ffffffff8103b272>] unregister_nmi_handler+0xc6/0xd5
[    0.200989]  [<ffffffff8142fa79>] ? delay_tsc+0x49/0xa5
[    0.202988]  [<ffffffff82259db7>] test_nmi_ipi.constprop.2+0x72/0x85
[    0.204986]  [<ffffffff82259dca>] ? test_nmi_ipi.constprop.2+0x85/0x85
[    0.206986]  [<ffffffff82259e27>] remote_ipi+0x5d/0x62
[    0.208985]  [<ffffffff82259ce2>] dotest.constprop.1+0x6/0x69
[    0.210985]  [<ffffffff82259ede>] nmi_selftest+0x60/0x16d
[    0.212986]  [<ffffffff822558da>] native_smp_cpus_done+0x1c/0x111
[    0.214989]  [<ffffffff8226198b>] smp_init+0xe2/0xeb
[    0.216986]  [<ffffffff82248c0f>] kernel_init+0xa9/0x1c0
[    0.218984]  [<ffffffff810bd538>] ? trace_hardirqs_on_caller+0x16/0x181
[    0.220989]  [<ffffffff81a417b4>] kernel_thread_helper+0x4/0x10
[    0.222989]  [<ffffffff81a396f0>] ? retint_restore_args+0x13/0x13
[    0.224990]  [<ffffffff82248b66>] ? start_kernel+0x3a3/0x3a3
[    0.226983]  [<ffffffff81a417b0>] ? gs_change+0x13/0x13
[    0.231002]   ok  |
[    0.231998]    local IPI:
[    0.231998] BUG: scheduling while atomic: swapper/0/1/0x00000003
...

The first bad commit is

commit b43145c912b7dabd190ec45cc31c2b96d64715ee
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jun 19 11:58:27 2012 -0700

    rcu: Disable preemption in rcu_blocking_is_gp()
    
    It is time to optimize CONFIG_TREE_PREEMPT_RCU's synchronize_rcu()
    for uniprocessor optimization, which means that rcu_blocking_is_gp()
    can no longer rely on RCU read-side critical sections having disabled
    preemption.  This commit therefore disables preemption across
    rcu_blocking_is_gp()'s scan of the cpu_online_mask.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ce175b6..c0cc41f 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
...
 static inline int rcu_blocking_is_gp(void)
 {
        might_sleep();  /* Check for RCU read-side critical section. */
+       preempt_disable();
        return num_online_cpus() <= 1;
+       preempt_enable();
 }

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: BUG: tracer_alloc_buffers returned with preemption imbalance
  2012-06-23 10:12 BUG: tracer_alloc_buffers returned with preemption imbalance Fengguang Wu
@ 2012-06-23 17:01 ` Paul E. McKenney
  2012-06-23 17:48   ` Fengguang Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2012-06-23 17:01 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: linux-kernel@vger.kernel.org

On Sat, Jun 23, 2012 at 06:12:51PM +0800, Fengguang Wu wrote:
> Hi Paul,
> 
> I got some boot hangs in the below branches and linux-next:
> 
>         rcu/dev.2012.06.21c-4f4a59a
>         rcu/fixes.2012.06.20a-b43145c
>         rcu/rcu.next-3f26dd3
> 
> They all start with an error
> 
> [    0.131005] initcall tracer_alloc_buffers+0x0/0x22e returned with preemption imbalance
> 
> Then followed by lots of scheduling while atomic bugs:
> 
> [    0.143206] Brought up 2 CPUs
> [    0.144996] ----------------
> [    0.145989] | NMI testsuite:
> [    0.147988] --------------------
> [    0.149238]   remote IPI:
> [    0.149239] BUG: scheduling while atomic: swapper/0/1/0x00000002
> [    0.151082] no locks held by swapper/0/1.
> [    0.152991] Modules linked in:
> [    0.153992] Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc3+ #24
> [    0.155990] Call Trace:
> [    0.157056]  [<ffffffff81a21ee1>] __schedule_bug+0x67/0x75
> [    0.158999]  [<ffffffff81a3717c>] __schedule+0xb0/0x8b0
> [    0.160995]  [<ffffffff810b8fe2>] ? save_trace+0x3d/0xae
> [    0.162993]  [<ffffffff8105ad19>] ? pvclock_clocksource_read+0x4d/0xc3
> [    0.164992]  [<ffffffff810bbdff>] ? mark_lock+0x23c/0x502
> [    0.166993]  [<ffffffff8105a10b>] ? kvm_clock_read+0x2e/0x36
> [    0.168991]  [<ffffffff81a37c61>] schedule+0x65/0x67
> [    0.170992]  [<ffffffff81a35542>] schedule_timeout+0x2e/0x22a
> [    0.172991]  [<ffffffff810b8c1c>] ? trace_hardirqs_off+0xd/0xf
> [    0.174992]  [<ffffffff8109c890>] ? local_clock+0x41/0x5a
> [    0.176995]  [<ffffffff81a392f0>] ? _raw_spin_unlock_irq+0x2e/0x48
> [    0.178990]  [<ffffffff810bd647>] ? trace_hardirqs_on_caller+0x125/0x181
> [    0.180995]  [<ffffffff810bd6b0>] ? trace_hardirqs_on+0xd/0xf
> [    0.182989]  [<ffffffff81a37b06>] wait_for_common+0x10a/0x140
> [    0.184989]  [<ffffffff8109a2f2>] ? try_to_wake_up+0x2cb/0x2cb
> [    0.186990]  [<ffffffff810ea4c6>] ? call_rcu_bh+0x19/0x19
> [    0.188988]  [<ffffffff81a37bfa>] wait_for_completion+0x1d/0x1f
> [    0.190989]  [<ffffffff8108752d>] wait_rcu_gp+0x4d/0x54
> [    0.192988]  [<ffffffff81087534>] ? wait_rcu_gp+0x54/0x54
> [    0.194987]  [<ffffffff81a37a40>] ? wait_for_common+0x44/0x140
> [    0.196987]  [<ffffffff810e8713>] synchronize_sched+0x50/0x52
> [    0.198987]  [<ffffffff8103b272>] unregister_nmi_handler+0xc6/0xd5
> [    0.200989]  [<ffffffff8142fa79>] ? delay_tsc+0x49/0xa5
> [    0.202988]  [<ffffffff82259db7>] test_nmi_ipi.constprop.2+0x72/0x85
> [    0.204986]  [<ffffffff82259dca>] ? test_nmi_ipi.constprop.2+0x85/0x85
> [    0.206986]  [<ffffffff82259e27>] remote_ipi+0x5d/0x62
> [    0.208985]  [<ffffffff82259ce2>] dotest.constprop.1+0x6/0x69
> [    0.210985]  [<ffffffff82259ede>] nmi_selftest+0x60/0x16d
> [    0.212986]  [<ffffffff822558da>] native_smp_cpus_done+0x1c/0x111
> [    0.214989]  [<ffffffff8226198b>] smp_init+0xe2/0xeb
> [    0.216986]  [<ffffffff82248c0f>] kernel_init+0xa9/0x1c0
> [    0.218984]  [<ffffffff810bd538>] ? trace_hardirqs_on_caller+0x16/0x181
> [    0.220989]  [<ffffffff81a417b4>] kernel_thread_helper+0x4/0x10
> [    0.222989]  [<ffffffff81a396f0>] ? retint_restore_args+0x13/0x13
> [    0.224990]  [<ffffffff82248b66>] ? start_kernel+0x3a3/0x3a3
> [    0.226983]  [<ffffffff81a417b0>] ? gs_change+0x13/0x13
> [    0.231002]   ok  |
> [    0.231998]    local IPI:
> [    0.231998] BUG: scheduling while atomic: swapper/0/1/0x00000003
> ...
> 
> The first bad commit is
> 
> commit b43145c912b7dabd190ec45cc31c2b96d64715ee
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue Jun 19 11:58:27 2012 -0700
> 
>     rcu: Disable preemption in rcu_blocking_is_gp()
>     
>     It is time to optimize CONFIG_TREE_PREEMPT_RCU's synchronize_rcu()
>     for uniprocessor optimization, which means that rcu_blocking_is_gp()
>     can no longer rely on RCU read-side critical sections having disabled
>     preemption.  This commit therefore disables preemption across
>     rcu_blocking_is_gp()'s scan of the cpu_online_mask.
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index ce175b6..c0cc41f 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> ...
>  static inline int rcu_blocking_is_gp(void)
>  {
>         might_sleep();  /* Check for RCU read-side critical section. */
> +       preempt_disable();
>         return num_online_cpus() <= 1;
> +       preempt_enable();
>  }

Thank you!  I have no idea how a preempt_disable() causes that badness
to happen, but this commit is not yet critically important, so I will
drop it.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: BUG: tracer_alloc_buffers returned with preemption imbalance
  2012-06-23 17:01 ` Paul E. McKenney
@ 2012-06-23 17:48   ` Fengguang Wu
  2012-06-23 20:11     ` Paul E. McKenney
  2012-06-24  8:37     ` Cong Wang
  0 siblings, 2 replies; 8+ messages in thread
From: Fengguang Wu @ 2012-06-23 17:48 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel@vger.kernel.org

> >  static inline int rcu_blocking_is_gp(void)
> >  {
> >         might_sleep();  /* Check for RCU read-side critical section. */
> > +       preempt_disable();
> >         return num_online_cpus() <= 1;
> > +       preempt_enable();
> >  }
> 
> Thank you!  I have no idea how a preempt_disable() causes that badness
> to happen, but this commit is not yet critically important, so I will
> drop it.

preempt_enable() becomes dead code because of the return statement?
I wonder why gcc didn't issue a warning (or I failed to catch it)...

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: BUG: tracer_alloc_buffers returned with preemption imbalance
  2012-06-23 17:48   ` Fengguang Wu
@ 2012-06-23 20:11     ` Paul E. McKenney
  2012-06-24  8:37     ` Cong Wang
  1 sibling, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2012-06-23 20:11 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: linux-kernel@vger.kernel.org

On Sun, Jun 24, 2012 at 01:48:30AM +0800, Fengguang Wu wrote:
> > >  static inline int rcu_blocking_is_gp(void)
> > >  {
> > >         might_sleep();  /* Check for RCU read-side critical section. */
> > > +       preempt_disable();
> > >         return num_online_cpus() <= 1;
> > > +       preempt_enable();
> > >  }
> > 
> > Thank you!  I have no idea how a preempt_disable() causes that badness
> > to happen, but this commit is not yet critically important, so I will
> > drop it.
> 
> preempt_enable() becomes dead code because of the return statement?
> I wonder why gcc didn't issue a warning (or I failed to catch it)...

Color me blind and stupid!!!  Thank you, will fix.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: BUG: tracer_alloc_buffers returned with preemption imbalance
  2012-06-23 17:48   ` Fengguang Wu
  2012-06-23 20:11     ` Paul E. McKenney
@ 2012-06-24  8:37     ` Cong Wang
  2012-06-25  3:57       ` Paul E. McKenney
  1 sibling, 1 reply; 8+ messages in thread
From: Cong Wang @ 2012-06-24  8:37 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Paul E. McKenney, linux-kernel@vger.kernel.org

On Sun, Jun 24, 2012 at 1:48 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>> >  static inline int rcu_blocking_is_gp(void)
>> >  {
>> >         might_sleep();  /* Check for RCU read-side critical section. */
>> > +       preempt_disable();
>> >         return num_online_cpus() <= 1;
>> > +       preempt_enable();
>> >  }
>>
>> Thank you!  I have no idea how a preempt_disable() causes that badness
>> to happen, but this commit is not yet critically important, so I will
>> drop it.
>
> preempt_enable() becomes dead code because of the return statement?
> I wonder why gcc didn't issue a warning (or I failed to catch it)...
>

gcc has an option -Wunreachable-code, but we don't enable it
when building kernel, nor it will be enabled with -Wall. If we enable it,
we will have many false-positives as we have lots of debugging code
which is not reachable unless we enable some debugging option.

However, when I test it manually with the following code:

~% cat /tmp/unreachable.c
int main(void)
{
	int a = 0;
	a++;
	return ++a;
	a++;
}
~% gcc -Wunreachable-code -O0 -c /tmp/unreachable.c

gcc still doesn't give me any warning for the last line of the code,
gcc optimizes it out silently, I am wondering if this is a gcc bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: BUG: tracer_alloc_buffers returned with preemption imbalance
  2012-06-24  8:37     ` Cong Wang
@ 2012-06-25  3:57       ` Paul E. McKenney
  2012-06-25  9:10         ` David Rientjes
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2012-06-25  3:57 UTC (permalink / raw)
  To: Cong Wang; +Cc: Fengguang Wu, linux-kernel@vger.kernel.org

On Sun, Jun 24, 2012 at 04:37:28PM +0800, Cong Wang wrote:
> On Sun, Jun 24, 2012 at 1:48 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> >> >  static inline int rcu_blocking_is_gp(void)
> >> >  {
> >> >         might_sleep();  /* Check for RCU read-side critical section. */
> >> > +       preempt_disable();
> >> >         return num_online_cpus() <= 1;
> >> > +       preempt_enable();
> >> >  }
> >>
> >> Thank you!  I have no idea how a preempt_disable() causes that badness
> >> to happen, but this commit is not yet critically important, so I will
> >> drop it.
> >
> > preempt_enable() becomes dead code because of the return statement?
> > I wonder why gcc didn't issue a warning (or I failed to catch it)...
> >
> 
> gcc has an option -Wunreachable-code, but we don't enable it
> when building kernel, nor it will be enabled with -Wall. If we enable it,
> we will have many false-positives as we have lots of debugging code
> which is not reachable unless we enable some debugging option.
> 
> However, when I test it manually with the following code:
> 
> ~% cat /tmp/unreachable.c
> int main(void)
> {
> 	int a = 0;
> 	a++;
> 	return ++a;
> 	a++;
> }
> ~% gcc -Wunreachable-code -O0 -c /tmp/unreachable.c
> 
> gcc still doesn't give me any warning for the last line of the code,
> gcc optimizes it out silently, I am wondering if this is a gcc bug.

But in my case, the trailing preempt_enable() should not have been
optimized away, right?  Wouldn't it be more like the following?

int a = 0;
int main(void)
{
	a++;
	return ++a;
	a++;
}

Hmmm...  But this -still- doesn't emit any warnings.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: BUG: tracer_alloc_buffers returned with preemption imbalance
  2012-06-25  3:57       ` Paul E. McKenney
@ 2012-06-25  9:10         ` David Rientjes
  2012-06-25 13:20           ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2012-06-25  9:10 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Cong Wang, Fengguang Wu, linux-kernel@vger.kernel.org

On Sun, 24 Jun 2012, Paul E. McKenney wrote:

> But in my case, the trailing preempt_enable() should not have been
> optimized away, right?  Wouldn't it be more like the following?
> 
> int a = 0;
> int main(void)
> {
> 	a++;
> 	return ++a;
> 	a++;
> }
> 
> Hmmm...  But this -still- doesn't emit any warnings.
> 

gcc removed -Wunreachable-code a couple years ago.  It doesn't complain 
when you pass it for backwards compatibility with old Makefiles.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: BUG: tracer_alloc_buffers returned with preemption imbalance
  2012-06-25  9:10         ` David Rientjes
@ 2012-06-25 13:20           ` Paul E. McKenney
  0 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2012-06-25 13:20 UTC (permalink / raw)
  To: David Rientjes; +Cc: Cong Wang, Fengguang Wu, linux-kernel@vger.kernel.org

On Mon, Jun 25, 2012 at 02:10:23AM -0700, David Rientjes wrote:
> On Sun, 24 Jun 2012, Paul E. McKenney wrote:
> 
> > But in my case, the trailing preempt_enable() should not have been
> > optimized away, right?  Wouldn't it be more like the following?
> > 
> > int a = 0;
> > int main(void)
> > {
> > 	a++;
> > 	return ++a;
> > 	a++;
> > }
> > 
> > Hmmm...  But this -still- doesn't emit any warnings.
> > 
> 
> gcc removed -Wunreachable-code a couple years ago.  It doesn't complain 
> when you pass it for backwards compatibility with old Makefiles.

Ah well, back to manual inspection, then.  :-/

							Thanx, Paul


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-06-25 13:28 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-23 10:12 BUG: tracer_alloc_buffers returned with preemption imbalance Fengguang Wu
2012-06-23 17:01 ` Paul E. McKenney
2012-06-23 17:48   ` Fengguang Wu
2012-06-23 20:11     ` Paul E. McKenney
2012-06-24  8:37     ` Cong Wang
2012-06-25  3:57       ` Paul E. McKenney
2012-06-25  9:10         ` David Rientjes
2012-06-25 13:20           ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox