* Lockup with "BUG: using smp_processor_id() in preemptible"
From: Bryan Donlan @ 2009-12-31 16:21 UTC
To: RT

Hi,

With 2.6.31.6-rt19, I have an application which reliably triggers a
system freeze on a dual-processor system. Prior to the lockup, there's
this spam in logs:

Dec 29 14:48:07 Ubuntu kernel: [ 346.332026] BUG: using smp_processor_id() in preemptible [00000000] code: SmartTool/4191
Dec 29 14:48:07 Ubuntu kernel: [ 346.332205] caller is __schedule+0x13/0xa70
Dec 29 14:48:07 Ubuntu kernel: [ 346.332210] Pid: 4191, comm: SmartTool Not tainted 2.6.31.6-rt19-ceng1 #1
Dec 29 14:48:07 Ubuntu kernel: [ 346.332214] Call Trace:
Dec 29 14:48:07 Ubuntu kernel: [ 346.332224] [<c031dd09>] debug_smp_processor_id+0xb9/0xd0
Dec 29 14:48:07 Ubuntu kernel: [ 346.332229] [<c0568583>] __schedule+0x13/0xa70
Dec 29 14:48:07 Ubuntu kernel: [ 346.332236] [<c014c984>] ? irq_exit+0x54/0x90
Dec 29 14:48:07 Ubuntu kernel: [ 346.332243] [<c011d486>] ? smp_apic_timer_interrupt+0x56/0x90
Dec 29 14:48:07 Ubuntu kernel: [ 346.332249] [<c010332a>] work_resched+0x5/0x19
Dec 29 14:48:07 Ubuntu kernel: [ 346.332256] BUG: using smp_processor_id() in preemptible [00000000] code: SmartTool/4191
Dec 29 14:48:07 Ubuntu kernel: [ 346.332425] caller is __schedule+0x6a/0xa70
Dec 29 14:48:07 Ubuntu kernel: [ 346.332429] Pid: 4191, comm: SmartTool Not tainted 2.6.31.6-rt19-ceng1 #1
Dec 29 14:48:07 Ubuntu kernel: [ 346.332432] Call Trace:
Dec 29 14:48:07 Ubuntu kernel: [ 346.332437] [<c031dd09>] debug_smp_processor_id+0xb9/0xd0
Dec 29 14:48:07 Ubuntu kernel: [ 346.332443] [<c05685da>] __schedule+0x6a/0xa70
Dec 29 14:48:07 Ubuntu kernel: [ 346.332449] [<c014c984>] ? irq_exit+0x54/0x90
Dec 29 14:48:07 Ubuntu kernel: [ 346.332454] [<c011d486>] ? smp_apic_timer_interrupt+0x56/0x90
Dec 29 14:48:07 Ubuntu kernel: [ 346.332460] [<c010332a>] work_resched+0x5/0x19
Dec 29 14:48:09 Ubuntu kernel: [ 349.658309] __ratelimit: 2 callbacks suppressed

These two traces repeat constantly in the logs - I suppose the crash
occurred when a migration eventually occurred in the middle of this.
The processes running are polling several usb-serial devices.

This does not occur with 2.6.29.6-rt24, or with SMP disabled
(including after disabling a CPU at runtime).

I'll try to get some ftrace results; in the meantime, any ideas?

Kernel log, lspci, lsmod, and config at http://fushizen.net/~bd/rt-oops.tar.gz

Thanks,

Bryan Donlan
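To check how many distinct call sites are involved in a log like the one in this report, a short Python sketch along the following lines can tally the "caller is" lines. It is not part of the original report: the names tally_callers and SAMPLE are hypothetical, and the sample lines are simply copied from the log quoted in this message.

```python
import re
from collections import Counter

# Matches the "caller is <symbol>+<offset>/<size>" line that follows each
# "BUG: using smp_processor_id() in preemptible" report in the kernel log.
CALLER_RE = re.compile(r"caller is (\S+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)")


def tally_callers(lines):
    """Count BUG reports per (symbol, offset) call site."""
    counts = Counter()
    for line in lines:
        match = CALLER_RE.search(line)
        if match:
            symbol, offset, _size = match.groups()
            counts[(symbol, offset)] += 1
    return counts


# Illustrative input: three lines copied from the log in the report.
SAMPLE = [
    "Dec 29 14:48:07 Ubuntu kernel: [ 346.332205] caller is __schedule+0x13/0xa70",
    "Dec 29 14:48:07 Ubuntu kernel: [ 346.332425] caller is __schedule+0x6a/0xa70",
    "Dec 29 14:48:09 Ubuntu kernel: [ 349.658309] __ratelimit: 2 callbacks suppressed",
]

if __name__ == "__main__":
    for (symbol, offset), n in sorted(tally_callers(SAMPLE).items()):
        print(f"{symbol}+{offset}: {n}")
```

Run against a full kernel log (for example, the lines of a saved syslog file), this would show whether the reports really come from just the two offsets in __schedule seen here, or from other call sites as well.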
* RE: Lockup with "BUG: using smp_processor_id() in preemptible"
From: Leyendecker, Robert @ 2009-12-31 17:16 UTC
To: Bryan Donlan, RT

> -----Original Message-----
> From: linux-rt-users-owner@vger.kernel.org
> [mailto:linux-rt-users-owner@vger.kernel.org] On Behalf Of Bryan Donlan
> Sent: Thursday, December 31, 2009 10:22 AM
> To: RT
> Subject: Lockup with "BUG: using smp_processor_id() in preemptible"
>
> With 2.6.31.6-rt19, I have an application which reliably triggers a
> system freeze on a dual-processor system. Prior to the lockup, there's
> this spam in logs:
>
> [...]
>
> These two traces repeat constantly in the logs - I suppose the crash
> occurred when a migration eventually occurred in the middle of this.
> The processes running are polling several usb-serial devices.
>
> This does not occur with 2.6.29.6-rt24, or with SMP disabled
> (including after disabling a CPU at runtime).
>
> I'll try to get some ftrace results; in the meantime, any ideas?

Yes - from looking at trace and smp code it seems to occur during
migration, however, there is a fair amount of asm woven into this part
and I have trouble following it and have no idea about root cause. With
smp support I eventually hit this trap using brute force polling, epoll
or async signals, regardless of application level affinity, irq
priority, etc. Time to fault seems to vary from a few minutes to
several hours.

I have encountered this exception using core duo with a network
application. It does not occur on single core machines (I am also
testing with SMP disabled and it also seems to resolve the issue). It
seems very reproducible on my core duo, both 32 and 64 bit and using
the latest stable kernel and rt patch.

My app hammers the network interface with packets. I'm planning to boil
it down into a couple of peer to peer test routines so that network
processing latency can be accurately measured under rt patch for
streaming applications. Rt patch seems to give excellent results that I
cannot achieve using non-patch kernel (even with hand tuned affinity,
IRQs, priorities, etc), so I'm hoping we can figure it out and fix it.

For now, I plan to set everyone up with smp disabled in our test lab.
Things seem stable with this setting.

Here are a few additional refs...

http://thread.gmane.org/gmane.linux.rt.user/5343/focus=5346
http://lkml.org/lkml/2009/11/26/302
http://lkml.org/lkml/2009/11/23/548
http://lkml.org/lkml/2009/11/26/318

-Bob
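As a rough illustration of the kind of round-trip latency test routine mentioned above, here is a minimal Python sketch. It is an assumption-laden stand-in, not Bob's application: it uses a UDP echo over loopback on a single host rather than two peers, and the names echo_server and measure_rtt, the 64-byte payload, and the default count are all hypothetical choices.

```python
import socket
import threading
import time


def echo_server(sock, count):
    """Echo `count` datagrams back to whoever sent them."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_rtt(host="127.0.0.1", count=100, payload=b"x" * 64, timeout=2.0):
    """Return a list of UDP round-trip times (seconds) measured against `host`."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    addr = server.getsockname()
    thread = threading.Thread(
        target=echo_server, args=(server, count), daemon=True
    )
    thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)  # a lost datagram raises socket.timeout
    rtts = []
    try:
        for _ in range(count):
            start = time.perf_counter()
            client.sendto(payload, addr)
            client.recvfrom(2048)
            rtts.append(time.perf_counter() - start)
    finally:
        client.close()
        thread.join(timeout)
        server.close()
    return rtts


if __name__ == "__main__":
    samples = measure_rtt()
    print(f"samples={len(samples)} "
          f"min={min(samples) * 1e6:.1f}us max={max(samples) * 1e6:.1f}us")
```

For a real peer-to-peer measurement the echo side would run on the second machine, and the worst-case (max) figure is usually the one of interest when comparing an rt-patched kernel against a stock one.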
* Re: Lockup with "BUG: using smp_processor_id() in preemptible"
From: Bryan Donlan @ 2009-12-31 17:34 UTC
To: Leyendecker, Robert
Cc: RT

On Thu, Dec 31, 2009 at 12:16 PM, Leyendecker, Robert
<Robert.Leyendecker@lsi.com> wrote:

> Yes - from looking at trace and smp code it seems to occur during
> migration, however, there is a fair amount of asm woven into this part
> and I have trouble following it and have no idea about root cause. With
> smp support I eventually hit this trap using brute force polling, epoll
> or async signals, regardless of application level affinity, irq
> priority, etc. Time to fault seems to vary from a few minutes to
> several hours.
>
> I have encountered this exception using core duo with a network
> application. It does not occur on single core machines (I am also
> testing with SMP disabled and it also seems to resolve the issue). It
> seems very reproducible on my core duo, both 32 and 64 bit and using
> the latest stable kernel and rt patch.
>
> My app hammers the network interface with packets. I'm planning to boil
> it down into a couple of peer to peer test routines so that network
> processing latency can be accurately measured under rt patch for
> streaming applications. Rt patch seems to give excellent results that I
> cannot achieve using non-patch kernel (even with hand tuned affinity,
> IRQs, priorities, etc), so I'm hoping we can figure it out and fix it.
>
> For now, I plan to set everyone up with smp disabled in our test lab.
> Things seem stable with this setting.

Okay, good to know it's a known issue. I've captured a function_graph
trace, if it will help, of the flow leading up to the printk in
debug_smp_processor_id():

http://fushizen.net/~bd/trace.1.gz
(occurs on CPU 1 at the end of the trace; the printk is clearly visible)

The patch used to generate it is at
http://fushizen.net/~bd/smp-procid-trace-trap.patch