linux-rt-users.vger.kernel.org archive mirror

* Lockup with "BUG: using smp_processor_id() in preemptible"
@ 2009-12-31 16:21 Bryan Donlan
From: Bryan Donlan @ 2009-12-31 16:21 UTC
  To: RT

Hi,

With 2.6.31.6-rt19, I have an application which reliably triggers a
system freeze on a dual-processor system. Prior to the lockup, there's
this spam in logs:

Dec 29 14:48:07 Ubuntu kernel: [  346.332026] BUG: using smp_processor_id() in preemptible [00000000] code: SmartTool/4191
Dec 29 14:48:07 Ubuntu kernel: [  346.332205] caller is __schedule+0x13/0xa70
Dec 29 14:48:07 Ubuntu kernel: [  346.332210] Pid: 4191, comm: SmartTool Not tainted 2.6.31.6-rt19-ceng1 #1
Dec 29 14:48:07 Ubuntu kernel: [  346.332214] Call Trace:
Dec 29 14:48:07 Ubuntu kernel: [  346.332224]  [<c031dd09>] debug_smp_processor_id+0xb9/0xd0
Dec 29 14:48:07 Ubuntu kernel: [  346.332229]  [<c0568583>] __schedule+0x13/0xa70
Dec 29 14:48:07 Ubuntu kernel: [  346.332236]  [<c014c984>] ? irq_exit+0x54/0x90
Dec 29 14:48:07 Ubuntu kernel: [  346.332243]  [<c011d486>] ? smp_apic_timer_interrupt+0x56/0x90
Dec 29 14:48:07 Ubuntu kernel: [  346.332249]  [<c010332a>] work_resched+0x5/0x19
Dec 29 14:48:07 Ubuntu kernel: [  346.332256] BUG: using smp_processor_id() in preemptible [00000000] code: SmartTool/4191
Dec 29 14:48:07 Ubuntu kernel: [  346.332425] caller is __schedule+0x6a/0xa70
Dec 29 14:48:07 Ubuntu kernel: [  346.332429] Pid: 4191, comm: SmartTool Not tainted 2.6.31.6-rt19-ceng1 #1
Dec 29 14:48:07 Ubuntu kernel: [  346.332432] Call Trace:
Dec 29 14:48:07 Ubuntu kernel: [  346.332437]  [<c031dd09>] debug_smp_processor_id+0xb9/0xd0
Dec 29 14:48:07 Ubuntu kernel: [  346.332443]  [<c05685da>] __schedule+0x6a/0xa70
Dec 29 14:48:07 Ubuntu kernel: [  346.332449]  [<c014c984>] ? irq_exit+0x54/0x90
Dec 29 14:48:07 Ubuntu kernel: [  346.332454]  [<c011d486>] ? smp_apic_timer_interrupt+0x56/0x90
Dec 29 14:48:07 Ubuntu kernel: [  346.332460]  [<c010332a>] work_resched+0x5/0x19
Dec 29 14:48:09 Ubuntu kernel: [  349.658309] __ratelimit: 2 callbacks suppressed

These two traces repeat constantly in the logs. I suspect the freeze
happens when a migration eventually lands in the middle of this.
The running processes are polling several usb-serial devices.
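
For readers unfamiliar with this warning, here is a rough user-space sketch of the kind of check that sits behind it. It is not the kernel source. The trace shows the message coming from debug_smp_processor_id(); as far as I understand that check, reading the CPU id is treated as safe when preemption is disabled, when local interrupts are off, or when the task can only run on the current CPU, and the bracketed hex field is the preempt count. The names struct model_ctx, cpu_id_is_stable() and format_preemptible_warning() are hypothetical, and the model prints the count directly instead of reproducing the kernel's own bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the state the check looks at; not a kernel type. */
struct model_ctx {
    unsigned int preempt_count;   /* nonzero means preemption is disabled */
    bool irqs_disabled;           /* true means local interrupts are off */
    bool pinned_to_one_cpu;       /* true means the task cannot migrate */
    const char *comm;             /* task name, e.g. "SmartTool" */
    int pid;
};

/* The CPU id can only be trusted if the task cannot be moved right now. */
static bool cpu_id_is_stable(const struct model_ctx *c)
{
    return c->preempt_count != 0 || c->irqs_disabled || c->pinned_to_one_cpu;
}

/*
 * Writes a warning in the style of the log above into buf when the CPU id
 * would be unstable. Returns the snprintf() result, or 0 if no warning
 * applies (buf is then left as an empty string).
 */
static int format_preemptible_warning(const struct model_ctx *c,
                                      char *buf, size_t len)
{
    assert(c != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    if (cpu_id_is_stable(c))
        return 0;
    return snprintf(buf, len,
                    "BUG: using smp_processor_id() in preemptible [%08x] code: %s/%d",
                    c->preempt_count, c->comm, c->pid);
}
```

In this model, a context with preempt_count 0, interrupts enabled and no pinning yields the warning, which matches the [00000000] field in the traces. What makes the report unusual is the caller: __schedule is normally expected to run with preemption already disabled.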

This does not occur with 2.6.29.6-rt24, or with SMP disabled
(including after disabling a CPU at runtime).
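
The message does not say how the CPU was taken offline. One common way on Linux, assuming the kernel was built with CPU hotplug support, is to write 0 to the per-CPU `online` file under /sys/devices/system/cpu. The sketch below shows that approach; the helper names cpu_online_path() and set_cpu_online() are made up for illustration, and writing to the real file requires root.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/cpu<N>/online". Returns 0 on success, -1 if buf is too small. */
static int cpu_online_path(char *buf, size_t len, const char *root, int cpu)
{
    int n = snprintf(buf, len, "%s/cpu%d/online", root, cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (online) or "0" (offline) to the control file at path. */
static int set_cpu_online(const char *path, int online)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    int rc = (fputs(online ? "1\n" : "0\n", f) == EOF) ? -1 : 0;
    if (fclose(f) == EOF)   /* the write may only fail at flush time */
        rc = -1;
    return rc;
}
```

To take CPU 1 offline on a dual-processor box, one would build the path with root "/sys/devices/system/cpu" and cpu 1, then call set_cpu_online(path, 0). Passing 1 brings the CPU back.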

I'll try to get some ftrace results; in the meantime, any ideas?

Kernel log, lspci, lsmod, and config at http://fushizen.net/~bd/rt-oops.tar.gz

Thanks,

Bryan Donlan


* RE: Lockup with "BUG: using smp_processor_id() in preemptible"
From: Leyendecker, Robert @ 2009-12-31 17:16 UTC
  To: Bryan Donlan, RT

> -----Original Message-----
> From: linux-rt-users-owner@vger.kernel.org [mailto:linux-rt-users-
> owner@vger.kernel.org] On Behalf Of Bryan Donlan
> Sent: Thursday, December 31, 2009 10:22 AM
> To: RT
> Subject: Lockup with "BUG: using smp_processor_id() in preemptible"
> 
> Hi,
> 
> With 2.6.31.6-rt19, I have an application which reliably triggers a
> system freeze on a dual-processor system. Prior to the lockup, there's
> this spam in logs:
> 
> Dec 29 14:48:07 Ubuntu kernel: [  346.332026] BUG: using smp_processor_id() in preemptible [00000000] code: SmartTool/4191
> Dec 29 14:48:07 Ubuntu kernel: [  346.332205] caller is __schedule+0x13/0xa70
> Dec 29 14:48:07 Ubuntu kernel: [  346.332210] Pid: 4191, comm: SmartTool Not tainted 2.6.31.6-rt19-ceng1 #1
> Dec 29 14:48:07 Ubuntu kernel: [  346.332214] Call Trace:
> Dec 29 14:48:07 Ubuntu kernel: [  346.332224]  [<c031dd09>] debug_smp_processor_id+0xb9/0xd0
> Dec 29 14:48:07 Ubuntu kernel: [  346.332229]  [<c0568583>] __schedule+0x13/0xa70
> Dec 29 14:48:07 Ubuntu kernel: [  346.332236]  [<c014c984>] ? irq_exit+0x54/0x90
> Dec 29 14:48:07 Ubuntu kernel: [  346.332243]  [<c011d486>] ? smp_apic_timer_interrupt+0x56/0x90
> Dec 29 14:48:07 Ubuntu kernel: [  346.332249]  [<c010332a>] work_resched+0x5/0x19
> Dec 29 14:48:07 Ubuntu kernel: [  346.332256] BUG: using smp_processor_id() in preemptible [00000000] code: SmartTool/4191
> Dec 29 14:48:07 Ubuntu kernel: [  346.332425] caller is __schedule+0x6a/0xa70
> Dec 29 14:48:07 Ubuntu kernel: [  346.332429] Pid: 4191, comm: SmartTool Not tainted 2.6.31.6-rt19-ceng1 #1
> Dec 29 14:48:07 Ubuntu kernel: [  346.332432] Call Trace:
> Dec 29 14:48:07 Ubuntu kernel: [  346.332437]  [<c031dd09>] debug_smp_processor_id+0xb9/0xd0
> Dec 29 14:48:07 Ubuntu kernel: [  346.332443]  [<c05685da>] __schedule+0x6a/0xa70
> Dec 29 14:48:07 Ubuntu kernel: [  346.332449]  [<c014c984>] ? irq_exit+0x54/0x90
> Dec 29 14:48:07 Ubuntu kernel: [  346.332454]  [<c011d486>] ? smp_apic_timer_interrupt+0x56/0x90
> Dec 29 14:48:07 Ubuntu kernel: [  346.332460]  [<c010332a>] work_resched+0x5/0x19
> Dec 29 14:48:09 Ubuntu kernel: [  349.658309] __ratelimit: 2 callbacks suppressed
> 
> These two traces repeat constantly in the logs - I suppose the crash
> occurred when a migration eventually occurred in the middle of this.
> The processes running are polling several usb-serial devices.
> 
> This does not occur with 2.6.29.6-rt24, or with SMP disabled
> (including after disabling a CPU at runtime).
> 
> I'll try to get some ftrace results; in the meantime, any ideas?
> 
> Kernel log, lspci, lsmod, and config at http://fushizen.net/~bd/rt-oops.tar.gz
> 
> Thanks,
> 
> Bryan Donlan

Yes. From looking at the trace and the SMP code, it seems to occur during migration. However, there is a fair amount of asm woven into this part, I have trouble following it, and I have no idea about the root cause. With SMP support enabled I eventually hit this trap whether I use brute-force polling, epoll, or async signals, and regardless of application-level affinity, IRQ priority, etc. Time to fault seems to vary from a few minutes to several hours.

I have encountered this exception on a Core Duo with a network application. It does not occur on single-core machines (I am also testing with SMP disabled, and that also seems to resolve the issue). It is very reproducible on my Core Duo, on both 32-bit and 64-bit, using the latest stable kernel and RT patch.

My app hammers the network interface with packets. I'm planning to boil it down into a couple of peer-to-peer test routines so that network processing latency can be accurately measured under the RT patch for streaming applications. The RT patch seems to give excellent results that I cannot achieve with an unpatched kernel (even with hand-tuned affinity, IRQs, priorities, etc.), so I'm hoping we can figure it out and fix it.

For now, I plan to set everyone up with SMP disabled in our test lab. Things seem stable with this setting.

Here are a few additional refs...

http://thread.gmane.org/gmane.linux.rt.user/5343/focus=5346

http://lkml.org/lkml/2009/11/26/302

http://lkml.org/lkml/2009/11/23/548

http://lkml.org/lkml/2009/11/26/318


-Bob



* Re: Lockup with "BUG: using smp_processor_id() in preemptible"
From: Bryan Donlan @ 2009-12-31 17:34 UTC
  To: Leyendecker, Robert; +Cc: RT

On Thu, Dec 31, 2009 at 12:16 PM, Leyendecker, Robert
<Robert.Leyendecker@lsi.com> wrote:

> Yes. From looking at the trace and the SMP code, it seems to occur during migration. However, there is a fair amount of asm woven into this part, I have trouble following it, and I have no idea about the root cause. With SMP support enabled I eventually hit this trap whether I use brute-force polling, epoll, or async signals, and regardless of application-level affinity, IRQ priority, etc. Time to fault seems to vary from a few minutes to several hours.
>
> I have encountered this exception on a Core Duo with a network application. It does not occur on single-core machines (I am also testing with SMP disabled, and that also seems to resolve the issue). It is very reproducible on my Core Duo, on both 32-bit and 64-bit, using the latest stable kernel and RT patch.
>
> My app hammers the network interface with packets. I'm planning to boil it down into a couple of peer-to-peer test routines so that network processing latency can be accurately measured under the RT patch for streaming applications. The RT patch seems to give excellent results that I cannot achieve with an unpatched kernel (even with hand-tuned affinity, IRQs, priorities, etc.), so I'm hoping we can figure it out and fix it.
>
> For now, I plan to set everyone up with SMP disabled in our test lab. Things seem stable with this setting.

Okay, good to know it's a known issue. I've captured a function_graph
trace, if it will help, of the flow leading up to the printk in
debug_smp_processor_id(): http://fushizen.net/~bd/trace.1.gz (occurs
on CPU 1 at the end of the trace; the printk is clearly visible)

The patch used to generate it is at
http://fushizen.net/~bd/smp-procid-trace-trap.patch
