IRQ sharing: BUG: spinlock lockup on CPU#0

* IRQ sharing: BUG: spinlock lockup on CPU#0
@ 2006-06-02  1:29 Keith Chew
  2006-06-02  2:59 ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Chew @ 2006-06-02  1:29 UTC (permalink / raw)
  To: linux-kernel

Hi

I recently started a thread called "IO APIC IRQ assignment" to find a
way to separate the IRQ assignments for the USB and BT878 chips. In
the meantime, our workaround was to tweak the PCI latency to make the
2 devices work nicely on the same IRQ, but last night 1 of the 10 PCs
under test crashed.

I apologise in advance for posting the same console stack trace again,
but I figured this problem needed a new topic as it is not related to
the IO APIC IRQ assignment topic. If anyone can guide us on where to look
for the problem, it will be much appreciated. We have 5 PCs running
the 2.6.14.2 kernel and 5 PCs running 2.6.16.18. This crash happened on
the 2.6.14.2 kernel.

===================================
Unable to handle kernel paging request at virtual address 23232327
 printing eip:
c014b569
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_LOG ipt_state iptable_filter ip_tables ip_conntrack_tf
ip_conntrack_proto_sctp ip_conntrack_irc ip_conntrack_ftp ip_conntrack_amanda
_conntrack rt2570 zd1211 autofs4 video button battery ac uhci_hcd bt878 tuner
audio bttv video_buf i2c_algo_bit v4l2_common btcx_risc tveeprom videodev i2c
01 i2c_core snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_os
nd_pcm snd_timer snd soundcore snd_page_alloc e100 mii dm_snapshot dm_zero dm
rror ext3 jbd dm_mod
CPU:    0
EIP:    0060:[<c014b569>]    Not tainted VLI
EFLAGS: 00010046   (2.6.14.2)
EIP is at activate_page+0x59/0xd0
eax: 23232323   ebx: c1400160   ecx: c1400178   edx: 23232323
esi: c03b7300   edi: c03b73e0   ebp: 00000001   esp: d3769ae4
ds: 007b   es: 007b   ss: 0068
Process mencoder (pid: 23017, threadinfo=d3768000 task=d615d030)
Stack: d3769b7c 00000000 c1400160 00000040 c1000000 c014b613 c1400160 c0150b1
      c7fe9d68 0000000f 00000001 b771d000 00000001 00000000 00000001 e5d26ee
      f724124c f7241200 c0150cbe f7241200 b775a000 00000000 00000001 0000000
Call Trace:
 [<c014b613>] mark_page_accessed+0x33/0x40
 [<c0150b13>] __follow_page+0x153/0x160
 [<c0150cbe>] get_user_pages+0x11e/0x380
 [<f896e348>] videobuf_dma_init_user+0x118/0x190 [video_buf]
 [<f896eac7>] videobuf_iolock+0x77/0x110 [video_buf]
 [<f89a0909>] bttv_prepare_buffer+0x179/0x1c0 [bttv]
 [<f89a2b9c>] bttv_do_ioctl+0xbdc/0x1850 [bttv]
 [<c035d2eb>] _read_unlock+0xb/0x10
 [<f8a553c4>] zd1205_xmit_frame+0x94/0x4a0 [zd1211]
 [<c035d2ab>] _spin_lock+0xb/0x10
 [<c0300c73>] qdisc_restart+0x23/0x200
 [<c035d2cb>] _spin_unlock+0xb/0x10
 [<c02f0312>] dev_queue_xmit+0x2a2/0x330
 [<f8cae3c3>] tcp_in_window+0x303/0x510 [ip_conntrack]
 [<c035d1bf>] _spin_lock_irqsave+0xf/0x20
 [<c0124d28>] __mod_timer+0xa8/0xd0
 [<c035d3cb>] _write_unlock_bh+0xb/0x20
 [<f8cad773>] __ip_ct_refresh_acct+0x73/0xc0 [ip_conntrack]
 [<f8caeaa3>] tcp_packet+0x1a3/0x580 [ip_conntrack]
 [<c02ea2e1>] __alloc_skb+0x61/0x150
 [<c026b0e8>] dma_pool_alloc+0x98/0x180
 [<c0117597>] activate_task+0x67/0x80
 [<c0117688>] try_to_wake_up+0x88/0xd0
 [<c01176ed>] wake_up_process+0x1d/0x20
 [<c035d31b>] _spin_unlock_irq+0xb/0x10
 [<f8a466f3>] uhci_alloc_qh+0x23/0x60 [uhci_hcd]
 [<c0117597>] activate_task+0x67/0x80
 [<c0117688>] try_to_wake_up+0x88/0xd0
 [<c0131b7f>] autoremove_wake_function+0x2f/0x60
 [<c0118021>] __wake_up_common+0x41/0x80
 [<c01e4756>] copy_from_user+0x66/0xa0
 [<f897b427>] video_usercopy+0xf7/0x180 [videodev]
 [<c0117688>] try_to_wake_up+0x88/0xd0
 [<c0118021>] __wake_up_common+0x41/0x80
 [<c011809e>] __wake_up+0x3e/0x60
 [<f89a384f>] bttv_ioctl+0x3f/0x70 [bttv]
 [<f89a1fc0>] bttv_do_ioctl+0x0/0x1850 [bttv]
 [<c0177b48>] do_ioctl+0x58/0x80
 [<c0177cd5>] vfs_ioctl+0x65/0x1f0
 [<c01e46d6>] copy_to_user+0x66/0x80
 [<c0177ee8>] sys_ioctl+0x88/0xa0
 [<c01031af>] sysenter_past_esp+0x54/0x75
Code: 74 06 8b 03 a8 40 74 1a 89 f8 8b 5c 24 08 8b 74 24 0c 8b 7c 24 10 83 c4
 e9 b4 1d 21 00 8d 74 26 00 8d 4b 18 8b 43 18 8b 51 04 <89> 50 04 89 02 c7 41
 00 02 20 00 c7 43 18 00 01 10 00 ff 8e
 <6>bttv2: timeout: drop=79382 irq=27663181/27663182, risc=063fd218, bits:
bttv1: timeout: drop=79789 irq=27657488/37569186, risc=11efc8c0, bits:
BUG: spinlock cpu recursion on CPU#0, mencoder/23014
 lock: f724124c, .magic: dead4ead, .owner: mencoder/23017, .owner_cpu: 0
 [<c01e52b3>] _raw_spin_lock+0x83/0xa0
 [<c035d2ab>] _spin_lock+0xb/0x10
 [<c0152432>] __handle_mm_fault+0x72/0x240
 [<c0129b15>] notifier_call_chain+0x25/0x50
 [<c035e3d3>] do_page_fault+0x1e3/0x600
 [<c035e1f0>] do_page_fault+0x0/0x600
 [<c0103433>] error_code+0x4f/0x54
BUG: soft lockup detected on CPU#0!

Pid: 23014, comm:             mencoder
EIP: 0060:[<c01e5192>] CPU: 0
EIP is at __spin_lock_debug+0x42/0xe0
 EFLAGS: 00000212    Not tainted  (2.6.14.2)
EAX: 6b541ad0 EBX: 00000000 ECX: 3af5ead5 EDX: 00000000
ESI: 001b79e2 EDI: f724124c EBP: e59d8000 DS: 007b ES: 007b
CR0: 80050033 CR2: b6213000 CR3: 1791d000 CR4: 000006d0
 [<c010368b>] show_trace+0x3b/0x90
 [<c0103433>] error_code+0x4f/0x54
 [<c01e529c>] _raw_spin_lock+0x6c/0xa0
 [<c035d2ab>] _spin_lock+0xb/0x10
 [<c0152432>] __handle_mm_fault+0x72/0x240
 [<c0129b15>] notifier_call_chain+0x25/0x50
 [<c035e3d3>] do_page_fault+0x1e3/0x600
 [<c035e1f0>] do_page_fault+0x0/0x600
 [<c0103433>] error_code+0x4f/0x54
BUG: spinlock lockup on CPU#0, mencoder/23014, f724124c
 [<c01e51fc>] __spin_lock_debug+0xac/0xe0
 [<c01e529c>] _raw_spin_lock+0x6c/0xa0
 [<c035d2ab>] _spin_lock+0xb/0x10
 [<c0152432>] __handle_mm_fault+0x72/0x240
 [<c0129b15>] notifier_call_chain+0x25/0x50
 [<c035e3d3>] do_page_fault+0x1e3/0x600
 [<c035e1f0>] do_page_fault+0x0/0x600
 [<c0103433>] error_code+0x4f/0x54

===================================

Regards
Keith

* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
  2006-06-02  1:29 Keith Chew
@ 2006-06-02  2:59 ` Andrew Morton
  2006-06-02  3:20   ` Keith Chew
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2006-06-02  2:59 UTC (permalink / raw)
  To: Keith Chew; +Cc: linux-kernel, Hugh Dickins

On Fri, 2 Jun 2006 13:29:18 +1200
"Keith Chew" <keith.chew@gmail.com> wrote:

> Hi
> 
> I recently started a thread called "IO APIC IRQ assignment" to find a
> way to separate the IRQ assignments for the USB and BT878 chips. In
> the meantime, our workaround was to tweak the PCI latency to make the
> 2 devices work nicely on the same IRQ, but last night 1 of the 10 PCs
> under test crashed.
> 
> I apologise in advance for posting the same console stack trace again,
> but I figured this problem needed a new topic as it is not related to
> the IO APIC IRQ assignment topic. If anyone can guide us on where to look
> for the problem, it will be much appreciated. We have 5 PCs running
> the 2.6.14.2 kernel and 5 PCs running 2.6.16.18. This crash happened on
> the 2.6.14.2 kernel.
> 
> ===================================
> Unable to handle kernel paging request at virtual address 23232327
>  printing eip:
> c014b569
> *pde = 00000000
> Oops: 0002 [#1]
> Modules linked in: ipt_LOG ipt_state iptable_filter ip_tables ip_conntrack_tf
> ip_conntrack_proto_sctp ip_conntrack_irc ip_conntrack_ftp ip_conntrack_amanda
> _conntrack rt2570 zd1211 autofs4 video button battery ac uhci_hcd bt878 tuner
> audio bttv video_buf i2c_algo_bit v4l2_common btcx_risc tveeprom videodev i2c
> 01 i2c_core snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_os
> nd_pcm snd_timer snd soundcore snd_page_alloc e100 mii dm_snapshot dm_zero dm
> rror ext3 jbd dm_mod
> CPU:    0
> EIP:    0060:[<c014b569>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.14.2)
> EIP is at activate_page+0x59/0xd0
> eax: 23232323   ebx: c1400160   ecx: c1400178   edx: 23232323
> esi: c03b7300   edi: c03b73e0   ebp: 00000001   esp: d3769ae4
> ds: 007b   es: 007b   ss: 0068
> Process mencoder (pid: 23017, threadinfo=d3768000 task=d615d030)
> Stack: d3769b7c 00000000 c1400160 00000040 c1000000 c014b613 c1400160 c0150b1
>       c7fe9d68 0000000f 00000001 b771d000 00000001 00000000 00000001 e5d26ee
>       f724124c f7241200 c0150cbe f7241200 b775a000 00000000 00000001 0000000
> Call Trace:
>  [<c014b613>] mark_page_accessed+0x33/0x40
>  [<c0150b13>] __follow_page+0x153/0x160
>  [<c0150cbe>] get_user_pages+0x11e/0x380
>  [<f896e348>] videobuf_dma_init_user+0x118/0x190 [video_buf]
>  [<f896eac7>] videobuf_iolock+0x77/0x110 [video_buf]
>  [<f89a0909>] bttv_prepare_buffer+0x179/0x1c0 [bttv]
>  [<f89a2b9c>] bttv_do_ioctl+0xbdc/0x1850 [bttv]
>  [<c035d2eb>] _read_unlock+0xb/0x10
>  [<f8a553c4>] zd1205_xmit_frame+0x94/0x4a0 [zd1211]
>  [<c035d2ab>] _spin_lock+0xb/0x10
>  [<c0300c73>] qdisc_restart+0x23/0x200
>  [<c035d2cb>] _spin_unlock+0xb/0x10
>  [<c02f0312>] dev_queue_xmit+0x2a2/0x330
>  [<f8cae3c3>] tcp_in_window+0x303/0x510 [ip_conntrack]
>  [<c035d1bf>] _spin_lock_irqsave+0xf/0x20
>  [<c0124d28>] __mod_timer+0xa8/0xd0
>  [<c035d3cb>] _write_unlock_bh+0xb/0x20
>  [<f8cad773>] __ip_ct_refresh_acct+0x73/0xc0 [ip_conntrack]
>  [<f8caeaa3>] tcp_packet+0x1a3/0x580 [ip_conntrack]
>  [<c02ea2e1>] __alloc_skb+0x61/0x150
>  [<c026b0e8>] dma_pool_alloc+0x98/0x180
>  [<c0117597>] activate_task+0x67/0x80
>  [<c0117688>] try_to_wake_up+0x88/0xd0
>  [<c01176ed>] wake_up_process+0x1d/0x20
>  [<c035d31b>] _spin_unlock_irq+0xb/0x10
>  [<f8a466f3>] uhci_alloc_qh+0x23/0x60 [uhci_hcd]
>  [<c0117597>] activate_task+0x67/0x80
>  [<c0117688>] try_to_wake_up+0x88/0xd0
>  [<c0131b7f>] autoremove_wake_function+0x2f/0x60
>  [<c0118021>] __wake_up_common+0x41/0x80
>  [<c01e4756>] copy_from_user+0x66/0xa0
>  [<f897b427>] video_usercopy+0xf7/0x180 [videodev]
>  [<c0117688>] try_to_wake_up+0x88/0xd0
>  [<c0118021>] __wake_up_common+0x41/0x80
>  [<c011809e>] __wake_up+0x3e/0x60
>  [<f89a384f>] bttv_ioctl+0x3f/0x70 [bttv]
>  [<f89a1fc0>] bttv_do_ioctl+0x0/0x1850 [bttv]
>  [<c0177b48>] do_ioctl+0x58/0x80
>  [<c0177cd5>] vfs_ioctl+0x65/0x1f0
>  [<c01e46d6>] copy_to_user+0x66/0x80
>  [<c0177ee8>] sys_ioctl+0x88/0xa0
>  [<c01031af>] sysenter_past_esp+0x54/0x75

We've certainly screwed around with the mmapping of IO space and such
things in recent months, and iirc 2.6.14 was somewhat in the middle of it
all.

Are you seeing the above problem on 2.6.16.x?


* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
  2006-06-02  2:59 ` Andrew Morton
@ 2006-06-02  3:20   ` Keith Chew
  0 siblings, 0 replies; 9+ messages in thread
From: Keith Chew @ 2006-06-02  3:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Hugh Dickins

Hi Andrew

Thank you very much for your reply.

> We've certainly screwed around with the mmapping of IO space and such
> things in recent months, and iirc 2.6.14 was somewhat in the middle of it
> all.
>
> Are you seeing the above problem on 2.6.16.x?
>

We used to see freezes more frequently (on both 2.6.16.18 and
2.6.14.2) before setting these tweaks:
- increase PCI latency of bttv device
- disable overlay on bttv driver
- enable HPET support for Character Devices
- disable "load DRI" in xorg.conf

After the tweaks, all 10 PCs have been under stress test for 48 hours,
and the first crash was on the 2.6.14.2 kernel. We will upgrade 3 more PCs to
2.6.16.18 today, to increase the probability of reproducing the crash on that
kernel.

Will keep you posted.

Regards
Keith

* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
@ 2006-06-02  5:33 Chuck Ebbert
  2006-06-02  6:17 ` Keith Chew
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2006-06-02  5:33 UTC (permalink / raw)
  To: Keith Chew; +Cc: linux-kernel

In-Reply-To: <20f65d530606011829n2ee1d76fg9d2c7bbc02a6a0aa@mail.gmail.com>

On Fri, 2 Jun 2006 13:29:18 +1200, Keith Chew wrote:

> I apologise in advance for posting the same console stack trace again,
> but I figured this problem needed a new topic as it is not related to
> the IO APIC IRQ assignment topic. If anyone can guide us on where to look
> for the problem, it will be much appreciated. We have 5 PCs running
> the 2.6.14.2 kernel and 5 PCs running 2.6.16.18. This crash happened on
> the 2.6.14.2 kernel.

If you could reproduce this on 2.6.16.x it would be much better.  And if
you don't have it set, add CONFIG_FRAME_POINTER so the stack traces will
be cleaner.

> Unable to handle kernel paging request at virtual address 23232327
> printing eip:
> c014b569
> *pde = 00000000
> Oops: 0002 [#1]

This is the real problem AFAICT.
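
The spinlock messages that follow the oops in the original report look like
fallout from it rather than a separate bug: the lock at f724124c is reported
as owned by mencoder/23017, the process that oopsed, and a task that dies
while holding a spinlock never releases it. Below is a simplified userspace
model of the checks the spinlock debugging code makes before taking a lock.
The struct layout and the names dbg_lock and check_before_lock are
illustrative assumptions, not the kernel's definitions; only the magic value
dead4ead and the owner/owner_cpu values come from the log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODEL_SPINLOCK_MAGIC 0xdead4eadu  /* ".magic: dead4ead" in the log */

/* Simplified stand-in for the debug fields printed in the log (hypothetical). */
struct dbg_lock {
    uint32_t magic;
    int owner_pid;   /* pid of the current holder */
    int owner_cpu;   /* CPU the holder took the lock on */
};

/* Diagnose an attempt by task `pid` running on `cpu` to take a held lock. */
static const char *check_before_lock(const struct dbg_lock *l, int pid, int cpu)
{
    if (l->magic != MODEL_SPINLOCK_MAGIC)
        return "bad magic";
    if (l->owner_pid == pid)
        return "recursion";        /* same task takes the lock twice */
    if (l->owner_cpu == cpu)
        return "cpu recursion";    /* other task, same CPU: holder cannot run */
    return "ok";
}

#ifdef STANDALONE_DEMO
int main(void)
{
    /* Values from the report: held by pid 23017 on CPU 0, wanted by 23014. */
    struct dbg_lock l = { MODEL_SPINLOCK_MAGIC, 23017, 0 };

    puts(check_before_lock(&l, 23014, 0));
    return 0;
}
#endif
```

Built with -DSTANDALONE_DEMO, the model reports "cpu recursion" for pid 23014
on CPU 0, the same diagnosis as in the log. On this reading the soft lockup
and spinlock lockup are pid 23014 spinning on a lock that will never be
released.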

> Modules linked in: ipt_LOG ipt_state iptable_filter ip_tables ip_conntrack_tf
> ip_conntrack_proto_sctp ip_conntrack_irc ip_conntrack_ftp ip_conntrack_amanda
> _conntrack rt2570 zd1211 autofs4 video button battery ac uhci_hcd bt878 tuner
> audio bttv video_buf i2c_algo_bit v4l2_common btcx_risc tveeprom videodev i2c
> 01 i2c_core snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_os
> nd_pcm snd_timer snd soundcore snd_page_alloc e100 mii dm_snapshot dm_zero dm
> rror ext3 jbd dm_mod

Your mailer is garbling long lines when it wraps them...

> CPU:    0
> EIP:    0060:[<c014b569>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.14.2)
> EIP is at activate_page+0x59/0xd0
> eax: 23232323   ebx: c1400160   ecx: c1400178   edx: 23232323
> esi: c03b7300   edi: c03b73e0   ebp: 00000001   esp: d3769ae4
> ds: 007b   es: 007b   ss: 0068
> Process mencoder (pid: 23017, threadinfo=d3768000 task=d615d030)
> Stack: d3769b7c 00000000 c1400160 00000040 c1000000 c014b613 c1400160 c0150b1
>       c7fe9d68 0000000f 00000001 b771d000 00000001 00000000 00000001 e5d26ee
>       f724124c f7241200 c0150cbe f7241200 b775a000 00000000 00000001 0000000
> Call Trace:
>  [<c014b613>] mark_page_accessed+0x33/0x40
>  [<c0150b13>] __follow_page+0x153/0x160
>  [<c0150cbe>] get_user_pages+0x11e/0x380
>  [<f896e348>] videobuf_dma_init_user+0x118/0x190 [video_buf]
>  [<f896eac7>] videobuf_iolock+0x77/0x110 [video_buf]
>  [<f89a0909>] bttv_prepare_buffer+0x179/0x1c0 [bttv]
>  [<f89a2b9c>] bttv_do_ioctl+0xbdc/0x1850 [bttv]
>  [<c035d2eb>] _read_unlock+0xb/0x10
>  [<f8a553c4>] zd1205_xmit_frame+0x94/0x4a0 [zd1211]
>  [<c035d2ab>] _spin_lock+0xb/0x10
>  [<c0300c73>] qdisc_restart+0x23/0x200
>  [<c035d2cb>] _spin_unlock+0xb/0x10
>  [<c02f0312>] dev_queue_xmit+0x2a2/0x330
>  [<f8cae3c3>] tcp_in_window+0x303/0x510 [ip_conntrack]
>  [<c035d1bf>] _spin_lock_irqsave+0xf/0x20
>  [<c0124d28>] __mod_timer+0xa8/0xd0
>  [<c035d3cb>] _write_unlock_bh+0xb/0x20
>  [<f8cad773>] __ip_ct_refresh_acct+0x73/0xc0 [ip_conntrack]
>  [<f8caeaa3>] tcp_packet+0x1a3/0x580 [ip_conntrack]
>  [<c02ea2e1>] __alloc_skb+0x61/0x150
>  [<c026b0e8>] dma_pool_alloc+0x98/0x180
>  [<c0117597>] activate_task+0x67/0x80
>  [<c0117688>] try_to_wake_up+0x88/0xd0
>  [<c01176ed>] wake_up_process+0x1d/0x20
>  [<c035d31b>] _spin_unlock_irq+0xb/0x10
>  [<f8a466f3>] uhci_alloc_qh+0x23/0x60 [uhci_hcd]
>  [<c0117597>] activate_task+0x67/0x80
>  [<c0117688>] try_to_wake_up+0x88/0xd0
>  [<c0131b7f>] autoremove_wake_function+0x2f/0x60
>  [<c0118021>] __wake_up_common+0x41/0x80
>  [<c01e4756>] copy_from_user+0x66/0xa0
>  [<f897b427>] video_usercopy+0xf7/0x180 [videodev]
>  [<c0117688>] try_to_wake_up+0x88/0xd0
>  [<c0118021>] __wake_up_common+0x41/0x80
>  [<c011809e>] __wake_up+0x3e/0x60
>  [<f89a384f>] bttv_ioctl+0x3f/0x70 [bttv]
>  [<f89a1fc0>] bttv_do_ioctl+0x0/0x1850 [bttv]
>  [<c0177b48>] do_ioctl+0x58/0x80
>  [<c0177cd5>] vfs_ioctl+0x65/0x1f0
>  [<c01e46d6>] copy_to_user+0x66/0x80
>  [<c0177ee8>] sys_ioctl+0x88/0xa0
>  [<c01031af>] sysenter_past_esp+0x54/0x75
> Code: 74 06 8b 03 a8 40 74 1a 89 f8 8b 5c 24 08 8b 74 24 0c 8b 7c 24 10 83 c4
>  e9 b4 1d 21 00 8d 74 26 00 8d 4b 18 8b 43 18 8b 51 04 <89> 50 04 89 02 c7 41
>  00 02 20 00 c7 43 18 00 01 10 00 ff 8e

Looks like this code dump got garbled too.  The disassembly is nonsense right
at the line breaks.

Looking at what's there, it seems the page's lru list pointers are
junk; they contain 0x23232323.  But it's hard to tell with what you posted.
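
That reading can be cross-checked against the registers, assuming the bytes
on either side of the <89> marker survived the line wrapping. On i386 the
bytes "8b 43 18 8b 51 04 <89> 50 04" decode as mov 0x18(%ebx),%eax;
mov 0x4(%ecx),%edx; mov %edx,0x4(%eax), which has the shape of a list unlink
storing prev into next->prev. With eax = 23232323 that store lands at
23232327, the reported fault address. The sketch below is a userspace model
using 32-bit fields; list_head32 and unlink_store_address are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit model of a doubly linked list node: next at offset 0, prev at 4. */
struct list_head32 {
    uint32_t next;
    uint32_t prev;
};

/* Address written by the first store of an unlink: next->prev = prev. */
static uint32_t unlink_store_address(const struct list_head32 *entry)
{
    return entry->next + (uint32_t)offsetof(struct list_head32, prev);
}

#ifdef STANDALONE_DEMO
int main(void)
{
    /* eax and edx from the oops: both list pointers hold 0x23232323. */
    struct list_head32 lru = { 0x23232323u, 0x23232323u };

    printf("%08x\n", (unsigned)unlink_store_address(&lru));
    return 0;
}
#endif
```

0x23 is the ASCII code for '#', so the pointers may have been overwritten
with text data, though the trace alone cannot show where it came from.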

-- 
Chuck


* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
  2006-06-02  5:33 IRQ sharing: BUG: spinlock lockup on CPU#0 Chuck Ebbert
@ 2006-06-02  6:17 ` Keith Chew
  2006-06-08 10:51   ` Keith Chew
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Chew @ 2006-06-02  6:17 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

Hi Chuck

>
> Your mailer is garbling long lines when it wraps them...
>

Oh dear, it looks like we forgot to set Line Wrapping in minicom. That
was real silly. Will post another stack trace as soon as another freeze
happens.

Regards
Keith

* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
  2006-06-02  6:17 ` Keith Chew
@ 2006-06-08 10:51   ` Keith Chew
  0 siblings, 0 replies; 9+ messages in thread
From: Keith Chew @ 2006-06-08 10:51 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

Hi

We have updated the kernel on all 10 PCs to 2.6.16.18. Only 1 of
them is running in IOAPIC mode (i.e. with local APIC support); the rest are
in XT-PIC mode. The APIC machine crashed after 24 hours of operation. Below
is the stack trace. Is this related to the IO APIC, or should we be
worried about the XT-PIC machines too?

[4354161.283000] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[4354161.283000]  printing eip:
[4354161.283000] c0115dec
[4354161.283000] *pde = 00000000
[4354161.283000] Oops: 0002 [#1]
[4354161.283000] Modules linked in: zd1211 rt2570 autofs4 video button battery ac uhci_hcd bt878 tuner tvaudio bttv video_buf compat_ioctl32 i2c_algo_bit v4l2_common btcx_risc ir_common tveeprom videodev i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc e100 mii dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
[4354161.283000] CPU:    0
[4354161.283000] EIP:    0060:[<c0115dec>]    Not tainted VLI
[4354161.283000] EFLAGS: 00010017   (2.6.16.18 #3)
[4354161.283000] EIP is at scheduler_tick+0x76/0x287
[4354161.283000] eax: 00000000   ebx: 3fd1db7d   ecx: 6c53c678   edx: 000f7816
[4354161.283000] esi: ed103560   edi: 00000000   ebp: de18ba44   esp: de18ba28
[4354161.283000] ds: 007b   es: 007b   ss: 0068
[4354161.283000] Process ?.?? (pid: -1, threadinfo=de18a000 task=ed103560)
[4354161.283000] Stack: <0>00000000 de18ba40 2c9166c0 000f7816 ed103560 00000000 00000000 de18ba54
[4354161.283000]        c0120aba de18bac4 00000000 de18ba64 c010588f 00000000 c033a3a0 de18ba8c
[4354161.283000]        c01357a5 00000000 00000000 de18bac4 de18bac4 00000000 c0394b1c c0394b00
[4354161.283000] Call Trace:
[4354161.283000]  [<c0103951>] show_stack_log_lvl+0xa5/0xad
[4354161.283000]  [<c0103a8c>] show_registers+0x106/0x16f
[4354161.283000]  [<c0103c2e>] die+0xc1/0x13c
[4354161.283000]  [<c02dfd39>] do_page_fault+0x366/0x50e
[4354161.283000]  [<c01035ef>] error_code+0x4f/0x54
[4354161.283000]  [<c0120aba>] update_process_times+0x51/0x5e
[4354161.283000]  [<c010588f>] timer_interrupt+0x59/0x94
[4354161.283000]  [<c01357a5>] handle_IRQ_event+0x26/0x56
[4354161.283000]  [<c013584e>] __do_IRQ+0x79/0xcf
[4354161.283000]  [<c0104811>] do_IRQ+0x45/0x56
[4354161.283000]  [<c0103526>] common_interrupt+0x1a/0x20
[4354161.283000]  [<c013584e>] __do_IRQ+0x79/0xcf
[4354161.283000]  [<c0104811>] do_IRQ+0x45/0x56
[4354161.283000]  [<c0103526>] common_interrupt+0x1a/0x20
[4354161.283000]  [<c02dd492>] schedule+0x4b8/0x516
[4354161.283000]  [<f895e7bf>] videobuf_waiton+0xad/0x102 [video_buf]
[4354161.283000]  [<f89870af>] bttv_do_ioctl+0xb45/0x141e [bttv]
[4354161.283000]  [<f896a2be>] video_usercopy+0xb9/0x112 [videodev]
[4354161.283000]  [<f89879c4>] bttv_ioctl+0x3c/0x41 [bttv]
[4354161.283000]  [<c015ff68>] do_ioctl+0x48/0x52
[4354161.283000]  [<c016019a>] vfs_ioctl+0x16e/0x17d
[4354161.283000]  [<c01601ef>] sys_ioctl+0x46/0x63
[4354161.283000]  [<c0102a87>] sysenter_past_esp+0x54/0x75
[4354161.283000] Code: 48 8b 45 ec 8b 55 f0 3b 35 10 98 3d c0 a3 04 98 3d c0 89 15 08 98 3d c0 0f 84 16 02 00 00 a1 18 98 3d c0 39 46 28 74 0d 8b 46 04 <0f> ba 68 08 03 e9 ff 01 00 00 b8 e0 97 3d c0 e8 54 90 1c 00 8b
[4354161.283000]  <0>Kernel panic - not syncing: Fatal exception in interrupt
[4354161.283000]

Regards
Keith

* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
@ 2006-06-11  6:09 Chuck Ebbert
  2006-06-11 10:28 ` Keith Chew
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2006-06-11  6:09 UTC (permalink / raw)
  To: Keith Chew; +Cc: linux-kernel

In-Reply-To: <20f65d530606080351o1be35d15qc528f40c84e6279e@mail.gmail.com>

On Thu, 8 Jun 2006 22:51:17 +1200, Keith Chew wrote:

> We have updated the kernel on all 10 PCs to 2.6.16.18. Only 1 of
> them is running in IOAPIC mode (i.e. with local APIC support); the rest are
> in XT-PIC mode. The APIC machine crashed after 24 hours of operation. Below
> is the stack trace. Is this related to the IO APIC, or should we be
> worried about the XT-PIC machines too?

Something has corrupted memory.  thread_info->task points to a task_struct
at ed103560, but that task_struct->thread_info contains 00000000, its
command name field (comm) contains junk and its pid is -1.  This is
very hard to diagnose.  Are you using 8K stacks?  Stack overflow is one
possible cause; the other likely one is random memory scribbles.
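
Two numbers in the oops can be checked mechanically. On i386 the thread_info
address is the stack pointer with its low bits masked off, so comparing esp
with the printed threadinfo value shows the stack size; and a store to the
flags word of a NULL thread_info faults at that field's offset. The sketch
below is a userspace model which assumes a 32-bit thread_info whose first
three words are task, exec_domain and flags. The names thread_info32,
thread_info_base and flags_fault_address are illustrative, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed 32-bit layout of the start of thread_info (illustrative only). */
struct thread_info32 {
    uint32_t task;         /* offset 0 */
    uint32_t exec_domain;  /* offset 4 */
    uint32_t flags;        /* offset 8 */
};

/*
 * thread_info sits at the bottom of the kernel stack, so its address is
 * esp with the low bits cleared. thread_size must be a power of two.
 */
static uint32_t thread_info_base(uint32_t esp, uint32_t thread_size)
{
    return esp & ~(thread_size - 1u);
}

/* Address touched when the flags word of a given thread_info is accessed. */
static uint32_t flags_fault_address(uint32_t thread_info_ptr)
{
    return thread_info_ptr + (uint32_t)offsetof(struct thread_info32, flags);
}

#ifdef STANDALONE_DEMO
int main(void)
{
    uint32_t esp = 0xde18ba28u;  /* esp from the 2.6.16.18 oops */

    printf("8K stacks: %08x\n", (unsigned)thread_info_base(esp, 8192u));
    printf("4K stacks: %08x\n", (unsigned)thread_info_base(esp, 4096u));
    printf("fault at:  %08x\n", (unsigned)flags_fault_address(0u));
    return 0;
}
#endif
```

With esp = de18ba28 the 8 KiB mask gives de18a000, matching
threadinfo=de18a000 in the oops, whereas a 4 KiB mask would give de18b000,
which suggests this kernel was built with 8K stacks. A NULL thread_info
pointer gives 00000008, matching the reported fault address. Neither check
rules out an earlier overflow or a stray write from elsewhere.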

I've been playing with this patch but it's only boot tested on one config.
It should catch this kind of corruption earlier but I can't easily test
that.

Whether IO-APIC caused this bug or not, it's hard to say...


--- 2.6.17-rc6-nb-post.orig/include/asm-i386/current.h
+++ 2.6.17-rc6-nb-post/include/asm-i386/current.h
@@ -3,6 +3,19 @@
 
 #include <linux/thread_info.h>
 
+#define CONFIG_PARANOID
+#ifdef CONFIG_PARANOID
+
+/* must be a macro or things will get ugly */
+#define get_current()						\
+({								\
+	struct task_struct *task = current_thread_info()->task;	\
+	BUG_ON(task->thread_info != current_thread_info());	\
+	task;							\
+})
+
+#else
+
 struct task_struct;
 
 static __always_inline struct task_struct * get_current(void)
@@ -10,6 +23,8 @@ static __always_inline struct task_struc
 	return current_thread_info()->task;
 }
  
+#endif
+
 #define current get_current()
 
 #endif /* !(_I386_CURRENT_H) */
-- 
Chuck

* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
  2006-06-11  6:09 Chuck Ebbert
@ 2006-06-11 10:28 ` Keith Chew
  0 siblings, 0 replies; 9+ messages in thread
From: Keith Chew @ 2006-06-11 10:28 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

Hi Chuck

> Whether IO-APIC caused this bug or not, it's hard to say...
>

We tested it with pci=noacpi, and it has been stable for 36 hours now.
It looks like it has something to do with that.

Regards
Keith

* Re: IRQ sharing: BUG: spinlock lockup on CPU#0
@ 2006-06-12 10:42 Chuck Ebbert
  0 siblings, 0 replies; 9+ messages in thread
From: Chuck Ebbert @ 2006-06-12 10:42 UTC (permalink / raw)
  To: Keith Chew; +Cc: linux-kernel

In-Reply-To: <20f65d530606110328l5287cdf1ha4579f4120ed8ae9@mail.gmail.com>

On Sun, 11 Jun 2006 22:28:56 +1200, Keith Chew wrote:

> > Whether IO-APIC caused this bug or not, it's hard to say...
> >
> 
> We tested it with pci=noacpi, and it has been stable for 36 hours now.
> It looks like it has something to do with that.

Hmm...  could you post /proc/interrupts with and without pci=noacpi?
Also output of 'dmesg -s 999999 | grep -i irq' from both would help.
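
When comparing the two /proc/interrupts listings, the lines of interest are
those with more than one handler on the same IRQ, since handlers sharing a
line are printed as a comma-separated list at the end of that line. A rough
sketch of a helper that counts the handlers named on one line follows. The
function name and the sample line are made up for illustration, and the
layout is assumed to be "IRQ: per-CPU counts, controller name, handler
names".

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *skip_spaces(const char *p)
{
    while (*p && isspace((unsigned char)*p))
        p++;
    return p;
}

static const char *skip_token(const char *p)
{
    while (*p && !isspace((unsigned char)*p))
        p++;
    return p;
}

/* Number of handler names on one /proc/interrupts line, 0 if there are none. */
static int count_handlers(const char *line)
{
    const char *p = strchr(line, ':');
    int n;

    if (!p)
        return 0;
    p = skip_spaces(p + 1);

    /* Skip the per-CPU counters: tokens made only of digits. */
    while (isdigit((unsigned char)*p)) {
        const char *end = skip_token(p);
        const char *q = p;

        while (q < end && isdigit((unsigned char)*q))
            q++;
        if (q != end)
            break;              /* not purely numeric: controller name */
        p = skip_spaces(end);
    }

    /* Skip the interrupt controller name, e.g. "XT-PIC". */
    p = skip_spaces(skip_token(p));
    if (*p == '\0')
        return 0;

    /* What remains is "name1, name2, ...": one name plus one per comma. */
    n = 1;
    for (; *p; p++)
        if (*p == ',')
            n++;
    return n;
}

#ifdef STANDALONE_DEMO
int main(void)
{
    /* Hypothetical line, not taken from the machines in this thread. */
    const char *line = " 11:    1234567          XT-PIC  uhci_hcd:usb1, bttv0";

    printf("%d handlers\n", count_handlers(line));
    return 0;
}
#endif
```

Any IRQ where this count is 2 or more in one listing but 1 in the other
marks a device whose interrupt routing changed with pci=noacpi.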

-- 
Chuck

