Kernel Panics in the network stack

netdev.vger.kernel.org archive mirror

* Kernel Panics in the network stack
@ 2009-12-11 21:09 Kevin Constantine
  2009-12-11 21:39 ` Eric Dumazet
  2009-12-12  0:44 ` Neil Horman
  0 siblings, 2 replies; 16+ messages in thread
From: Kevin Constantine @ 2009-12-11 21:09 UTC
  To: netdev

Hey Everyone-

I've been playing with an ARM-based linuxstamp
(http://opencircuits.com/Linuxstamp), and I've been seeing kernel panics
with both 2.6.28.3 and 2.6.30 within an hour or so of turning the
linuxstamp on.  The stack traces always seem to point at functions
related to networking.  I've pasted a couple of the crash outputs below.
The linuxstamp isn't typically doing anything when the crashes occur; in
fact, it'll crash even if I haven't logged in.

If I ifconfig the interface down, the linuxstamp stays up indefinitely.
Any pointers in one direction or another would be much appreciated.

I'm not sure if this is the right audience to help out or if the ARM
lists might be better.  But in any event, any help would be really
appreciated.


linuxstamp login: Unable to handle kernel paging request at virtual 
address 183cb7b0
pgd = c0004000
[183cb7b0] *pgd=00000000
Internal error: Oops: 0 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (2.6.30-00002-g0148992 #13)
PC is at 0x183cb7b0
LR is at __udp4_lib_rcv+0x43c/0x72c
pc : [<183cb7b0>]    lr : [<c024ff4c>]    psr: 40000013
sp : c0381e70  ip : c0381e20  fp : c0386ea0
r10: 00000008  r9 : c03bce68  r8 : 00000000
r7 : c03bd254  r6 : c03bcc0c  r5 : c1e5bd80  r4 : c03a2848
r3 : 00000000  r2 : c0380000  r1 : 00000075  r0 : 00000000
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: c000717f  Table: 21d5c000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0380268)
Stack: (0xc0381e70 to 0xc0382000)
1e60:                                     c1d21800 c1db6830 c1e5bd80 
c03a28d4
1e80: c1d21800 c022f288 c1d21800 00000001 c1db1000 c026fb08 c03bce48 
c1e5bd80
1ea0: c03a28d4 c1d21800 00000000 c0216500 0000012c c03a0688 00000001 
c03a06a4
1ec0: ffffafc1 00000040 00000000 c03a068c c03a0688 c02165e4 ffffffff 
c03a06a4
1ee0: 00000040 c0380000 0000012c c03a0688 c03bce58 ffffafc3 c03a0698 
c0214d94
1f00: c1e5bd80 00000103 0000000c c0380000 00000001 c03aa7d8 00000000 
0000000a
1f20: 00000000 c0040358 c0380000 2001cf88 00000000 00000018 00000000 
00000018
1f40: 00000002 00000001 c0380000 2001cf88 00000000 c0040428 00000018 
c0022060
1f60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080 
60000013
1f80: c00243a4 c0380000 c0383ebc c00243a4 c03a5c28 41129200 2001cf88 
00000000
1fa0: fefff800 c0381fb8 c00243e0 c00243ec 60000013 ffffffff c00243a4 
c0024368
1fc0: c03ad2d4 c03a5bf0 c001ed30 c0383d08 2001cfbc c00088d4 c0008434 
00000000
1fe0: 00000000 c001ed30 c0007175 c03a5c58 c001f134 20008034 00000000 
00000000
Code: bad PC value.
Kernel panic - not syncing: Fatal exception in interrupt
[<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b342c>] 
(panic+0x3c/0x120)
[<c02b342c>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
[<c0026e60>] (die+0x154/0x180) from [<c0029848>] 
(__do_kernel_fault+0x68/0x80)
[<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>] 
(do_page_fault+0x214/0x234)
[<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022b40>] 
(__pabt_svc+0x40/0x80)
[<c0022b40>] (__pabt_svc+0x40/0x80) from [<c024ff4c>] 
(__udp4_lib_rcv+0x43c/0x72c)
[<c024ff4c>] (__udp4_lib_rcv+0x43c/0x72c) from [<c03a06a4>] (0xc03a06a4)


linuxstamp:~# Unable to handle kernel paging request at virtual address 
ffffff42
pgd = c0004000
[ffffff42] *pgd=20407031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (2.6.30-00002-g0148992 #13)
PC is at process_backlog+0x8c/0xd8
LR is at netif_receive_skb+0x2ac/0x2e8
pc : [<c02165e4>]    lr : [<c021651c>]    psr: 40000013
sp : c0381ed8  ip : c0381eb0  fp : c0386ea0
r10: c03a0688  r9 : c03a068c  r8 : 00000000
r7 : 00000040  r6 : 000cbc7e  r5 : c03a06a4  r4 : 00000001
r3 : 00000000  r2 : c0380000  r1 : 00000062  r0 : 00000000
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: c000717f  Table: 21d5c000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0380268)
Stack: (0xc0381ed8 to 0xc0382000)
1ec0:                                                       00000001 
c03a06a4
1ee0: 00000040 c0380000 0000012c c03a0688 c03bce58 000cbc80 c03a0698 
c0214d94
1f00: c1e76500 00000103 0000000c c0380000 00000001 c03aa7d8 00000000 
0000000a
1f20: 00000000 c0040358 c0380000 2001cf88 00000000 00000018 00000000 
00000018
1f40: 00000002 00000001 c0380000 2001cf88 00000000 c0040428 00000018 
c0022060
1f60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080 
60000013
1f80: c00243a4 c0380000 c0383ebc c00243a4 c03a5c28 41129200 2001cf88 
00000000
1fa0: fefff800 c0381fb8 c00243e0 c00243ec 60000013 ffffffff c00243a4 
c0024368
1fc0: c03ad2d4 c03a5bf0 c001ed30 c0383d08 2001cfbc c00088d4 c0008434 
00000000
1fe0: 00000000 c001ed30 c0007175 c03a5c58 c001f134 20008034 00000000 
00000000
[<c02165e4>] (process_backlog+0x8c/0xd8) from [<c0214d94>] 
(net_rx_action+0x68/0x170)
[<c0214d94>] (net_rx_action+0x68/0x170) from [<c0040358>] 
(__do_softirq+0x74/0x104)
[<c0040358>] (__do_softirq+0x74/0x104) from [<c0040428>] 
(irq_exit+0x40/0x58)
[<c0040428>] (irq_exit+0x40/0x58) from [<c0022060>] (_text+0x60/0x78)
[<c0022060>] (_text+0x60/0x78) from [<c0022a3c>] (__irq_svc+0x3c/0x80)
Exception stack(0xc0381f70 to 0xc0381fb8)
1f60:                                     00000000 00000001 00000080 
60000013
1f80: c00243a4 c0380000 c0383ebc c00243a4 c03a5c28 41129200 2001cf88 
00000000
1fa0: fefff800 c0381fb8 c00243e0 c00243ec 60000013 ffffffff
[<c0022a3c>] (__irq_svc+0x3c/0x80) from [<c00243e0>] 
(default_idle+0x3c/0x54)
[<c00243e0>] (default_idle+0x3c/0x54) from [<c0024368>] (cpu_idle+0x48/0x84)
[<c0024368>] (cpu_idle+0x48/0x84) from [<c00088d4>] 
(start_kernel+0x208/0x254)
[<c00088d4>] (start_kernel+0x208/0x254) from [<20008034>] (0x20008034)
Code: e3c33080 e121f003 e2844001 ebffff22 (e1540007)
Kernel panic - not syncing: Fatal exception in interrupt
[<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b342c>] 
(panic+0x3c/0x120)
[<c02b342c>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
[<c0026e60>] (die+0x154/0x180) from [<c0029848>] 
(__do_kernel_fault+0x68/0x80)
[<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>] 
(do_page_fault+0x214/0x234)
[<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022244>] 
(do_DataAbort+0x30/0x90)
[<c0022244>] (do_DataAbort+0x30/0x90) from [<c00229e0>] 
(__dabt_svc+0x40/0x60)
Exception stack(0xc0381e90 to 0xc0381ed8)
1e80:                                     00000000 00000062 c0380000 
00000000
1ea0: 00000001 c03a06a4 000cbc7e 00000040 00000000 c03a068c c03a0688 
c0386ea0
1ec0: c0381eb0 c0381ed8 c021651c c02165e4 40000013 ffffffff
[<c00229e0>] (__dabt_svc+0x40/0x60) from [<c021651c>] 
(netif_receive_skb+0x2ac/0x2e8)
[<c021651c>] (netif_receive_skb+0x2ac/0x2e8) from [<c0214d94>] 
(net_rx_action+0x68/0x170)
[<c0214d94>] (net_rx_action+0x68/0x170) from [<c0040358>] 
(__do_softirq+0x74/0x104)
[<c0040358>] (__do_softirq+0x74/0x104) from [<c0040428>] 
(irq_exit+0x40/0x58)
[<c0040428>] (irq_exit+0x40/0x58) from [<c0022060>] (_text+0x60/0x78)
[<c0022060>] (_text+0x60/0x78) from [<c0022a3c>] (__irq_svc+0x3c/0x80)
Exception stack(0xc0381f70 to 0xc0381fb8)
1f60:                                     00000000 00000001 00000080 
60000013
1f80: c00243a4 c0380000 c0383ebc c00243a4 c03a5c28 41129200 2001cf88 
00000000
1fa0: fefff800 c0381fb8 c00243e0 c00243ec 60000013 ffffffff
[<c0022a3c>] (__irq_svc+0x3c/0x80) from [<c00243e0>] 
(default_idle+0x3c/0x54)
[<c00243e0>] (default_idle+0x3c/0x54) from [<c0024368>] (cpu_idle+0x48/0x84)
[<c0024368>] (cpu_idle+0x48/0x84) from [<c00088d4>] 
(start_kernel+0x208/0x254)
[<c00088d4>] (start_kernel+0x208/0x254) from [<20008034>] (0x20008034)

Thanks a lot
-kevin

* Re: Kernel Panics in the network stack
  2009-12-11 21:09 Kernel Panics in the network stack Kevin Constantine
@ 2009-12-11 21:39 ` Eric Dumazet
  2009-12-11 21:50   ` Kevin Constantine
  2009-12-12  0:44 ` Neil Horman
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2009-12-11 21:39 UTC
  To: Kevin Constantine; +Cc: netdev

Le 11/12/2009 22:09, Kevin Constantine a écrit :
> Hey Everyone-
> 
> I've been playing with an ARM based linuxstamp
> http://opencircuits.com/Linuxstamp, and I've been seeing kernel panics
> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
> linuxstamp on.  The stack traces always seem to point at functions
> related to networking.  I've pasted a couple of the crash outputs below.
>  The linuxstamp isn't typically doing anything when the crashes occur,
> in fact it'll crash even if I haven't logged in.
> 
> If I ifconfig the interface down, the linuxstamp stays up indefinitely.
>  Any pointers in one direction or another would be much appreciated.
> 
> I'm not sure if this is the right audience to help out or if the arm
> lists might be better.  But in any event, any help would be really
> appreciated.
> 
> 
> linuxstamp login: Unable to handle kernel paging request at virtual
> address 183cb7b0
> pgd = c0004000
> [183cb7b0] *pgd=00000000
> Internal error: Oops: 0 [#1] PREEMPT
> Modules linked in:
> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #13)
> PC is at 0x183cb7b0
> LR is at __udp4_lib_rcv+0x43c/0x72c

Could you disassemble the __udp4_lib_rcv function in your vmlinux file
around LR <c024ff4c>, to see which function was called? This function then
called through a bad pointer (0x183cb7b0 is not a kernel pointer).

Maybe kernel stack corruption, or bad RAM, ...
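
The address range to disassemble can be worked out from the oops text itself: the "LR is at sym+offset/size" line gives the offset into the function and the function's size, so the function starts at LR minus the offset. The Python below is only a sketch of that arithmetic. It assumes the default 3G/1G split on 32-bit ARM (kernel addresses from 0xc0000000 up) and a hypothetical cross-toolchain prefix, arm-linux-gnueabi-; neither is stated in the thread.

```python
import re

# Assumption: default 3G/1G split on 32-bit ARM, so kernel virtual
# addresses start at 0xc0000000. Not stated in the thread.
PAGE_OFFSET = 0xC0000000

# Assumption: hypothetical cross-toolchain prefix; use whichever
# toolchain actually built the kernel.
OBJDUMP = "arm-linux-gnueabi-objdump"


def function_range(lr_addr, lr_line):
    """Return (start, end) of the function named in a 'sym+off/size' line."""
    m = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", lr_line)
    if m is None:
        raise ValueError("no symbol+offset/size found in: %r" % lr_line)
    offset = int(m.group(2), 16)
    size = int(m.group(3), 16)
    start = lr_addr - offset
    return start, start + size


def looks_like_kernel_address(addr):
    """True if addr lies in the assumed kernel part of the address space."""
    return addr >= PAGE_OFFSET


def objdump_command(start, end, image="vmlinux"):
    """Build an objdump invocation limited to the range [start, end)."""
    return "%s -d --start-address=0x%08x --stop-address=0x%08x %s" % (
        OBJDUMP, start, end, image)


# Values copied from the first oops in this thread.
start, end = function_range(0xC024FF4C, "LR is at __udp4_lib_rcv+0x43c/0x72c")
print(objdump_command(start, end))
print("PC in kernel range:", looks_like_kernel_address(0x183CB7B0))
```

The oops reports "ISA ARM", so instructions are 4 bytes wide and the call that set LR is the instruction at LR - 4 in the disassembly. The vmlinux used must be the exact build that produced the oops, otherwise the addresses will not line up.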

* Re: Kernel Panics in the network stack
  2009-12-11 21:39 ` Eric Dumazet
@ 2009-12-11 21:50   ` Kevin Constantine
  2009-12-11 21:58     ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Constantine @ 2009-12-11 21:50 UTC
  To: Eric Dumazet; +Cc: netdev

On 12/11/2009 01:39 PM, Eric Dumazet wrote:
> Le 11/12/2009 22:09, Kevin Constantine a écrit :
>> Hey Everyone-
>>
>> I've been playing with an ARM based linuxstamp
>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel panics
>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
>> linuxstamp on.  The stack traces always seem to point at functions
>> related to networking.  I've pasted a couple of the crash outputs below.
>>   The linuxstamp isn't typically doing anything when the crashes occur,
>> in fact it'll crash even if I haven't logged in.
>>
>> If I ifconfig the interface down, the linuxstamp stays up indefinitely.
>>   Any pointers in one direction or another would be much appreciated.
>>
>> I'm not sure if this is the right audience to help out or if the arm
>> lists might be better.  But in any event, any help would be really
>> appreciated.
>>
>>
>> linuxstamp login: Unable to handle kernel paging request at virtual
>> address 183cb7b0
>> pgd = c0004000
>> [183cb7b0] *pgd=00000000
>> Internal error: Oops: 0 [#1] PREEMPT
>> Modules linked in:
>> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #13)
>> PC is at 0x183cb7b0
>> LR is at __udp4_lib_rcv+0x43c/0x72c
>
> Could you disassemble your vmlinux file, __udp4_lib_rcv function around LR
> <c024ff4c>, to see which function was called ? This function then called
> a wrong pointer (0x183cb7b0 not a kernel pointer)
>
> Maybe a kernel stack corruption, or bad ram, ...

The vmlinux file I'm using has probably changed a number of times since 
then.  I'll get a fresh stack trace and disassemble that one.

I've had this type of problem with several linuxstamps.  I'm only able 
to trigger this panic when the linuxstamp is plugged into a Cisco 
Catalyst gigabit switch.  Plugged into a consumer-grade 10/100 switch 
at home, the linuxstamp stays up indefinitely.

Also worth noting, I'm seeing the error counts in ifconfig increase 
steadily.

eth0      Link encap:Ethernet  HWaddr ac:de:48:00:00:01
           inet addr:172.30.194.255  Bcast:172.30.207.255  Mask:255.255.240.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:42492 errors:1442 dropped:0 overruns:6 frame:784
           TX packets:1169 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:3804651 (3.6 MiB)  TX bytes:106728 (104.2 KiB)
           Interrupt:24 Base address:0xc000
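
To put a number on "increase steadily", the RX counters can be pulled out of the ifconfig text and compared between two samples. The Python below is a rough sketch that assumes the classic net-tools output format shown above; the helper names are made up for illustration.

```python
import re

# Sample line copied from the ifconfig output in this message.
SAMPLE = "RX packets:42492 errors:1442 dropped:0 overruns:6 frame:784"


def rx_counters(ifconfig_text):
    """Extract the RX counters from classic net-tools ifconfig output."""
    m = re.search(
        r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+) frame:(\d+)",
        ifconfig_text)
    if m is None:
        raise ValueError("no RX counter line found")
    names = ("packets", "errors", "dropped", "overruns", "frame")
    return dict(zip(names, (int(g) for g in m.groups())))


def errors_per_packet(counters):
    """RX errors relative to the RX packet counter."""
    return counters["errors"] / counters["packets"]


counters = rx_counters(SAMPLE)
ratio = errors_per_packet(counters)
print("RX errors per RX packet: %.2f%%" % (100 * ratio))
print("frame errors among RX errors: %.0f%%"
      % (100 * counters["frame"] / counters["errors"]))
```

Running the same parse on a second sample taken a few minutes later and subtracting the counters gives the error rate over that interval, which is easier to compare between the two switches than the running totals.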


-kevin

* Re: Kernel Panics in the network stack
  2009-12-11 21:50   ` Kevin Constantine
@ 2009-12-11 21:58     ` Eric Dumazet
  2009-12-11 22:16       ` Kevin Constantine
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2009-12-11 21:58 UTC
  To: Kevin Constantine; +Cc: netdev

Le 11/12/2009 22:50, Kevin Constantine a écrit :
> On 12/11/2009 01:39 PM, Eric Dumazet wrote:
>> Le 11/12/2009 22:09, Kevin Constantine a écrit :
>>> Hey Everyone-
>>>
>>> I've been playing with an ARM based linuxstamp
>>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel panics
>>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
>>> linuxstamp on.  The stack traces always seem to point at functions
>>> related to networking.  I've pasted a couple of the crash outputs below.
>>>   The linuxstamp isn't typically doing anything when the crashes occur,
>>> in fact it'll crash even if I haven't logged in.
>>>
>>> If I ifconfig the interface down, the linuxstamp stays up indefinitely.
>>>   Any pointers in one direction or another would be much appreciated.
>>>
>>> I'm not sure if this is the right audience to help out or if the arm
>>> lists might be better.  But in any event, any help would be really
>>> appreciated.
>>>
>>>
>>> linuxstamp login: Unable to handle kernel paging request at virtual
>>> address 183cb7b0
>>> pgd = c0004000
>>> [183cb7b0] *pgd=00000000
>>> Internal error: Oops: 0 [#1] PREEMPT
>>> Modules linked in:
>>> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #13)
>>> PC is at 0x183cb7b0
>>> LR is at __udp4_lib_rcv+0x43c/0x72c
>>
>> Could you disassemble your vmlinux file, __udp4_lib_rcv function
>> around LR
>> <c024ff4c>, to see which function was called ? This function then called
>> a wrong pointer (0x183cb7b0 not a kernel pointer)
>>
>> Maybe a kernel stack corruption, or bad ram, ...
> 
> The vmlinux file I'm using has probably changed a number of times since
> then.  I'll get a fresh stack trace and disassemble that one.
> 
> I've had this type of problem with several linuxstamps.  I'm only able
> to trigger this panic when the linuxstamp is plugged into a Cisco
> Catalyst gigabit switch.  Plugged into a consumer-grade 10/100 switch
> at home, the linuxstamp stays up indefinitely.
> 
> Also worth noting, I'm seeing the error counts in ifconfig increase
> steadily.

That is a good point: it could be a bug in the NIC driver's error path.

> 
> eth0      Link encap:Ethernet  HWaddr ac:de:48:00:00:01
>           inet addr:172.30.194.255  Bcast:172.30.207.255 Mask:255.255.240.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:42492 errors:1442 dropped:0 overruns:6 frame:784
>           TX packets:1169 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:3804651 (3.6 MiB)  TX bytes:106728 (104.2 KiB)
>           Interrupt:24 Base address:0xc000
> 
> 

Please give us more information, since this platform is not well known :)

lsmod
cat /proc/meminfo
cat /proc/cpuinfo
cat /proc/slabinfo  (after more than 2000 error count in ifconfig eth0)
...
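
One way to capture this at the right moment is to poll the RX error counter and dump the /proc files once it passes 2000. The Python below is a minimal sketch, assuming the standard sysfs statistics path for eth0 and that Python is available on the board; the function names are hypothetical.

```python
import time

# Assumption: standard sysfs counter path; eth0 is the interface in this thread.
RX_ERRORS_PATH = "/sys/class/net/eth0/statistics/rx_errors"
THRESHOLD = 2000  # "more than 2000 error count", per the request above


def read_text(path):
    """Return the full contents of a text file."""
    with open(path) as f:
        return f.read()


def wait_for_errors(threshold=THRESHOLD, read=read_text,
                    sleep=time.sleep, poll_seconds=5):
    """Block until the RX error counter exceeds threshold; return the count."""
    while True:
        count = int(read(RX_ERRORS_PATH).strip())
        if count > threshold:
            return count
        sleep(poll_seconds)


def collect(read=read_text):
    """Return the contents of the requested /proc files, keyed by path."""
    paths = ("/proc/meminfo", "/proc/cpuinfo", "/proc/slabinfo")
    return {path: read(path) for path in paths}


# Typical use on the target:
#   print("rx_errors reached", wait_for_errors())
#   for path, text in collect().items():
#       print("==> %s <==" % path)
#       print(text)
```

The read and sleep parameters exist only so the polling loop can be exercised off-target with canned values.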

* Re: Kernel Panics in the network stack
  2009-12-11 21:58     ` Eric Dumazet
@ 2009-12-11 22:16       ` Kevin Constantine
  2009-12-11 23:55         ` Kevin Constantine
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Constantine @ 2009-12-11 22:16 UTC
  To: Eric Dumazet; +Cc: netdev

On 12/11/2009 01:58 PM, Eric Dumazet wrote:
> Le 11/12/2009 22:50, Kevin Constantine a écrit :
>> On 12/11/2009 01:39 PM, Eric Dumazet wrote:
>>> Le 11/12/2009 22:09, Kevin Constantine a écrit :
>>>> Hey Everyone-
>>>>
>>>> I've been playing with an ARM based linuxstamp
>>>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel panics
>>>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
>>>> linuxstamp on.  The stack traces always seem to point at functions
>>>> related to networking.  I've pasted a couple of the crash outputs below.
>>>>    The linuxstamp isn't typically doing anything when the crashes occur,
>>>> in fact it'll crash even if I haven't logged in.
>>>>
>>>> If I ifconfig the interface down, the linuxstamp stays up indefinitely.
>>>>    Any pointers in one direction or another would be much appreciated.
>>>>
>>>> I'm not sure if this is the right audience to help out or if the arm
>>>> lists might be better.  But in any event, any help would be really
>>>> appreciated.
>>>>
>>>>
>>>> linuxstamp login: Unable to handle kernel paging request at virtual
>>>> address 183cb7b0
>>>> pgd = c0004000
>>>> [183cb7b0] *pgd=00000000
>>>> Internal error: Oops: 0 [#1] PREEMPT
>>>> Modules linked in:
>>>> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #13)
>>>> PC is at 0x183cb7b0
>>>> LR is at __udp4_lib_rcv+0x43c/0x72c
>>>
>>> Could you disassemble your vmlinux file, __udp4_lib_rcv function
>>> around LR
>>> <c024ff4c>, to see which function was called ? This function then called
>>> a wrong pointer (0x183cb7b0 not a kernel pointer)
>>>
>>> Maybe a kernel stack corruption, or bad ram, ...
>>
>> The vmlinux file I'm using has probably changed a number of times since
>> then.  I'll get a fresh stack trace and disassemble that one.
>>
>> I've had this type of problem with several linuxstamps.  I'm only able
>> to trigger this panic when the linuxstamp is plugged into a Cisco
>> Catalyst gigabit switch.  Plugged into a consumer-grade 10/100 switch
>> at home, the linuxstamp stays up indefinitely.
>>
>> Also worth noting, I'm seeing the error counts in ifconfig increase
>> steadily.
>
> Could be an error in NIC driver error path, this is a good point.
>
>>
>> eth0      Link encap:Ethernet  HWaddr ac:de:48:00:00:01
>>            inet addr:172.30.194.255  Bcast:172.30.207.255 Mask:255.255.240.0
>>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>            RX packets:42492 errors:1442 dropped:0 overruns:6 frame:784
>>            TX packets:1169 errors:0 dropped:0 overruns:0 carrier:0
>>            collisions:0 txqueuelen:1000
>>            RX bytes:3804651 (3.6 MiB)  TX bytes:106728 (104.2 KiB)
>>            Interrupt:24 Base address:0xc000
>>
>>
>
> Please give us more information, since this platform is not well known :)
>
> lsmod
> cat /proc/meminfo

debian:~# cat /proc/meminfo
MemTotal:          28732 kB
MemFree:           10260 kB
Buffers:            2304 kB
Cached:             9220 kB
SwapCached:            0 kB
Active:             8964 kB
Inactive:           4796 kB
Active(anon):       2292 kB
Inactive(anon):        0 kB
Active(file):       6672 kB
Inactive(file):     4796 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:          2256 kB
Mapped:             3308 kB
Slab:               3876 kB
SReclaimable:       1508 kB
SUnreclaim:         2368 kB
PageTables:          156 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       14364 kB
Committed_AS:      32372 kB
VmallocTotal:     989184 kB
VmallocUsed:        1136 kB
VmallocChunk:     986108 kB


> cat /proc/cpuinfo

debian:~# cat /proc/cpuinfo
Processor       : ARM920T rev 0 (v4l)
BogoMIPS        : 89.53
Features        : swp half thumb
CPU implementer : 0x41
CPU architecture: 4T
CPU variant     : 0x1
CPU part        : 0x920
CPU revision    : 0

Hardware        : emQbit's ECB_AT91
Revision        : 0000
Serial          : 0000000000000000


> cat /proc/slabinfo  (after more than 2000 error count in ifconfig eth0)

debian:~# cat /proc/slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
rpc_buffers            8      8   2048    2    1 : tunables   24   12    0 : slabdata      4      4      0
rpc_tasks              8     24    160   24    1 : tunables  120   60    0 : slabdata      1      1      0
rpc_inode_cache        0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
flow_cache             0      0     80   48    1 : tunables  120   60    0 : slabdata      0      0      0
nfs_direct_cache       0      0     68   56    1 : tunables  120   60    0 : slabdata      0      0      0
nfs_write_data        36     36    416    9    1 : tunables   54   27    0 : slabdata      4      4      0
nfs_read_data         32     36    416    9    1 : tunables   54   27    0 : slabdata      4      4      0
nfs_inode_cache        0      0    584    7    1 : tunables   54   27    0 : slabdata      0      0      0
nfs_page               0      0     64   59    1 : tunables  120   60    0 : slabdata      0      0      0
journal_handle         0      0     20  169    1 : tunables  120   60    0 : slabdata      0      0      0
journal_head          14    126     60   63    1 : tunables  120   60    0 : slabdata      2      2      0
revoke_table           2    254     12  254    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  203    1 : tunables  120   60    0 : slabdata      0      0      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
ext3_inode_cache     443    450    432    9    1 : tunables   54   27    0 : slabdata     50     50      0
ext3_xattr             0      0     44   84    1 : tunables  120   60    0 : slabdata      0      0      0
reiser_inode_cache      0      0    368   10    1 : tunables   54   27    0 : slabdata      0      0      0
configfs_dir_cache      0      0     52   72    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   24    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0    160   24    1 : tunables  120   60    0 : slabdata      0      0      0
inotify_event_cache      0      0     28  127    1 : tunables  120   60    0 : slabdata      0      0      0
inotify_watch_cache      2     92     40   92    1 : tunables  120   60    0 : slabdata      1      1      0
dnotify_cache          0      0     20  169    1 : tunables  120   60    0 : slabdata      0      0      0
fasync_cache           0      0     16  203    1 : tunables  120   60    0 : slabdata      0      0      0
shmem_inode_cache   2497   2500    392   10    1 : tunables   54   27    0 : slabdata    250    250      0
nsproxy                0      0     24  145    1 : tunables  120   60    0 : slabdata      0      0      0
posix_timers_cache      0      0    112   35    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              0      0     64   59    1 : tunables  120   60    0 : slabdata      0      0      0
UNIX                   5     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
UDP-Lite               0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
tcp_bind_bucket        1    113     32  113    1 : tunables  120   60    0 : slabdata      1      1      0
inet_peer_cache        0      0     64   59    1 : tunables  120   60    0 : slabdata      0      0      0
secpath_cache          0      0     32  113    1 : tunables  120   60    0 : slabdata      0      0      0
xfrm_dst_cache         0      0    288   13    1 : tunables   54   27    0 : slabdata      0      0      0
ip_fib_alias           0      0     16  203    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash            9    101     36  101    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache         388    435    256   15    1 : tunables  120   60    0 : slabdata     29     29      0
arp_cache              1     30    128   30    1 : tunables  120   60    0 : slabdata      1      1      0
RAW                    2      9    448    9    1 : tunables   54   27    0 : slabdata      1      1      0
UDP                    1      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
tw_sock_TCP            0      0     96   40    1 : tunables  120   60    0 : slabdata      0      0      0
request_sock_TCP       0      0     96   40    1 : tunables  120   60    0 : slabdata      0      0      0
TCP                    2      3   1056    3    1 : tunables   24   12    0 : slabdata      1      1      0
eventpoll_pwq          0      0     36  101    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   40    1 : tunables  120   60    0 : slabdata      0      0      0
sgpool-128             2      2   2048    2    1 : tunables   24   12    0 : slabdata      1      1      0
sgpool-64              2      4   1024    4    1 : tunables   54   27    0 : slabdata      1      1      0
sgpool-32              2      8    512    8    1 : tunables   54   27    0 : slabdata      1      1      0
sgpool-16              2     15    256   15    1 : tunables  120   60    0 : slabdata      1      1      0
sgpool-8               2     30    128   30    1 : tunables  120   60    0 : slabdata      1      1      0
scsi_data_buffer       0      0     20  169    1 : tunables  120   60    0 : slabdata      0      0      0
blkdev_queue          10     12   1216    3    1 : tunables   24   12    0 : slabdata      4      4      0
blkdev_requests        8     18    216   18    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_ioc            10     84     44   84    1 : tunables  120   60    0 : slabdata      1      1      0
bio-0                  2     30    128   30    1 : tunables  120   60    0 : slabdata      1      1      0
biovec-256             2      2   3072    1    1 : tunables   24   12    0 : slabdata      2      2      0
biovec-128             0      0   1536    2    1 : tunables   24   12    0 : slabdata      0      0      0
biovec-64              0      0    768    5    1 : tunables   54   27    0 : slabdata      0      0      0
biovec-16              0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
sock_inode_cache      18     22    352   11    1 : tunables   54   27    0 : slabdata      2      2      0
skbuff_fclone_cache     11     11    352   11    1 : tunables   54   27    0 : slabdata      1      1      0
skbuff_head_cache    100    180    192   20    1 : tunables  120   60    0 : slabdata      9      9      0
file_lock_cache        1     40     96   40    1 : tunables  120   60    0 : slabdata      1      1      0
proc_inode_cache     132    132    320   12    1 : tunables   54   27    0 : slabdata     11     11      0
sigqueue               1     27    144   27    1 : tunables  120   60    0 : slabdata      1      1      0
radix_tree_node      289    299    288   13    1 : tunables   54   27    0 : slabdata     23     23      0
bdev_cache             3      9    416    9    1 : tunables   54   27    0 : slabdata      1      1      0
sysfs_dir_cache     3902   3948     44   84    1 : tunables  120   60    0 : slabdata     47     47      0
mnt_cache             20     30    128   30    1 : tunables  120   60    0 : slabdata      1      1      0
filp                 210    210    128   30    1 : tunables  120   60    0 : slabdata      7      7      0
inode_cache         1560   1560    296   13    1 : tunables   54   27    0 : slabdata    120    120      0
dentry              4829   4830    128   30    1 : tunables  120   60    0 : slabdata    161    161      0
names_cache            1      1   4096    1    1 : tunables   24   12    0 : slabdata      1      1      0
buffer_head          614    648     52   72    1 : tunables  120   60    0 : slabdata      9      9      0
vm_area_struct       631    644     84   46    1 : tunables  120   60    0 : slabdata     14     14      0
mm_struct             20     20    384   10    1 : tunables   54   27    0 : slabdata      2      2      0
fs_cache              10    113     32  113    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           11     40    192   20    1 : tunables  120   60    0 : slabdata      2      2      0
signal_cache          45     45    448    9    1 : tunables   54   27    0 : slabdata      5      5      0
sighand_cache         36     36   1312    3    1 : tunables   24   12    0 : slabdata     12     12      0
task_struct           40     40    768    5    1 : tunables   54   27    0 : slabdata      8      8      0
cred_jar              56    120     96   40    1 : tunables  120   60    0 : slabdata      3      3      0
anon_vma             265    339      8  339    1 : tunables  120   60    0 : slabdata      1      1      0
pid                   35     59     64   59    1 : tunables  120   60    0 : slabdata      1      1      0
idr_layer_cache      127    130    148   26    1 : tunables  120   60    0 : slabdata      5      5      0
size-4194304           0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
size-2097152           0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
size-1048576           0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
size-524288            0      0 524288    1  128 : tunables    1    1    0 : slabdata      0      0      0
size-262144            0      0 262144    1   64 : tunables    1    1    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536             0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384             0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-8192              0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-4096              4      4   4096    1    1 : tunables   24   12    0 : slabdata      4      4      0
size-2048             14     14   2048    2    1 : tunables   24   12    0 : slabdata      7      7      0
size-1024             40     40   1024    4    1 : tunables   54   27    0 : slabdata     10     10      0
size-512             173    200    512    8    1 : tunables   54   27    0 : slabdata     25     25      0
size-256              75     75    256   15    1 : tunables  120   60    0 : slabdata      5      5      0
size-192             669    680    192   20    1 : tunables  120   60    0 : slabdata     34     34      0
size-128             240    240    128   30    1 : tunables  120   60    0 : slabdata      8      8      0
size-96              919    920     96   40    1 : tunables  120   60    0 : slabdata     23     23      0
size-64              590    590     64   59    1 : tunables  120   60    0 : slabdata     10     10      0
size-32             3374   3390     32  113    1 : tunables  120   60    0 : slabdata     30     30      0
kmem_cache           105    120     96   40    1 : tunables  120   60    0 : slabdata      3      3      0

> ...
debian:~# dmesg
Linux version 2.6.30-00002-g0148992 (kconstan@debian) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #16 PREEMPT Fri Dec 11 13:48:45 PST 2009
CPU: ARM920T [41129200] revision 0 (ARMv4T), cr=c0007177
CPU: VIVT data cache, VIVT instruction cache
Machine: emQbit's ECB_AT91
Memory policy: ECC disabled, Data cache writeback
On node 0 totalpages: 8192
free_area_init_node: node 0, pgdat c03a3340, node_mem_map c03c3000
   Normal zone: 64 pages used for memmap
   Normal zone: 0 pages reserved
   Normal zone: 8128 pages, LIFO batch:0
Clocks: CPU 179 MHz, master 59 MHz, main 18.432 MHz
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 8128
Kernel command line: mem=32M root=/dev/mmcblk0p1 ip=192.168.0.51 console=ttyS0,115200n8 rootdelay=4
NR_IRQS:192
AT91: 96 gpio irqs in 3 banks
...
eth0: Link now 100-FullDuplex
eth0: AT91 ethernet at 0xfefbc000 int=24 100-FullDuplex (00:00:00:00:00:5b)
eth0: Micrel KS8721 PHY
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver

-kevin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel Panics in the network stack
  2009-12-11 22:16       ` Kevin Constantine
@ 2009-12-11 23:55         ` Kevin Constantine
  2009-12-12  1:06           ` Kevin Constantine
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Constantine @ 2009-12-11 23:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Kevin Constantine wrote:
> On 12/11/2009 01:58 PM, Eric Dumazet wrote:
>> Le 11/12/2009 22:50, Kevin Constantine a écrit :
>>> On 12/11/2009 01:39 PM, Eric Dumazet wrote:
>>>> Le 11/12/2009 22:09, Kevin Constantine a écrit :
>>>>> Hey Everyone-
>>>>>
>>>>> I've been playing with an ARM based linuxstamp
>>>>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel panics
>>>>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
>>>>> linuxstamp on.  The stack traces always seem to point at functions
>>>>> related to networking.  I've pasted a couple of the crash outputs 
>>>>> below.
>>>>>    The linuxstamp isn't typically doing anything when the crashes 
>>>>> occur,
>>>>> in fact it'll crash even if I haven't logged in.
>>>>>
>>>>> If I ifconfig the interface down, the linuxstamp stays up 
>>>>> indefinitely.
>>>>>    Any pointers in one direction or another would be much appreciated.
>>>>>
>>>>> I'm not sure if this is the right audience to help out or if the arm
>>>>> lists might be better.  But in any event, any help would be really
>>>>> appreciated.
>>>>>
>>>>>
>>>>> linuxstamp login: Unable to handle kernel paging request at virtual
>>>>> address 183cb7b0
>>>>> pgd = c0004000
>>>>> [183cb7b0] *pgd=00000000
>>>>> Internal error: Oops: 0 [#1] PREEMPT
>>>>> Modules linked in:
>>>>> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #13)
>>>>> PC is at 0x183cb7b0
>>>>> LR is at __udp4_lib_rcv+0x43c/0x72c
>>>>
>>>> Could you disassemble your vmlinux file, __udp4_lib_rcv function
>>>> around LR
>>>> <c024ff4c>, to see which function was called ? This function then 
>>>> called
>>>> a wrong pointer (0x183cb7b0 not a kernel pointer)
>>>>
>>>> Maybe a kernel stack corruption, or bad ram, ...
>>>
>>> The vmlinux file I'm using has probably changed a number of times since
>>> then.  I'll get a fresh stack trace and disassemble that one.

Here's a new panic.  What would you like from the disassembled binary?

debian:~# Internal error: Oops - undefined instruction: 0 [#1] PREEMPT
Modules linked in: spidev atmel_spi
CPU: 0    Not tainted  (2.6.30-00002-g0148992 #16)
PC is at netif_receive_skb+0x284/0x2e8
LR is at kmem_cache_free+0x20/0x64
pc : [<c0214ec4>]    lr : [<c0089608>]    psr: a0000013
sp : c037feb0  ip : c037fe70  fp : c0384e60
r10: 00000008  r9 : c03bad00  r8 : 00000000
r7 : c1d2a800  r6 : c03a077c  r5 : c1e14980  r4 : c03bace0
r3 : c1d2a800  r2 : 00000062  r1 : c1d2a800  r0 : c1e14980
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: c000717f  Table: 21d58000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc037e268)
Stack: (0xc037feb0 to 0xc0380000)
fea0:                                     00000000 c037fec0 00000001 
c039e54c
fec0: 00083674 00000040 00000000 c039e534 c039e530 c0214fb4 fefff000 
c039e54c
fee0: 00000040 c037e000 0000012c c039e530 c03bacf0 00083676 c039e540 
c0213764
ff00: 00000001 00000103 0000000c c037e000 00000001 c03a8678 00000000 
0000000a
ff20: 00000000 c0040358 c037e000 2001ccb8 00000000 00000018 00000000 
00000018
ff40: 00000002 00000001 c037e000 2001ccb8 00000000 c0040428 00000018 
c0022060
ff60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080 
60000013
ff80: c00243a4 c037e000 c0381e7c c00243a4 c03a3ac8 41129200 2001ccb8 
00000000
ffa0: fefff800 c037ffb8 c00243e0 c00243ec 60000013 ffffffff c00243a4 
c0024368
ffc0: c03ab174 c03a3a90 c001ed30 c0381cc8 2001ccec c00088d4 c0008434 
00000000
ffe0: 00000000 c001ed30 c0007175 c03a3af8 c001f134 20008034 00000000 
00000000
[<c0214ec4>] (netif_receive_skb+0x284/0x2e8) from [<c0214fb4>] 
(process_backlog+0x8c/0xd8)
[<c0214fb4>] (process_backlog+0x8c/0xd8) from [<c0213764>] 
(net_rx_action+0x68/0x170)
[<c0213764>] (net_rx_action+0x68/0x170) from [<c0040358>] 
(__do_softirq+0x74/0x104)
[<c0040358>] (__do_softirq+0x74/0x104) from [<c0040428>] 
(irq_exit+0x40/0x58)
[<c0040428>] (irq_exit+0x40/0x58) from [<c0022060>] (_text+0x60/0x78)
[<c0022060>] (_text+0x60/0x78) from [<c0022a3c>] (__irq_svc+0x3c/0x80)
Exception stack(0xc037ff70 to 0xc037ffb8)
ff60:                                     00000000 00000001 00000080 
60000013
ff80: c00243a4 c037e000 c0381e7c c00243a4 c03a3ac8 41129200 2001ccb8 
00000000
ffa0: fefff800 c037ffb8 c00243e0 c00243ec 60000013 ffffffff 

[<c0022a3c>] (__irq_svc+0x3c/0x80) from [<c00243e0>] 
(default_idle+0x3c/0x54)
[<c00243e0>] (default_idle+0x3c/0x54) from [<c0024368>] 
(cpu_idle+0x48/0x84)
[<c0024368>] (cpu_idle+0x48/0x84) from [<c00088d4>] 
(start_kernel+0x208/0x254)
[<c00088d4>] (start_kernel+0x208/0x254) from [<20008034>] (0x20008034)
Code: 0a000007 e1a00005 e1a03007 e5951018 (e1a02006)
Kernel panic - not syncing: Fatal exception in interrupt
[<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b1dfc>] 
(panic+0x3c/0x120)
[<c02b1dfc>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
[<c0026e60>] (die+0x154/0x180) from [<c0026f30>] (baddataabort+0x0/0xac)
[<c0026f30>] (baddataabort+0x0/0xac) from [<c037fe9c>] (0xc037fe9c)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel Panics in the network stack
  2009-12-11 21:09 Kernel Panics in the network stack Kevin Constantine
  2009-12-11 21:39 ` Eric Dumazet
@ 2009-12-12  0:44 ` Neil Horman
  1 sibling, 0 replies; 16+ messages in thread
From: Neil Horman @ 2009-12-12  0:44 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: netdev

On Fri, Dec 11, 2009 at 01:09:06PM -0800, Kevin Constantine wrote:
> Hey Everyone-
>
> I've been playing with an ARM based linuxstamp  
> http://opencircuits.com/Linuxstamp, and I've been seeing kernel panics  
> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the  
> linuxstamp on.  The stack traces always seem to point at functions  
> related to networking.  I've pasted a couple of the crash outputs below.  
>  The linuxstamp isn't typically doing anything when the crashes occur,  
> in fact it'll crash even if I haven't logged in.
>
> If I ifconfig the interface down, the linuxstamp stays up indefinitely.  
> Any pointers in one direction or another would be much appreciated.
>
> I'm not sure if this is the right audience to help out or if the arm  
> lists might be better.  But in any event, any help would be really  
> appreciated.
>
>
Might be worth turning on slab debugging to see if you get any violations on
slabs prior to your oops.  I was just debugging something similar on e1000e,
and slab debug was an immense help.

Neil
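
With slab debugging enabled, the SLAB allocator fills objects with known
poison bytes (0x5a for allocated-but-uninitialised, 0x6b for freed, 0xa5 as
the end marker, per include/linux/poison.h), so a register or stack word made
entirely of one of those bytes is a hint about where a value came from.  A
minimal Python sketch for spotting such words in an oops; classify_word is a
hypothetical helper name, and the sample values are taken from a later trace
in this thread (ip : 5a5a5a5a, r0 : 60000013):

```python
# Poison byte values from include/linux/poison.h.
SLAB_POISON = {0x5A: "POISON_INUSE", 0x6B: "POISON_FREE", 0xA5: "POISON_END"}


def classify_word(word):
    """Return the poison name if all four bytes of a 32-bit word are the
    same slab poison byte, otherwise None."""
    data = word.to_bytes(4, "little")
    if len(set(data)) == 1:
        return SLAB_POISON.get(data[0])
    return None


if __name__ == "__main__":
    # Register values copied from the slab-debug oops later in the thread.
    for name, value in (("ip", 0x5A5A5A5A), ("r0", 0x60000013)):
        print(f"{name} = {value:08x}: {classify_word(value)}")
```

A match only says the word was copied from poisoned slab memory at some
point; it does not by itself identify the code that misused it.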

>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel Panics in the network stack
  2009-12-11 23:55         ` Kevin Constantine
@ 2009-12-12  1:06           ` Kevin Constantine
  2009-12-12  1:49             ` Kevin Constantine
  2009-12-12  7:15             ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Kevin Constantine @ 2009-12-12  1:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 12/11/2009 03:55 PM, Kevin Constantine wrote:
> Kevin Constantine wrote:
>> On 12/11/2009 01:58 PM, Eric Dumazet wrote:
>>> Le 11/12/2009 22:50, Kevin Constantine a écrit :
>>>> On 12/11/2009 01:39 PM, Eric Dumazet wrote:
>>>>> Le 11/12/2009 22:09, Kevin Constantine a écrit :
>>>>>> Hey Everyone-
>>>>>>
>>>>>> I've been playing with an ARM based linuxstamp
>>>>>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel
>>>>>> panics
>>>>>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
>>>>>> linuxstamp on. The stack traces always seem to point at functions
>>>>>> related to networking. I've pasted a couple of the crash outputs
>>>>>> below.
>>>>>> The linuxstamp isn't typically doing anything when the crashes occur,
>>>>>> in fact it'll crash even if I haven't logged in.
>>>>>>
>>>>>> If I ifconfig the interface down, the linuxstamp stays up
>>>>>> indefinitely.
>>>>>> Any pointers in one direction or another would be much appreciated.
>>>>>>
>>>>>> I'm not sure if this is the right audience to help out or if the arm
>>>>>> lists might be better. But in any event, any help would be really
>>>>>> appreciated.
>>>>>>
>>>>>>
>>>>>> linuxstamp login: Unable to handle kernel paging request at virtual
>>>>>> address 183cb7b0
>>>>>> pgd = c0004000
>>>>>> [183cb7b0] *pgd=00000000
>>>>>> Internal error: Oops: 0 [#1] PREEMPT
>>>>>> Modules linked in:
>>>>>> CPU: 0 Not tainted (2.6.30-00002-g0148992 #13)
>>>>>> PC is at 0x183cb7b0
>>>>>> LR is at __udp4_lib_rcv+0x43c/0x72c
>>>>>
>>>>> Could you disassemble your vmlinux file, __udp4_lib_rcv function
>>>>> around LR
>>>>> <c024ff4c>, to see which function was called ? This function then
>>>>> called
>>>>> a wrong pointer (0x183cb7b0 not a kernel pointer)
>>>>>
>>>>> Maybe a kernel stack corruption, or bad ram, ...
>>>>
>>>> The vmlinux file I'm using has probably changed a number of times since
>>>> then. I'll get a fresh stack trace and disassemble that one.

Here's another crash from while the machine was sitting idly at the 
login prompt.


debian login: Unable to handle kernel paging request at virtual address 
183d84a0
pgd = c0004000
[183d84a0] *pgd=00000000
Internal error: Oops: 0 [#1] PREEMPT
Modules linked in: spidev atmel_spi
CPU: 0    Not tainted  (2.6.30-00002-g0148992 #16)
PC is at 0x183d84a0
LR is at __udp4_lib_rcv+0x43c/0x72c
pc : [<183d84a0>]    lr : [<c024e91c>]    psr: 40000013
sp : c037fe70  ip : c037fe20  fp : c0384e60
r10: 00000008  r9 : c03bad00  r8 : 00000000
r7 : c03bb0ec  r6 : c03baaa4  r5 : c1ec2500  r4 : c03a06f0
r3 : 00000000  r2 : c037e000  r1 : 00000075  r0 : 00000000
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: c000717f  Table: 21d58000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc037e268)
Stack: (0xc037fe70 to 0xc0380000)
fe60:                                     c1d17800 c1da5c30 c1ec2500 
c03a077c
fe80: c1d17800 c022dc58 c1d17800 00000001 c1df6400 c026e4d8 c03bace0 
c1ec2500
fea0: c03a077c c1d17800 00000000 c0214ed0 c0037100 c0034388 00000001 
c039e54c
fec0: 0005bedc 00000040 00000000 c039e534 c039e530 c0214fb4 00000001 
c039e54c
fee0: 00000040 c037e000 0000012c c039e530 c03bacf0 0005bede c039e540 
c0213764
ff00: c1ec2500 00000103 0000000c c037e000 00000001 c03a8678 00000000 
0000000a
ff20: 00000000 c0040358 c037e000 2001ccb8 00000000 00000018 00000000 
00000018
ff40: 00000002 00000001 c037e000 2001ccb8 00000000 c0040428 00000018 
c0022060
ff60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080 
60000013
ff80: c00243a4 c037e000 c0381e7c c00243a4 c03a3ac8 41129200 2001ccb8 
00000000
ffa0: fefff800 c037ffb8 c00243e0 c00243ec 60000013 ffffffff c00243a4 
c0024368
ffc0: c03ab174 c03a3a90 c001ed30 c0381cc8 2001ccec c00088d4 c0008434 
00000000
ffe0: 00000000 c001ed30 c0007175 c03a3af8 c001f134 20008034 00000000 
00000000
Code: bad PC value.
Kernel panic - not syncing: Fatal exception in interrupt
[<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b1dfc>] 
(panic+0x3c/0x120)
[<c02b1dfc>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
[<c0026e60>] (die+0x154/0x180) from [<c0029848>] 
(__do_kernel_fault+0x68/0x80)
[<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>] 
(do_page_fault+0x214/0x234)
[<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022b40>] 
(__pabt_svc+0x40/0x80)
[<c0022b40>] (__pabt_svc+0x40/0x80) from [<c024e91c>] 
(__udp4_lib_rcv+0x43c/0x72c)
[<c024e91c>] (__udp4_lib_rcv+0x43c/0x72c) from [<c039e54c>] (0xc039e54c)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel Panics in the network stack
  2009-12-12  1:06           ` Kevin Constantine
@ 2009-12-12  1:49             ` Kevin Constantine
  2009-12-12  7:56               ` Eric Dumazet
  2009-12-22 10:09               ` Eric Dumazet
  2009-12-12  7:15             ` Eric Dumazet
  1 sibling, 2 replies; 16+ messages in thread
From: Kevin Constantine @ 2009-12-12  1:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Kevin Constantine wrote:
> On 12/11/2009 03:55 PM, Kevin Constantine wrote:
>> Kevin Constantine wrote:
>>> On 12/11/2009 01:58 PM, Eric Dumazet wrote:
>>>> Le 11/12/2009 22:50, Kevin Constantine a écrit :
>>>>> On 12/11/2009 01:39 PM, Eric Dumazet wrote:
>>>>>> Le 11/12/2009 22:09, Kevin Constantine a écrit :
>>>>>>> Hey Everyone-
>>>>>>>
>>>>>>> I've been playing with an ARM based linuxstamp
>>>>>>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel
>>>>>>> panics
>>>>>>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
>>>>>>> linuxstamp on. The stack traces always seem to point at functions
>>>>>>> related to networking. I've pasted a couple of the crash outputs
>>>>>>> below.
>>>>>>> The linuxstamp isn't typically doing anything when the crashes 
>>>>>>> occur,
>>>>>>> in fact it'll crash even if I haven't logged in.
>>>>>>>
>>>>>>> If I ifconfig the interface down, the linuxstamp stays up
>>>>>>> indefinitely.
>>>>>>> Any pointers in one direction or another would be much appreciated.
>>>>>>>
>>>>>>> I'm not sure if this is the right audience to help out or if the arm
>>>>>>> lists might be better. But in any event, any help would be really
>>>>>>> appreciated.
>>>>>>>
>>>>>>>
>>>>>>> linuxstamp login: Unable to handle kernel paging request at virtual
>>>>>>> address 183cb7b0
>>>>>>> pgd = c0004000
>>>>>>> [183cb7b0] *pgd=00000000
>>>>>>> Internal error: Oops: 0 [#1] PREEMPT
>>>>>>> Modules linked in:
>>>>>>> CPU: 0 Not tainted (2.6.30-00002-g0148992 #13)
>>>>>>> PC is at 0x183cb7b0
>>>>>>> LR is at __udp4_lib_rcv+0x43c/0x72c
>>>>>>
>>>>>> Could you disassemble your vmlinux file, __udp4_lib_rcv function
>>>>>> around LR
>>>>>> <c024ff4c>, to see which function was called ? This function then
>>>>>> called
>>>>>> a wrong pointer (0x183cb7b0 not a kernel pointer)
>>>>>>
>>>>>> Maybe a kernel stack corruption, or bad ram, ...
>>>>>
>>>>> The vmlinux file I'm using has probably changed a number of times 
>>>>> since
>>>>> then. I'll get a fresh stack trace and disassemble that one.
> 

Here's yet another crash.  I recompiled the kernel to include slab 
debug.  This crash seems to implicate the at91ether driver.



debian login: Unable to handle kernel paging request at virtual 
address 60000013
pgd = c0004000
[60000013] *pgd=00000000
Internal error: Oops: 805 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (2.6.30-00002-g0148992 #17)
PC is at memset+0xb8/0xc0
LR is at __alloc_skb+0x64/0x108
pc : [<c017c118>]    lr : [<c0211a64>]    psr: 20000013
sp : c0383ee8  ip : 5a5a5a5a  fp : ffc00048
r10: 00000000  r9 : 00000002  r8 : c021268c
r7 : c1c06d20  r6 : 000000e0  r5 : c1db2000  r4 : 60000013
r3 : 00000003  r2 : 00000000  r1 : 00000088  r0 : 60000013
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: c000717f  Table: 21d78000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0382268)
Stack: (0xc0383ee8 to 0xc0384000)
3ee0:                   c0045164 c1c91e60 000000be c1d38800 c1d38b00 
00000006
3f00: ffc00000 c021268c 00000004 c01c90d4 00000001 c1c91e60 00000000 
00000000
3f20: 00000018 00000001 c0382000 2001cf90 00000000 c006112c 00000000 
c1c91e60
3f40: c038a37c 00000018 00000002 c0062e7c 00000018 00000000 00000018 
c0022050
3f60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080 
60000013
3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90 
00000000
3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff c00243a4 
c0024368
3fc0: c03af314 c03a7c30 c001ed30 c0385d08 2001cfc4 c00088d4 c0008434 
00000000
3fe0: 00000000 c001ed30 c0007175 c03a7c98 c001f134 20008034 00000000 
00000000
[<c017c118>] (memset+0xb8/0xc0) from [<c1d38800>] (0xc1d38800)
Code: ba00001d e3530002 b4c02001 d4c02001 (e4c02001)
Kernel panic - not syncing: Fatal exception in interrupt
[<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b4c20>] 
(panic+0x3c/0x120)
[<c02b4c20>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
[<c0026e60>] (die+0x154/0x180) from [<c0029848>] 
(__do_kernel_fault+0x68/0x80)
[<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>] 
(do_page_fault+0x214/0x234)
[<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022244>] 
(do_DataAbort+0x30/0x90)
[<c0022244>] (do_DataAbort+0x30/0x90) from [<c00229e0>] 
(__dabt_svc+0x40/0x60)
Exception stack(0xc0383ea0 to 0xc0383ee8)
3ea0: 60000013 00000088 00000000 00000003 60000013 c1db2000 000000e0 
c1c06d20
3ec0: c021268c 00000002 00000000 ffc00048 5a5a5a5a c0383ee8 c0211a64 
c017c118
3ee0: 20000013 ffffffff 

[<c00229e0>] (__dabt_svc+0x40/0x60) from [<c0211a64>] 
(__alloc_skb+0x64/0x108)
[<c0211a64>] (__alloc_skb+0x64/0x108) from [<c021268c>] 
(dev_alloc_skb+0x1c/0x44)
[<c021268c>] (dev_alloc_skb+0x1c/0x44) from [<c01c90d4>] 
(at91ether_interrupt+0x44/0x1b8)
[<c01c90d4>] (at91ether_interrupt+0x44/0x1b8) from [<c006112c>] 
(handle_IRQ_event+0x40/0x110)
[<c006112c>] (handle_IRQ_event+0x40/0x110) from [<c0062e7c>] 
(handle_level_irq+0xbc/0x134)
[<c0062e7c>] (handle_level_irq+0xbc/0x134) from [<c0022050>] 
(_text+0x50/0x78)
[<c0022050>] (_text+0x50/0x78) from [<c0022a3c>] (__irq_svc+0x3c/0x80)
Exception stack(0xc0383f70 to 0xc0383fb8)
3f60:                                     00000000 00000001 00000080 
60000013
3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90 
00000000
3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff 

[<c0022a3c>] (__irq_svc+0x3c/0x80) from [<c00243e0>] 
(default_idle+0x3c/0x54)
[<c00243e0>] (default_idle+0x3c/0x54) from [<c0024368>] 
(cpu_idle+0x48/0x84)
[<c0024368>] (cpu_idle+0x48/0x84) from [<c00088d4>] 
(start_kernel+0x208/0x254)
[<c00088d4>] (start_kernel+0x208/0x254) from [<20008034>] (0x20008034)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel Panics in the network stack
  2009-12-12  1:06           ` Kevin Constantine
  2009-12-12  1:49             ` Kevin Constantine
@ 2009-12-12  7:15             ` Eric Dumazet
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2009-12-12  7:15 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: netdev

Le 12/12/2009 02:06, Kevin Constantine a écrit :
> Here's another crash from while the machine was sitting idly at the
> login prompt.
> 
> 
> debian login: Unable to handle kernel paging request at virtual address
> 183d84a0
> pgd = c0004000
> [183d84a0] *pgd=00000000
> Internal error: Oops: 0 [#1] PREEMPT
> Modules linked in: spidev atmel_spi
> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #16)
> PC is at 0x183d84a0
> LR is at __udp4_lib_rcv+0x43c/0x72c
> pc : [<183d84a0>]    lr : [<c024e91c>]    psr: 40000013
> sp : c037fe70  ip : c037fe20  fp : c0384e60
> r10: 00000008  r9 : c03bad00  r8 : 00000000
> r7 : c03bb0ec  r6 : c03baaa4  r5 : c1ec2500  r4 : c03a06f0
> r3 : 00000000  r2 : c037e000  r1 : 00000075  r0 : 00000000
> Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> Control: c000717f  Table: 21d58000  DAC: 00000017
> Process swapper (pid: 0, stack limit = 0xc037e268)
> Stack: (0xc037fe70 to 0xc0380000)
> fe60:                                     c1d17800 c1da5c30 c1ec2500
> c03a077c
> fe80: c1d17800 c022dc58 c1d17800 00000001 c1df6400 c026e4d8 c03bace0
> c1ec2500
> fea0: c03a077c c1d17800 00000000 c0214ed0 c0037100 c0034388 00000001
> c039e54c
> fec0: 0005bedc 00000040 00000000 c039e534 c039e530 c0214fb4 00000001
> c039e54c
> fee0: 00000040 c037e000 0000012c c039e530 c03bacf0 0005bede c039e540
> c0213764
> ff00: c1ec2500 00000103 0000000c c037e000 00000001 c03a8678 00000000
> 0000000a
> ff20: 00000000 c0040358 c037e000 2001ccb8 00000000 00000018 00000000
> 00000018
> ff40: 00000002 00000001 c037e000 2001ccb8 00000000 c0040428 00000018
> c0022060
> ff60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080
> 60000013
> ff80: c00243a4 c037e000 c0381e7c c00243a4 c03a3ac8 41129200 2001ccb8
> 00000000
> ffa0: fefff800 c037ffb8 c00243e0 c00243ec 60000013 ffffffff c00243a4
> c0024368
> ffc0: c03ab174 c03a3a90 c001ed30 c0381cc8 2001ccec c00088d4 c0008434
> 00000000
> ffe0: 00000000 c001ed30 c0007175 c03a3af8 c001f134 20008034 00000000
> 00000000
> Code: bad PC value.
> Kernel panic - not syncing: Fatal exception in interrupt
> [<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b1dfc>]
> (panic+0x3c/0x120)
> [<c02b1dfc>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
> [<c0026e60>] (die+0x154/0x180) from [<c0029848>]
> (__do_kernel_fault+0x68/0x80)
> [<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>]
> (do_page_fault+0x214/0x234)
> [<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022b40>]
> (__pabt_svc+0x40/0x80)
> [<c0022b40>] (__pabt_svc+0x40/0x80) from [<c024e91c>]
> (__udp4_lib_rcv+0x43c/0x72c)
> [<c024e91c>] (__udp4_lib_rcv+0x43c/0x72c) from [<c039e54c>] (0xc039e54c)
> -- 

This one happens frequently.

A disassembly of the whole __udp4_lib_rcv() function could help.

Could you please send _me_ the vmlinux file, or a URL for it?

Thanks
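
The address range to disassemble can be worked out from the oops itself: in
"__udp4_lib_rcv+0x43c/0x72c", 0x43c is the offset of LR into the function and
0x72c is the function's size.  A minimal Python sketch, assuming the vmlinux
is the exact build that produced the oops (#16 here).  function_bounds is a
hypothetical helper, and the objdump name is an assumption that depends on
the cross toolchain in use:

```python
import re


def function_bounds(lr_addr, symbol_ref):
    """Return (start, end) addresses of the function named in an oops
    reference of the form 'symbol+0xOFFSET/0xSIZE'."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", symbol_ref)
    if m is None:
        raise ValueError(f"unrecognised symbol reference: {symbol_ref!r}")
    offset = int(m.group(2), 16)
    size = int(m.group(3), 16)
    start = lr_addr - offset          # LR points 'offset' bytes into the function
    return start, start + size


# Assumed tool name; substitute the objdump from the toolchain that built the kernel.
OBJDUMP = "arm-linux-gnueabi-objdump"

if __name__ == "__main__":
    # lr and symbol reference copied from the oops quoted above.
    start, end = function_bounds(0xC024E91C, "__udp4_lib_rcv+0x43c/0x72c")
    print(f"{OBJDUMP} -d vmlinux "
          f"--start-address={start:#x} --stop-address={end:#x}")
```

The printed command limits the disassembly to the one function; the
instruction just before the LR address is the call that jumped to the bad
pointer.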

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel Panics in the network stack
  2009-12-12  1:49             ` Kevin Constantine
@ 2009-12-12  7:56               ` Eric Dumazet
  2009-12-22 10:09               ` Eric Dumazet
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2009-12-12  7:56 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: netdev

Le 12/12/2009 02:49, Kevin Constantine a écrit :
> 
> Here's yet another crash.  I recompiled the kernel to include slab
> debug.  This crash seems to implicate the at91ether driver.
> 

Or more likely a memory problem (use after free?) somewhere.
It might be good to switch between SLUB and SLAB, for example, to get other stack traces.

kmem_alloc(192) returned 0x60000013, which does not look good at all :(
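
One way to see why 0x60000013 cannot be a valid allocation is to check it
against the kernel address range and to decode it as an ARM status register.
The sketch below is a rough aid, not a diagnosis.  It assumes PAGE_OFFSET is
0xc0000000 (consistent with the c0xxxxxx addresses in these traces), and the
helper names are hypothetical.  Note that the same value shows up in the
exception stack quoted below, in the slot that holds the saved PSR of the
interrupted idle loop:

```python
# Assumed kernel/user split; matches the c0xxxxxx text and stack addresses here.
PAGE_OFFSET = 0xC0000000

# Processor mode encodings from the low five bits of the ARM CPSR.
ARM_MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
    0x17: "ABT", 0x1B: "UND", 0x1F: "SYS",
}


def looks_like_kernel_pointer(value):
    """True if the value lies at or above the assumed PAGE_OFFSET."""
    return value >= PAGE_OFFSET


def decode_psr(value):
    """Decode the NZCV flags and mode bits of a 32-bit ARM PSR value.
    Flags are upper case when set, in the same style as the oops output."""
    flags = "".join(
        name if value & (1 << bit) else name.lower()
        for name, bit in (("N", 31), ("Z", 30), ("C", 29), ("V", 28))
    )
    mode = ARM_MODES.get(value & 0x1F, "unknown")
    return flags, mode


if __name__ == "__main__":
    for value in (0x60000013, 0x183D84A0, 0xC0211A64):
        print(f"{value:08x}: kernel pointer? {looks_like_kernel_pointer(value)}, "
              f"as PSR: {decode_psr(value)}")
```

A value that decodes as a plausible SVC-mode PSR and is not a kernel address
suggests a register or slot was filled from the wrong place, rather than the
allocator computing a bad address.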

> 
> 
> debian login: Unable to handle kernel paging request at virtual address
> 60000013
> pgd = c0004000
> [60000013] *pgd=00000000
> Internal error: Oops: 805 [#1] PREEMPT
> Modules linked in:
> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #17)
> PC is at memset+0xb8/0xc0
> LR is at __alloc_skb+0x64/0x108
> pc : [<c017c118>]    lr : [<c0211a64>]    psr: 20000013
> sp : c0383ee8  ip : 5a5a5a5a  fp : ffc00048
> r10: 00000000  r9 : 00000002  r8 : c021268c
> r7 : c1c06d20  r6 : 000000e0  r5 : c1db2000  r4 : 60000013
> r3 : 00000003  r2 : 00000000  r1 : 00000088  r0 : 60000013
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> Control: c000717f  Table: 21d78000  DAC: 00000017
> Process swapper (pid: 0, stack limit = 0xc0382268)
> Stack: (0xc0383ee8 to 0xc0384000)
> 3ee0:                   c0045164 c1c91e60 000000be c1d38800 c1d38b00
> 00000006
> 3f00: ffc00000 c021268c 00000004 c01c90d4 00000001 c1c91e60 00000000
> 00000000
> 3f20: 00000018 00000001 c0382000 2001cf90 00000000 c006112c 00000000
> c1c91e60
> 3f40: c038a37c 00000018 00000002 c0062e7c 00000018 00000000 00000018
> c0022050
> 3f60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080
> 60000013
> 3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90
> 00000000
> 3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff c00243a4
> c0024368
> 3fc0: c03af314 c03a7c30 c001ed30 c0385d08 2001cfc4 c00088d4 c0008434
> 00000000
> 3fe0: 00000000 c001ed30 c0007175 c03a7c98 c001f134 20008034 00000000
> 00000000
> [<c017c118>] (memset+0xb8/0xc0) from [<c1d38800>] (0xc1d38800)
> Code: ba00001d e3530002 b4c02001 d4c02001 (e4c02001)
> Kernel panic - not syncing: Fatal exception in interrupt
> [<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b4c20>]
> (panic+0x3c/0x120)
> [<c02b4c20>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
> [<c0026e60>] (die+0x154/0x180) from [<c0029848>]
> (__do_kernel_fault+0x68/0x80)
> [<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>]
> (do_page_fault+0x214/0x234)
> [<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022244>]
> (do_DataAbort+0x30/0x90)
> [<c0022244>] (do_DataAbort+0x30/0x90) from [<c00229e0>]
> (__dabt_svc+0x40/0x60)
> Exception stack(0xc0383ea0 to 0xc0383ee8)
> 3ea0: 60000013 00000088 00000000 00000003 60000013 c1db2000 000000e0
> c1c06d20
> 3ec0: c021268c 00000002 00000000 ffc00048 5a5a5a5a c0383ee8 c0211a64
> c017c118
> 3ee0: 20000013 ffffffff
> [<c00229e0>] (__dabt_svc+0x40/0x60) from [<c0211a64>]
> (__alloc_skb+0x64/0x108)
> [<c0211a64>] (__alloc_skb+0x64/0x108) from [<c021268c>]
> (dev_alloc_skb+0x1c/0x44)
> [<c021268c>] (dev_alloc_skb+0x1c/0x44) from [<c01c90d4>]
> (at91ether_interrupt+0x44/0x1b8)
> [<c01c90d4>] (at91ether_interrupt+0x44/0x1b8) from [<c006112c>]
> (handle_IRQ_event+0x40/0x110)
> [<c006112c>] (handle_IRQ_event+0x40/0x110) from [<c0062e7c>]
> (handle_level_irq+0xbc/0x134)
> [<c0062e7c>] (handle_level_irq+0xbc/0x134) from [<c0022050>]
> (_text+0x50/0x78)
> [<c0022050>] (_text+0x50/0x78) from [<c0022a3c>] (__irq_svc+0x3c/0x80)
> Exception stack(0xc0383f70 to 0xc0383fb8)
> 3f60:                                     00000000 00000001 00000080
> 60000013
> 3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90
> 00000000
> 3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff
> [<c0022a3c>] (__irq_svc+0x3c/0x80) from [<c00243e0>]
> (default_idle+0x3c/0x54)
> [<c00243e0>] (default_idle+0x3c/0x54) from [<c0024368>]
> (cpu_idle+0x48/0x84)
> [<c0024368>] (cpu_idle+0x48/0x84) from [<c00088d4>]
> (start_kernel+0x208/0x254)
> [<c00088d4>] (start_kernel+0x208/0x254) from [<20008034>] (0x20008034)
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel Panics in the network stack
  2009-12-12  1:49             ` Kevin Constantine
  2009-12-12  7:56               ` Eric Dumazet
@ 2009-12-22 10:09               ` Eric Dumazet
  2009-12-22 11:08                 ` Catalin Marinas
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2009-12-22 10:09 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: netdev, linux kernel, Catalin Marinas, Rusty Russell

Le 12/12/2009 02:49, Kevin Constantine a écrit :
> Kevin Constantine wrote:
>> On 12/11/2009 03:55 PM, Kevin Constantine wrote:
>>> Kevin Constantine wrote:
>>>> On 12/11/2009 01:58 PM, Eric Dumazet wrote:
>>>>> Le 11/12/2009 22:50, Kevin Constantine a écrit :
>>>>>> On 12/11/2009 01:39 PM, Eric Dumazet wrote:
>>>>>>> Le 11/12/2009 22:09, Kevin Constantine a écrit :
>>>>>>>> Hey Everyone-
>>>>>>>>
>>>>>>>> I've been playing with an ARM based linuxstamp
>>>>>>>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel
>>>>>>>> panics
>>>>>>>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
>>>>>>>> linuxstamp on. The stack traces always seem to point at functions
>>>>>>>> related to networking. I've pasted a couple of the crash outputs
>>>>>>>> below.
>>>>>>>> The linuxstamp isn't typically doing anything when the crashes
>>>>>>>> occur,
>>>>>>>> in fact it'll crash even if I haven't logged in.
>>>>>>>>
>>>>>>>> If I ifconfig the interface down, the linuxstamp stays up
>>>>>>>> indefinitely.
>>>>>>>> Any pointers in one direction or another would be much appreciated.
>>>>>>>>
>>>>>>>> I'm not sure if this is the right audience to help out or if the
>>>>>>>> arm
>>>>>>>> lists might be better. But in any event, any help would be really
>>>>>>>> appreciated.
>>>>>>>>
>>>>>>>>
>>>>>>>> linuxstamp login: Unable to handle kernel paging request at virtual
>>>>>>>> address 183cb7b0
>>>>>>>> pgd = c0004000
>>>>>>>> [183cb7b0] *pgd=00000000
>>>>>>>> Internal error: Oops: 0 [#1] PREEMPT
>>>>>>>> Modules linked in:
>>>>>>>> CPU: 0 Not tainted (2.6.30-00002-g0148992 #13)
>>>>>>>> PC is at 0x183cb7b0
>>>>>>>> LR is at __udp4_lib_rcv+0x43c/0x72c
>>>>>>>
>>>>>>> Could you disassemble your vmlinux file, __udp4_lib_rcv function
>>>>>>> around LR
>>>>>>> <c024ff4c>, to see which function was called ? This function then
>>>>>>> called
>>>>>>> a wrong pointer (0x183cb7b0 not a kernel pointer)
>>>>>>>
>>>>>>> Maybe a kernel stack corruption, or bad ram, ...
>>>>>>
>>>>>> The vmlinux file I'm using has probably changed a number of times
>>>>>> since
>>>>>> then. I'll get a fresh stack trace and disassemble that one.
>>
> 
> Here's yet another crash.  I recompiled the kernel to include slab
> debug.  This crash seems to implicate the at91ether driver.
> 
> 
> 
> debian login: Unable to handle kernel paging request at virtual address
> 60000013
> pgd = c0004000
> [60000013] *pgd=00000000
> Internal error: Oops: 805 [#1] PREEMPT
> Modules linked in:
> CPU: 0    Not tainted  (2.6.30-00002-g0148992 #17)
> PC is at memset+0xb8/0xc0
> LR is at __alloc_skb+0x64/0x108
> pc : [<c017c118>]    lr : [<c0211a64>]    psr: 20000013
> sp : c0383ee8  ip : 5a5a5a5a  fp : ffc00048
> r10: 00000000  r9 : 00000002  r8 : c021268c
> r7 : c1c06d20  r6 : 000000e0  r5 : c1db2000  r4 : 60000013
> r3 : 00000003  r2 : 00000000  r1 : 00000088  r0 : 60000013
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> Control: c000717f  Table: 21d78000  DAC: 00000017
> Process swapper (pid: 0, stack limit = 0xc0382268)
> Stack: (0xc0383ee8 to 0xc0384000)
> 3ee0:                   c0045164 c1c91e60 000000be c1d38800 c1d38b00
> 00000006
> 3f00: ffc00000 c021268c 00000004 c01c90d4 00000001 c1c91e60 00000000
> 00000000
> 3f20: 00000018 00000001 c0382000 2001cf90 00000000 c006112c 00000000
> c1c91e60
> 3f40: c038a37c 00000018 00000002 c0062e7c 00000018 00000000 00000018
> c0022050
> 3f60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080
> 60000013
> 3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90
> 00000000
> 3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff c00243a4
> c0024368
> 3fc0: c03af314 c03a7c30 c001ed30 c0385d08 2001cfc4 c00088d4 c0008434
> 00000000
> 3fe0: 00000000 c001ed30 c0007175 c03a7c98 c001f134 20008034 00000000
> 00000000
> [<c017c118>] (memset+0xb8/0xc0) from [<c1d38800>] (0xc1d38800)
> Code: ba00001d e3530002 b4c02001 d4c02001 (e4c02001)
> Kernel panic - not syncing: Fatal exception in interrupt
> [<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b4c20>]
> (panic+0x3c/0x120)
> [<c02b4c20>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
> [<c0026e60>] (die+0x154/0x180) from [<c0029848>]
> (__do_kernel_fault+0x68/0x80)
> [<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>]
> (do_page_fault+0x214/0x234)
> [<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022244>]
> (do_DataAbort+0x30/0x90)
> [<c0022244>] (do_DataAbort+0x30/0x90) from [<c00229e0>]
> (__dabt_svc+0x40/0x60)
> Exception stack(0xc0383ea0 to 0xc0383ee8)
> 3ea0: 60000013 00000088 00000000 00000003 60000013 c1db2000 000000e0
> c1c06d20
> 3ec0: c021268c 00000002 00000000 ffc00048 5a5a5a5a c0383ee8 c0211a64
> c017c118
> 3ee0: 20000013 ffffffff
> [<c00229e0>] (__dabt_svc+0x40/0x60) from [<c0211a64>]
> (__alloc_skb+0x64/0x108)
> [<c0211a64>] (__alloc_skb+0x64/0x108) from [<c021268c>]
> (dev_alloc_skb+0x1c/0x44)
> [<c021268c>] (dev_alloc_skb+0x1c/0x44) from [<c01c90d4>]
> (at91ether_interrupt+0x44/0x1b8)
> [<c01c90d4>] (at91ether_interrupt+0x44/0x1b8) from [<c006112c>]
> (handle_IRQ_event+0x40/0x110)
> [<c006112c>] (handle_IRQ_event+0x40/0x110) from [<c0062e7c>]
> (handle_level_irq+0xbc/0x134)
> [<c0062e7c>] (handle_level_irq+0xbc/0x134) from [<c0022050>]
> (_text+0x50/0x78)
> [<c0022050>] (_text+0x50/0x78) from [<c0022a3c>] (__irq_svc+0x3c/0x80)
> Exception stack(0xc0383f70 to 0xc0383fb8)
> 3f60:                                     00000000 00000001 00000080
> 60000013
> 3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90
> 00000000
> 3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff
> [<c0022a3c>] (__irq_svc+0x3c/0x80) from [<c00243e0>]
> (default_idle+0x3c/0x54)
> [<c00243e0>] (default_idle+0x3c/0x54) from [<c0024368>]
> (cpu_idle+0x48/0x84)
> [<c0024368>] (cpu_idle+0x48/0x84) from [<c00088d4>]
> (start_kernel+0x208/0x254)
> [<c00088d4>] (start_kernel+0x208/0x254) from [<20008034>] (0x20008034)
> 
> 
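
For anyone reading the dump by hand: the word in parentheses on the
"Code:" line is the instruction at the faulting PC. The Python sketch below
decodes it, together with the PSR flag bits, assuming the classic ARM
encoding for immediate-offset LDR/STR. The function names are made up for
illustration, and only this one instruction class is handled.

```python
def decode_ldr_str_imm(word):
    """Decode an unconditional ARM LDR/STR{B} with an immediate offset."""
    cond = (word >> 28) & 0xF
    if cond != 0xE or (word >> 25) & 0b111 != 0b010:
        raise ValueError("not an unconditional immediate-offset LDR/STR")
    pre = (word >> 24) & 1        # P: 1 = offset applied before the access
    up = (word >> 23) & 1         # U: 1 = add the offset, 0 = subtract it
    byte = (word >> 22) & 1       # B: 1 = byte access
    wback = (word >> 21) & 1      # W: write-back (pre-indexed forms only)
    load = (word >> 20) & 1       # L: 1 = load, 0 = store
    rn = (word >> 16) & 0xF       # base register
    rd = (word >> 12) & 0xF       # transferred register
    imm = word & 0xFFF
    if not pre and wback:
        raise ValueError("LDRT/STRT forms are not handled in this sketch")
    name = ("ldr" if load else "str") + ("b" if byte else "")
    off = "#%s%d" % ("" if up else "-", imm)
    if pre:
        return "%s r%d, [r%d, %s]%s" % (name, rd, rn, off, "!" if wback else "")
    return "%s r%d, [r%d], %s" % (name, rd, rn, off)


def psr_flags(psr):
    """Render the N, Z, C, V bits the way the oops 'Flags:' line does."""
    return "".join(c.upper() if (psr >> bit) & 1 else c
                   for c, bit in (("n", 31), ("z", 30), ("c", 29), ("v", 28)))


faulting = decode_ldr_str_imm(0xE4C02001)   # "(e4c02001)" from the Code: line
print(faulting)
print(psr_flags(0x20000013), hex(0x20000013 & 0x1F))   # psr from the oops
print(psr_flags(0x60000013), hex(0x60000013 & 0x1F))   # value found in r0
```

Under those assumptions the faulting instruction is a byte store through
r0, and r0 holds 60000013, the address reported in the "Unable to handle
kernel paging request" line. That value also has the shape of a status word
(flags nZCv, mode bits 0x13), which might hint at a register being loaded
from the wrong slot, but that is only a guess.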

After many private mails exchanged with Kevin, it seems we have many
unrelated corruptions happening on ARM, possibly in IRQ handling.
It's more likely an ARM problem than a network stack issue.

I found an old commit mentioning a problem with the LDM instruction: it can be
interrupted and restarted with the base register already changed, so we load
registers with garbage.

author	Catalin Marinas <catalin.marinas@arm.com>	
	Thu, 12 Jan 2006 16:53:51 +0000 (16:53 +0000)
committer	Russell King <rmk+kernel@arm.linux.org.uk>	
	Thu, 12 Jan 2006 16:53:51 +0000 (16:53 +0000)
commit	90303b102353302e84758f245906368907e6a23b


Patch from Catalin Marinas

If the low interrupt latency mode is enabled for the CPU (from ARMv6
onwards), the ldm/stm instructions are no longer atomic. An ldm instruction
restoring the sp and pc registers can be interrupted immediately after sp
was updated but before the pc. If this happens, the CPU restores the base
register to the value before the ldm instruction but if the base register
is not sp, the interrupt routine will corrupt the stack and the restarted
ldm instruction will load garbage.

Note that future ARM cores might always run in the low interrupt latency
mode.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

I found one instance of the LDM instruction in 2.6.30 that could have the same problem:

__switch_to:

...
	ldm r4, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}


Kevin, any chance you can try 2.6.33 (or 2.6.32) instead of 2.6.30?



* Re: Kernel Panics in the network stack
  2009-12-22 10:09               ` Eric Dumazet
@ 2009-12-22 11:08                 ` Catalin Marinas
  2009-12-22 11:25                   ` Russell King - ARM Linux
  2009-12-22 11:32                   ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Catalin Marinas @ 2009-12-22 11:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kevin Constantine, netdev, linux kernel, Rusty Russell,
	Russell King - ARM Linux

On Tue, 2009-12-22 at 10:09 +0000, Eric Dumazet wrote:
> Le 12/12/2009 02:49, Kevin Constantine a écrit :
> > Kevin Constantine wrote:
> >> On 12/11/2009 03:55 PM, Kevin Constantine wrote:
> >>> Kevin Constantine wrote:
> >>>> On 12/11/2009 01:58 PM, Eric Dumazet wrote:
> >>>>> Le 11/12/2009 22:50, Kevin Constantine a écrit :
> >>>>>> On 12/11/2009 01:39 PM, Eric Dumazet wrote:
> >>>>>>> Le 11/12/2009 22:09, Kevin Constantine a écrit :
> >>>>>>>> Hey Everyone-
> >>>>>>>>
> >>>>>>>> I've been playing with an ARM based linuxstamp
> >>>>>>>> http://opencircuits.com/Linuxstamp, and I've been seeing kernel
> >>>>>>>> panics
> >>>>>>>> with both 2.6.28.3, and 2.6.30 within an hour or so of turning the
> >>>>>>>> linuxstamp on. The stack traces always seem to point at functions
> >>>>>>>> related to networking. I've pasted a couple of the crash outputs
> >>>>>>>> below.
> >>>>>>>> The linuxstamp isn't typically doing anything when the crashes
> >>>>>>>> occur,
> >>>>>>>> in fact it'll crash even if I haven't logged in.
> >>>>>>>>
> >>>>>>>> If I ifconfig the interface down, the linuxstamp stays up
> >>>>>>>> indefinitely.
> >>>>>>>> Any pointers in one direction or another would be much appreciated.
> >>>>>>>>
> >>>>>>>> I'm not sure if this is the right audience to help out or if the
> >>>>>>>> arm
> >>>>>>>> lists might be better. But in any event, any help would be really
> >>>>>>>> appreciated.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> linuxstamp login: Unable to handle kernel paging request at virtual
> >>>>>>>> address 183cb7b0
> >>>>>>>> pgd = c0004000
> >>>>>>>> [183cb7b0] *pgd=00000000
> >>>>>>>> Internal error: Oops: 0 [#1] PREEMPT
> >>>>>>>> Modules linked in:
> >>>>>>>> CPU: 0 Not tainted (2.6.30-00002-g0148992 #13)
> >>>>>>>> PC is at 0x183cb7b0
> >>>>>>>> LR is at __udp4_lib_rcv+0x43c/0x72c
> >>>>>>>
> >>>>>>> Could you disassemble your vmlinux file, __udp4_lib_rcv function
> >>>>>>> around LR
> >>>>>>> <c024ff4c>, to see which function was called ? This function then
> >>>>>>> called
> >>>>>>> a wrong pointer (0x183cb7b0 not a kernel pointer)
> >>>>>>>
> >>>>>>> Maybe a kernel stack corruption, or bad ram, ...
> >>>>>>
> >>>>>> The vmlinux file I'm using has probably changed a number of times
> >>>>>> since
> >>>>>> then. I'll get a fresh stack trace and disassemble that one.
> >>
> >
> > Here's yet another crash.  I recompiled the kernel to include slab
> > debug.  This crash seems to implicate the at91ether driver.
> >
> >
> >
> > debian login: Unable to handle kernel paging request at virtual address
> > 60000013
> > pgd = c0004000
> > [60000013] *pgd=00000000
> > Internal error: Oops: 805 [#1] PREEMPT
> > Modules linked in:
> > CPU: 0    Not tainted  (2.6.30-00002-g0148992 #17)
> > PC is at memset+0xb8/0xc0
> > LR is at __alloc_skb+0x64/0x108
> > pc : [<c017c118>]    lr : [<c0211a64>]    psr: 20000013
> > sp : c0383ee8  ip : 5a5a5a5a  fp : ffc00048
> > r10: 00000000  r9 : 00000002  r8 : c021268c
> > r7 : c1c06d20  r6 : 000000e0  r5 : c1db2000  r4 : 60000013
> > r3 : 00000003  r2 : 00000000  r1 : 00000088  r0 : 60000013
> > Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> > Control: c000717f  Table: 21d78000  DAC: 00000017
> > Process swapper (pid: 0, stack limit = 0xc0382268)
> > Stack: (0xc0383ee8 to 0xc0384000)
> > 3ee0:                   c0045164 c1c91e60 000000be c1d38800 c1d38b00
> > 00000006
> > 3f00: ffc00000 c021268c 00000004 c01c90d4 00000001 c1c91e60 00000000
> > 00000000
> > 3f20: 00000018 00000001 c0382000 2001cf90 00000000 c006112c 00000000
> > c1c91e60
> > 3f40: c038a37c 00000018 00000002 c0062e7c 00000018 00000000 00000018
> > c0022050
> > 3f60: 00000000 ffffffff fefff000 c0022a3c 00000000 00000001 00000080
> > 60000013
> > 3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90
> > 00000000
> > 3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff c00243a4
> > c0024368
> > 3fc0: c03af314 c03a7c30 c001ed30 c0385d08 2001cfc4 c00088d4 c0008434
> > 00000000
> > 3fe0: 00000000 c001ed30 c0007175 c03a7c98 c001f134 20008034 00000000
> > 00000000
> > [<c017c118>] (memset+0xb8/0xc0) from [<c1d38800>] (0xc1d38800)
> > Code: ba00001d e3530002 b4c02001 d4c02001 (e4c02001)
> > Kernel panic - not syncing: Fatal exception in interrupt
> > [<c002895c>] (unwind_backtrace+0x0/0xdc) from [<c02b4c20>]
> > (panic+0x3c/0x120)
> > [<c02b4c20>] (panic+0x3c/0x120) from [<c0026e60>] (die+0x154/0x180)
> > [<c0026e60>] (die+0x154/0x180) from [<c0029848>]
> > (__do_kernel_fault+0x68/0x80)
> > [<c0029848>] (__do_kernel_fault+0x68/0x80) from [<c0029a74>]
> > (do_page_fault+0x214/0x234)
> > [<c0029a74>] (do_page_fault+0x214/0x234) from [<c0022244>]
> > (do_DataAbort+0x30/0x90)
> > [<c0022244>] (do_DataAbort+0x30/0x90) from [<c00229e0>]
> > (__dabt_svc+0x40/0x60)
> > Exception stack(0xc0383ea0 to 0xc0383ee8)
> > 3ea0: 60000013 00000088 00000000 00000003 60000013 c1db2000 000000e0
> > c1c06d20
> > 3ec0: c021268c 00000002 00000000 ffc00048 5a5a5a5a c0383ee8 c0211a64
> > c017c118
> > 3ee0: 20000013 ffffffff
> > [<c00229e0>] (__dabt_svc+0x40/0x60) from [<c0211a64>]
> > (__alloc_skb+0x64/0x108)
> > [<c0211a64>] (__alloc_skb+0x64/0x108) from [<c021268c>]
> > (dev_alloc_skb+0x1c/0x44)
> > [<c021268c>] (dev_alloc_skb+0x1c/0x44) from [<c01c90d4>]
> > (at91ether_interrupt+0x44/0x1b8)
> > [<c01c90d4>] (at91ether_interrupt+0x44/0x1b8) from [<c006112c>]
> > (handle_IRQ_event+0x40/0x110)
> > [<c006112c>] (handle_IRQ_event+0x40/0x110) from [<c0062e7c>]
> > (handle_level_irq+0xbc/0x134)
> > [<c0062e7c>] (handle_level_irq+0xbc/0x134) from [<c0022050>]
> > (_text+0x50/0x78)
> > [<c0022050>] (_text+0x50/0x78) from [<c0022a3c>] (__irq_svc+0x3c/0x80)
> > Exception stack(0xc0383f70 to 0xc0383fb8)
> > 3f60:                                     00000000 00000001 00000080
> > 60000013
> > 3f80: c00243a4 c0382000 c0385ebc c00243a4 c03a7c68 41129200 2001cf90
> > 00000000
> > 3fa0: fefff800 c0383fb8 c00243e0 c00243ec 60000013 ffffffff
> > [<c0022a3c>] (__irq_svc+0x3c/0x80) from [<c00243e0>]
> > (default_idle+0x3c/0x54)
> > [<c00243e0>] (default_idle+0x3c/0x54) from [<c0024368>]
> > (cpu_idle+0x48/0x84)
> > [<c0024368>] (cpu_idle+0x48/0x84) from [<c00088d4>]
> > (start_kernel+0x208/0x254)
> > [<c00088d4>] (start_kernel+0x208/0x254) from [<20008034>] (0x20008034)
[...]
> I found an old commit mentioning a problem with LDM instruction that
> could be interrupted/ restarted with a base register already changed
> -> we load registers with garbage.
[...]
> If the low interrupt latency mode is enabled for the CPU (from ARMv6
> onwards), the ldm/stm instructions are no longer atomic. An ldm instruction
> restoring the sp and pc registers can be interrupted immediately after sp
> was updated but before the pc. If this happens, the CPU restores the base
> register to the value before the ldm instruction but if the base register
> is not sp, the interrupt routine will corrupt the stack and the restarted
> ldm instruction will load garbage.
[...]
> I found one instance of LDM instruction in 2.6.30 that could have same problem :
> 
> __switch_to:
> 
> ...
>         ldm r4, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}

It looks to me like it is possible to get an interrupt after SP was
loaded but before PC; the stack could then be corrupted and PC would be
loaded with garbage. One instance of your oops messages looks like PC
corruption, but the other may be caused by something else. What ARM CPU
are you using?

I'm cc'ing Russell as well, it's strange that we haven't got any issue
with this so far.

You could try #undef'ing __ARCH_WANT_INTERRUPTS_ON_CTXSW in
arch/arm/include/asm/system.h as a sanity check for your aborts.

-- 
Catalin



* Re: Kernel Panics in the network stack
  2009-12-22 11:08                 ` Catalin Marinas
@ 2009-12-22 11:25                   ` Russell King - ARM Linux
  2009-12-22 11:48                     ` Catalin Marinas
  2009-12-22 11:32                   ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2009-12-22 11:25 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Eric Dumazet, Kevin Constantine, netdev, linux kernel,
	Rusty Russell

On Tue, Dec 22, 2009 at 11:08:25AM +0000, Catalin Marinas wrote:
> On Tue, 2009-12-22 at 10:09 +0000, Eric Dumazet wrote:
> > I found an old commit mentioning a problem with LDM instruction that
> > could be interrupted/ restarted with a base register already changed
> > -> we load registers with garbage.
> [...]
> > If the low interrupt latency mode is enabled for the CPU (from ARMv6
> > onwards), the ldm/stm instructions are no longer atomic. An ldm instruction
> > restoring the sp and pc registers can be interrupted immediately after sp
> > was updated but before the pc. If this happens, the CPU restores the base
> > register to the value before the ldm instruction but if the base register
> > is not sp, the interrupt routine will corrupt the stack and the restarted
> > ldm instruction will load garbage.
> [...]
> > I found one instance of LDM instruction in 2.6.30 that could have same problem :
> > 
> > __switch_to:
> > 
> > ...
> >         ldm r4, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> 
> It looks to me like it is possible to get an interrupt after SP was
> loaded but before PC, the stack could be corrupted and PC would be
> loaded with garbage. One instance of your oops messages looks like PC
> corruption but the other may be caused by something else. What ARM CPU
> are you using?
> 
> I'm cc'ing Russell as well, it's strange that we haven't got any issue
> with this so far.

We don't see the issue because we explicitly disable low latency
interrupt mode.

> You could try #undef'ing __ARCH_WANT_INTERRUPTS_ON_CTXSW in
> arch/arm/include/asm/system.h as a sanity check for your aborts.

Unfortunately, we can't do that for older ARM architectures without
severely impacting the interrupt latency there.  Not only that, but
the interrupt latency will be increased during any context switch.

I really question the value of this "low latency interrupt" setting.
If you're worried about interrupts being disabled for a very small
number of bus cycles for a LDM, then you're going to be screaming
merry hell about the places in the kernel where interrupts are masked.
The two just do not go together.

The only case for enabling the low latency interrupt mode would be if
you have tightly controlled software which never disables interrupts.
Linux does not fall into that category, so enabling it is pointless
and causes unnecessary problems.

Given that, the simple and obvious solution is: do not modify the kernel
to enable low interrupt latency mode.


* Re: Kernel Panics in the network stack
  2009-12-22 11:08                 ` Catalin Marinas
  2009-12-22 11:25                   ` Russell King - ARM Linux
@ 2009-12-22 11:32                   ` Eric Dumazet
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2009-12-22 11:32 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Kevin Constantine, netdev, linux kernel, Rusty Russell,
	Russell King - ARM Linux

Le 22/12/2009 12:08, Catalin Marinas a écrit :
> On Tue, 2009-12-22 at 10:09 +0000, Eric Dumazet wrote:
>> __switch_to:
>>
>> ...
>>         ldm r4, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> 
> It looks to me like it is possible to get an interrupt after SP was
> loaded but before PC, the stack could be corrupted and PC would be
> loaded with garbage. One instance of your oops messages looks like PC
> corruption but the other may be caused by something else. What ARM CPU
> are you using?

I saw other very strange corruptions (registers R6 & R7) as well, in the traces Kevin supplied.

> 
> I'm cc'ing Russell as well, it's strange that we haven't got any issue
> with this so far.

Oh well, it seems I CC'ed Rusty Russell instead :)

> 
> You could try #undef'ing __ARCH_WANT_INTERRUPTS_ON_CTXSW in
> arch/arm/include/asm/system.h as a sanity check for your aborts.
> 

Kevin uses a Linuxstamp board from Open Circuits:

http://www.opencircuits.com/Linuxstamp

It's an AT91RM9200 processor (ARM9 with MMU, 180 MHz).

Thanks


* Re: Kernel Panics in the network stack
  2009-12-22 11:25                   ` Russell King - ARM Linux
@ 2009-12-22 11:48                     ` Catalin Marinas
  0 siblings, 0 replies; 16+ messages in thread
From: Catalin Marinas @ 2009-12-22 11:48 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Eric Dumazet, Kevin Constantine, netdev, linux kernel,
	Rusty Russell

On Tue, 2009-12-22 at 11:25 +0000, Russell King - ARM Linux wrote:
> On Tue, Dec 22, 2009 at 11:08:25AM +0000, Catalin Marinas wrote:
> > On Tue, 2009-12-22 at 10:09 +0000, Eric Dumazet wrote:
> > > I found an old commit mentioning a problem with LDM instruction that
> > > could be interrupted/ restarted with a base register already changed
> > > -> we load registers with garbage.
> > [...]
> > > If the low interrupt latency mode is enabled for the CPU (from ARMv6
> > > onwards), the ldm/stm instructions are no longer atomic. An ldm instruction
> > > restoring the sp and pc registers can be interrupted immediately after sp
> > > was updated but before the pc. If this happens, the CPU restores the base
> > > register to the value before the ldm instruction but if the base register
> > > is not sp, the interrupt routine will corrupt the stack and the restarted
> > > ldm instruction will load garbage.
> > [...]
> > > I found one instance of LDM instruction in 2.6.30 that could have same problem :
> > >
> > > __switch_to:
> > >
> > > ...
> > >         ldm r4, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> >
> > It looks to me like it is possible to get an interrupt after SP was
> > loaded but before PC, the stack could be corrupted and PC would be
> > loaded with garbage. One instance of your oops messages looks like PC
> > corruption but the other may be caused by something else. What ARM CPU
> > are you using?
> >
> > I'm cc'ing Russell as well, it's strange that we haven't got any issue
> > with this so far.
> 
> We don't see the issue because we explicitly disable low latency
> interrupt mode.

I think there are some processors where this is always on (though I
believe only the no-MMU ones).

But looking at this again, I don't think it actually matters, since R4
doesn't point to the current stack but to the cpu_context in
thread_info. Even if an interrupt occurs after SP was loaded and before PC,
it doesn't corrupt the thread_info structure, so the LDM re-reads the same
values.
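
The argument can be checked with a toy model. The Python sketch below is an
illustration, not a description of real hardware: it assumes the behaviour
described in the commit message quoted earlier (interrupt taken right after
sp is loaded, base register restored, ldm restarted), and all addresses,
values and names in it are invented.

```python
JUNK = 0xDEADBEEF   # whatever the interrupt handler happens to push


def ldm(mem, regs, base, reglist, interrupt_after=None, irq_words=8):
    """Toy model of `ldm base, {reglist}` (increment-after, no write-back).

    If interrupt_after names a register, an interrupt is taken right after
    that register is loaded: the base register is restored, the handler
    pushes irq_words words below the current sp, and the ldm restarts.
    """
    start = regs[base]

    def load(stop_after=None):
        addr = start
        for reg in reglist:
            regs[reg] = mem.get(addr, 0)
            addr += 4
            if reg == stop_after:
                return

    if interrupt_after is not None:
        load(stop_after=interrupt_after)
        regs[base] = start                      # base register restored
        for i in range(1, irq_words + 1):       # handler uses the stack
            mem[regs["sp"] - 4 * i] = JUNK
    load()                                      # (re)start from the top
    return regs


# Case 1: base register points into the stack, just below the new sp.
mem1 = {0x1000: 0x1010, 0x1004: 0xC0211A64}     # saved sp, saved pc
regs1 = ldm(mem1, {"r0": 0x1000, "sp": 0x0FF0, "pc": 0},
            "r0", ["sp", "pc"], interrupt_after="sp")

# Case 2: base register points at a separate save area (cpu_context-like).
names = ["r4", "r5", "r6", "r7", "r8", "r9", "sl", "fp", "sp", "pc"]
saved = [4, 5, 6, 7, 8, 9, 10, 11, 0x2000, 0xC0024368]
mem2 = {0x0100 + 4 * i: v for i, v in enumerate(saved)}
regs2 = ldm(mem2, {"r4": 0x0100, "sp": 0x3000, "pc": 0},
            "r4", names, interrupt_after="sp")

print(hex(regs1["pc"]))   # garbage: the handler overwrote the saved pc
print(hex(regs2["pc"]))   # intact: the save area was never touched
```

In the first case the handler's pushes land on the words the ldm still has
to re-read, so the restarted instruction picks up junk. In the second case
the save area is well away from the new sp, so the restart sees the same
values, which is the point made above about cpu_context.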
> 
> > You could try #undef'ing __ARCH_WANT_INTERRUPTS_ON_CTXSW in
> > arch/arm/include/asm/system.h as a sanity check for your aborts.
> 
> Unfortunately, we can't do that for older ARM architectures without
> severely impacting the interrupt latency there.  Not only that, but
> the interrupt latency will be increased during any context switch.

I didn't say we should have this all the time, just as a check for
Eric's problem. But I don't think it's even needed.

-- 
Catalin



Thread overview: 16+ messages
2009-12-11 21:09 Kernel Panics in the network stack Kevin Constantine
2009-12-11 21:39 ` Eric Dumazet
2009-12-11 21:50   ` Kevin Constantine
2009-12-11 21:58     ` Eric Dumazet
2009-12-11 22:16       ` Kevin Constantine
2009-12-11 23:55         ` Kevin Constantine
2009-12-12  1:06           ` Kevin Constantine
2009-12-12  1:49             ` Kevin Constantine
2009-12-12  7:56               ` Eric Dumazet
2009-12-22 10:09               ` Eric Dumazet
2009-12-22 11:08                 ` Catalin Marinas
2009-12-22 11:25                   ` Russell King - ARM Linux
2009-12-22 11:48                     ` Catalin Marinas
2009-12-22 11:32                   ` Eric Dumazet
2009-12-12  7:15             ` Eric Dumazet
2009-12-12  0:44 ` Neil Horman
