* 2.6.33.6-rt28 kernel oops while stressing network
From: John Culvertson @ 2010-08-09 20:10 UTC
To: linux-rt-users
Hello,
I am trying to use the RT patches on an x86 industrial computer. I am
getting intermittent network hangs and kernel crashes when I load the
network with netperf. The unpatched kernel does not exhibit these
problems. The kernel is 2.6.33.6 patched with rt28.
The computer has an AMD LX800 processor and two Intel 82559 10/100 PCI
Ethernet controllers. I have only seen the kernel crashes when
running netperf on both ports simultaneously.
This is my first time using the RT patches, so I am not sure how to go
about resolving this. Any tips would be greatly appreciated.
[ 201.514962] BUG: unable to handle kernel paging request at a0282044
[ 201.516020] IP: [<c108d664>] free_block+0x4f/0xe5
[ 201.516020] *pde = 00000000
[ 201.516020] Oops: 0002 [#1] PREEMPT
[ 201.516020] last sysfs file: /sys/module/vt/parameters/default_utf8
[ 201.516020] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb
aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base
[ 201.516020]
[ 201.516020] Pid: 6, comm: sirq-net-rx/0 Tainted: G W
2.6.33.6-rt28 #4 SL8/SL8
[ 201.516020] EIP: 0060:[<c108d664>] EFLAGS: 00010202 CPU: 0
[ 201.516020] EIP is at free_block+0x4f/0xe5
[ 201.516020] EAX: d6d75060 EBX: de682500 ECX: 00000004 EDX: a0282040
[ 201.516020] ESI: de682020 EDI: de431340 EBP: de40e5c0 ESP: de44bd74
[ 201.516020] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 preempt:00000000
[ 201.516020] Process sirq-net-rx/0 (pid: 6, ti=de44a000
task=de420490 task.ti=de44a000)
[ 201.516020] Stack:
[ 201.516020] 00000003 00000000 0000001b de406688 00000001 de431340
00000000 de406660
[ 201.516020] <0> 0000001b c108d835 00000000 de44bdc8 de44bdc8
ddbd2060 de40e5c0 de431364
[ 201.516020] <0> 00000000 de40e5c0 ddbd2060 ddbd2060 c108d581
00000000 00000000 d6e78620
[ 201.516020] Call Trace:
[ 201.516020] [<c108d835>] ? __cache_free+0x7a/0xae
[ 201.516020] [<c108d581>] ? kmem_cache_free+0x1c/0x58
[ 201.516020] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
[ 201.516020] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476
[ 201.516020] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f
[ 201.516020] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523
[ 201.516020] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160
[ 201.516020] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d
[ 201.516020] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9
[ 201.516020] [<c118d368>] ? e100_poll+0x172/0x37c
[ 201.516020] [<c11af94c>] ? net_rx_action+0x53/0x100
[ 201.516020] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da
[ 201.516020] [<c1027648>] ? run_ksoftirqd+0x0/0x1da
[ 201.516020] [<c1036d2d>] ? kthread+0x52/0x57
[ 201.516020] [<c1036cdb>] ? kthread+0x0/0x57
[ 201.516020] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10
[ 201.516020] Code: 24 0c 8b 1c 82 89 d8 e8 34 fc ff ff 89 c6 e8 18
f9 ff ff 85 c0 75 04 0f 0b eb fe 8b 76 1c 8b 44 24 28 8b 16 8b 7c 85
4c 8b 46 04 <89> 42 04 89 10 2b 5e 0c c7 06 00 01 10 00 c7 46 04 00 02
20 00
[ 201.516020] EIP: [<c108d664>] free_block+0x4f/0xe5 SS:ESP 0068:de44bd74
[ 201.516020] CR2: 00000000a0282044
[ 201.908587] ---[ end trace d28d8d35cd5a7130 ]---
[ 201.920053] ------------[ cut here ]------------
[ 201.924018] kernel BUG at kernel/rtmutex.c:831!
[ 201.924018] invalid opcode: 0000 [#2] PREEMPT
[ 201.924018] last sysfs file: /sys/module/vt/parameters/default_utf8
[ 201.924018] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb
aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base
[ 201.924018]
[ 201.924018] Pid: 6, comm: sirq-net-rx/0 Tainted: G D W
2.6.33.6-rt28 #4 SL8/SL8
[ 201.924018] EIP: 0060:[<c122ca6e>] EFLAGS: 00010046 CPU: 0
[ 201.924018] EIP is at rt_spin_lock_slowlock+0x35/0x155
[ 201.924018] EAX: de420490 EBX: 00000292 ECX: 00000000 EDX: de420490
[ 201.924018] ESI: c122ca39 EDI: c1321160 EBP: 00000000 ESP: de44bba8
[ 201.924018] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 preempt:00000001
[ 201.924018] Process sirq-net-rx/0 (pid: 6, ti=de44a000
task=de420490 task.ti=de44a000)
[ 201.924018] Stack:
[ 201.924018] 00000030 00000046 de44bbd0 c102784a c1003c19 de120c7c
de226b3c de40a600
[ 201.924018] <0> 00000000 c1002db0 de120c7c 00000000 c1322c40
de226b3c c1321160 c122ca39
[ 201.924018] <0> de120c64 00000000 c104582b de44bc08 de40e7a0
c108d08a de120c7c c108d576
[ 201.924018] Call Trace:
[ 201.924018] [<c102784a>] ? irq_exit+0x28/0x32
[ 201.924018] [<c1003c19>] ? do_IRQ+0x61/0x71
[ 201.924018] [<c1002db0>] ? common_interrupt+0x30/0x38
[ 201.924018] [<c122ca39>] ? rt_spin_lock_slowlock+0x0/0x155
[ 201.924018] [<c104582b>] ? rt_spin_lock_fastlock+0x52/0x55
[ 201.924018] [<c108d08a>] ? _slab_irq_disable+0xd/0x15
[ 201.924018] [<c108d576>] ? kmem_cache_free+0x11/0x58
[ 201.924018] [<c109f603>] ? destroy_inode+0x1c/0x2b
[ 201.924018] [<c109eefe>] ? iput+0x47/0x49
[ 201.924018] [<c109cfd1>] ? d_kill+0x2d/0x47
[ 201.924018] [<c109d195>] ? __shrink_dcache_sb+0x1aa/0x247
[ 201.924018] [<c109d4c0>] ? shrink_dcache_parent+0x26/0xd7
[ 201.924018] [<c10c59f9>] ? proc_flush_task+0x7d/0x165
[ 201.924018] [<c1024445>] ? release_task+0x18/0x2af
[ 201.924018] [<c102570c>] ? do_exit+0x4dd/0x547
[ 201.924018] [<c1004d16>] ? oops_end+0x7f/0x83
[ 201.924018] [<c1015165>] ? no_context+0x10c/0x115
[ 201.924018] [<c10153ad>] ? do_page_fault+0x0/0x28f
[ 201.924018] [<c1015361>] ? bad_area_nosemaphore+0xa/0xc
[ 201.924018] [<c122d2fb>] ? error_code+0x6b/0x70
[ 201.924018] [<c108d664>] ? free_block+0x4f/0xe5
[ 201.924018] [<c108d835>] ? __cache_free+0x7a/0xae
[ 201.924018] [<c108d581>] ? kmem_cache_free+0x1c/0x58
[ 201.924018] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
[ 201.924018] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476
[ 201.924018] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f
[ 201.924018] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523
[ 201.924018] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160
[ 201.924018] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d
[ 201.924018] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9
[ 201.924018] [<c118d368>] ? e100_poll+0x172/0x37c
[ 201.924018] [<c11af94c>] ? net_rx_action+0x53/0x100
[ 201.924018] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da
[ 201.924018] [<c1027648>] ? run_ksoftirqd+0x0/0x1da
[ 201.924018] [<c1036d2d>] ? kthread+0x52/0x57
[ 201.924018] [<c1036cdb>] ? kthread+0x0/0x57
[ 201.924018] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10
[ 201.924018] Code: 44 24 2c 00 00 00 00 9c 5b fa b8 01 00 00 00 e8
8d f5 de ff 89 f8 e8 fd 83 e1 ff 8b 47 10 8b 15 d8 02 31 c1 83 e0 fc
39 d0 75 04 <0f> 0b eb fe 8b 02 e8 e0 82 e1 ff 89 c5 8b 35 d8 02 31 c1
8b 46
[ 201.924018] EIP: [<c122ca6e>] rt_spin_lock_slowlock+0x35/0x155
SS:ESP 0068:de44bba8
[ 201.924018] ---[ end trace d28d8d35cd5a7131 ]---
[ 201.924018] Fixing recursive fault but reboot is needed!
[ 202.672902] sched: RT throttling activated
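A few facts can be read directly from the first oops, independent of the root cause. The sketch below is only a reading aid, and the helper name decode_pf_error is hypothetical. On x86, bit 0 of the page-fault error code is set for a protection violation (clear for a not-present page), bit 1 is set for a write, and bit 2 is set for user mode. The byte marked <89> in the Code: line starts the instruction 89 42 04, which is mov %eax,0x4(%edx), so the faulting address should be EDX + 4.

```sh
# decode_pf_error HEX_CODE
# Decodes the low three bits of an x86 page-fault error code, as printed
# after "Oops:" in the log. (Hypothetical helper, for illustration only.)
decode_pf_error() {
    code=$(( 0x$1 ))
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( code & 4 )) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$access, $cause, $mode"
}

# Error code taken from the line "Oops: 0002 [#1] PREEMPT".
decode_pf_error 0002

# EDX is a0282040 in the register dump; the store targets EDX + 4.
# Compare the result with the CR2 value in the log.
printf '%08x\n' $(( 0xa0282040 + 4 ))
```

It prints "write, page not present, kernel mode" and "a0282044", which matches CR2. The bytes that follow in the Code: line (c7 06 00 01 10 00 and c7 46 04 00 02 20 00) store 0x00100100 and 0x00200200, the kernel's list poison values, so this looks like an inlined list_del() following a corrupt list pointer held in EDX. That is an interpretation; the log itself does not show what corrupted the pointer.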
* Re: 2.6.33.6-rt28 kernel oops while stressing network
From: Patrice Kadionik @ 2010-08-10 12:23 UTC
Cc: linux-rt-users
On 09/08/2010 22:10, John Culvertson wrote:
> Hello,
Hello,
> I am trying to use the RT patches on an x86 industrial computer. I am
> getting intermittent network hangs and kernel crashes when I load the
> network with netperf. The unpatched kernel does not exhibit these
> problems. The kernel is 2.6.33.6 patched with rt28.
>
> The computer has an AMD LX800 processor and two Intel 82559 10/100 PCI
> Ethernet controllers. I have only seen the kernel crashes when
> running netperf on both ports simultaneously.
I have ported PREEMPT-RT to the NIOS II architecture. NIOS II is a
softcore processor from Altera.
I have added hrtimer support to the NIOS II Linux port
(http://sopc.et.ntust.edu.tw/) and can now use cyclictest.
I have measured latency on my NIOS II target board, which runs at only
100 MHz.
I ping flooded it from a much more powerful PC (CPU frequency > 2 GHz)
and noticed that after a few seconds the previously bounded latency
rose to 50 ms! Unlike yours, my target board does not crash.
I spent some time trying to understand this. Ping flooding is fine with
a normal Linux kernel (a few ms of latency in that case). I used
wireshark to analyze the traffic and saw that, after a few seconds, my
board with PREEMPT-RT support no longer answers all ping requests.
I tried to run the IRQ thread of the Ethernet driver in the classical
mode, as in the standard Linux kernel, by adding the IRQ_NODELAY flag
to request_irq() in the driver.
My board boots but crashes on the first ping, because the processing is
still done by the soft IRQ thread sirq-net-rx (this is the soft IRQ
thread that causes your crash).
The NIOS II has no ftrace support yet, so no tool for studying
latencies is available...
I did some research on the net about this problem and found the
presentation "INTERRUPTS CONSIDERED HARMFUL" by Peter Chubb and Yang
Song
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.156.9914&rep=rep1&type=pdf).
The paper uses the same test setup as you and I: a target board running
PREEMPT-RT and an Ethernet traffic generator that can generate a huge
traffic load. They use cyclictest too. With heavy traffic, the latency
reported by cyclictest goes up to 50 ms (as in my case)! By analyzing
traces (with ftrace), they saw that the soft IRQ sirq-net-rx takes too
long to respond under heavy traffic load. The solution they found was
to modify the Ethernet driver (e1000) so that it uses no soft IRQ.
I now know the source of my problem: I can't get a realistic response
time to ping flooding when the traffic generator saturates the target
board under PREEMPT-RT. In this case, the Ethernet driver must be
revisited.
You may have the same problem with a different consequence: a crash.
Have you tried ping flooding just one Ethernet interface with heavy
traffic?
For latency measurement, I just use the hackbench
(http://devresources.linuxfoundation.org/craiger/hackbench/) and stress
(http://weather.ou.edu/~apw/projects/stress/) tools and dd commands.
My latency with cyclictest is bounded under heavy CPU load (min =
300 µs, max < 1400 µs, CPU @ 100 MHz) and I know that I can have
realistic response time in case of heavy Ethernet traffic (my NIOS II
board has not enough CPU power in this case).
Pat.
> This is my first time using the RT patches, so I am not sure how to go
> about resolving this. Any tips would be greatly appreciated.
--
Patrice Kadionik. F6KQH / F4CUQ
-----------
+----------------------------------------------------------------------+
+"Everything must be as simple as possible, not merely simpler"        +
+----------------------------------------------------------------------+
+ Patrice Kadionik             http://www.enseirb-matmeca.fr/~kadionik +
+ IMS Laboratory               http://www.ims-bordeaux.fr/             +
+ ENSEIRB-MATMECA              http://www.enseirb-matmeca.fr           +
+ PO BOX 99                    fax   : +33 5.56.37.20.23               +
+ 33402 TALENCE Cedex          voice : +33 5.56.84.23.47               +
+ FRANCE                       mailto:patrice.kadionik@ims-bordeaux.fr +
+----------------------------------------------------------------------+
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: 2.6.33.6-rt28 kernel oops while stressing network
From: Patrice Kadionik @ 2010-08-10 13:00 UTC
To: linux-rt-users
On 10/08/2010 14:23, Patrice Kadionik wrote:
> My latency time with cyclictest is bounded with heavy CPU load (min=
> 300µs max<1400 µs CPU@100 MHz) and know that I can have realistic
> response time in case of heavy Ethernet traffic (my NIOS II board has
> not enough CPU power in this case).
read:
...know that I CAN'T have realistic response time in case of heavy
Ethernet traffic (my NIOS II board has not enough CPU power in this
case).
Sorry.
Pat.
* Re: 2.6.33.6-rt28 kernel oops while stressing network
From: Patrice Kadionik @ 2010-08-12 16:09 UTC
To: linux-rt-users
On 10/08/2010 15:00, Patrice Kadionik wrote:
Hello,
Thanks to John (and this paper:
http://lwn.net/images/conf/rtlws11/papers/proc/p11.pdf), I now have an
explanation for why my cyclictest latency grows to 50 ms when I ping
flood my NIOS II target board under PREEMPT-RT.
I quote:
"Ping flooding the target system from a different machine will cause a
huge number of network interrupts. Be careful: The Floodping will cause
the IRQ Handler of the network interface to run very often. It'll also
increase the runtime of the Soft IRQ Handlers for the RX and TX
handling. All of them are Kernel Threads with Realtime priority. So, on
PREEMPT RT please check /proc/sys/kernel/sched_rt_runtime_ns. This
defines a threshold for the runtime of Realtime Tasks (to prevent a
starvation of the low priority tasks). The default value is 950ms, which
means if the rt runtime exceeds 950ms, the rt tasks won't be scheduled
up to the full second. You can disable this behaviour by writing -1 to
sched_rt_runtime_ns."
More details here:
http://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt
By default, I have:
# cat /proc/sys/kernel/sched_rt_period_us
1000000
# cat /proc/sys/kernel/sched_rt_runtime_us
950000
That means that in every 1-second period, 95% of the time is available
to RT threads and the remaining 5% is reserved for non-RT threads.
When I ping flood my board, I noticed with wireshark that after a few
seconds I got a burst of ping responses every second (I had no
explanation at the time). It was regular, once per second, and my
latency measured with cyclictest exploded to 50 ms.
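The default values above translate into a concrete time budget. The following sketch, with a hypothetical helper named rt_budget, only does the arithmetic: the share of each period that realtime tasks may use, and the window left once that budget is exhausted. It does not change any kernel setting.

```sh
# rt_budget PERIOD_US RUNTIME_US
# Prints the share of each period available to realtime tasks and the
# window (in ms) during which they are throttled once the budget is spent.
# (Hypothetical helper, for illustration only.)
rt_budget() {
    period_us=$1
    runtime_us=$2
    # A runtime of -1 means the limit is switched off.
    if [ "$runtime_us" -lt 0 ]; then
        echo "throttling disabled: RT tasks may use 100% of the CPU"
        return 0
    fi
    share=$(( runtime_us * 100 / period_us ))
    window_ms=$(( (period_us - runtime_us) / 1000 ))
    echo "RT share ${share}%, throttled window ${window_ms} ms per period"
}

# Default values quoted in the message.
rt_budget 1000000 950000

# On a live system the two values can be read straight from /proc.
if [ -r /proc/sys/kernel/sched_rt_period_us ] && [ -r /proc/sys/kernel/sched_rt_runtime_us ]; then
    rt_budget "$(cat /proc/sys/kernel/sched_rt_period_us)" "$(cat /proc/sys/kernel/sched_rt_runtime_us)"
fi
```

For the defaults it prints "RT share 95%, throttled window 50 ms per period". The 50 ms window is the same size as the worst-case cyclictest latency reported in this thread, which is consistent with RT throttling being the cause, although the arithmetic alone does not prove it.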
I have now set:
# cat /proc/sys/kernel/sched_rt_period_us
2000000
# cat /proc/sys/kernel/sched_rt_runtime_us
1900000
With ping flooding, the burst of ping responses (roughly twice as
large) now appears exactly every 2 seconds.
This explains the burst of ping responses every second under ping
flooding. By default, 95% is not enough to process my ping flood (the
soft IRQ sirq-net-rx takes too much time). RT processing is then stopped
for the last 5% of the period, which is reserved for non-RT threads. The
pending RT work is done during the RT slice (950 ms) of the next second,
which is the origin of the burst.
I then ran:
# echo -1 > /proc/sys/kernel/sched_rt_runtime_us
to disable this time slicing between RT and non-RT threads and leave
all the time to the RT threads if necessary (that's what I want from a
realtime system).
With this setting, the latency stays bounded and no longer explodes.
It fixes my problem. I have verified that I no longer get a burst of
ping responses every second.
I will check with a scope that everything is OK by generating a
periodic signal on a GPIO pin during ping flooding.
Pat.
> On 10/08/2010 14:23, Patrice Kadionik wrote:
>> On 09/08/2010 22:10, John Culvertson wrote:
>>> Hello,
>> Hello,
>>
>>> I am trying to use the RT patches on an x86 industrial computer. I am
>>> getting intermittent network hangs and kernel crashes when I load the
>>> network with netperf. The unpatched kernel does not exhibit these
>>> problems. The kernel is 2.6.33.6 patched with rt28.
>>>
>>> The computer has an AMD LX800 processor and two Intel 82559 10/100 PCI
>>> Ethernet controllers. I have only seen the kernel crashes when
>>> running netperf on both ports simultaneously.
>> I have ported PREEMPT-RT to the NIOS II architecture. NIOS II is a
>> softcore processor from Altera.
>> I have added to the NIOS II Linux port(http://sopc.et.ntust.edu.tw/)
>> the hrtimer support and can now use cyclictest.
>> I have done some measurements for having latency (my NIOS II target >> boards runs at 100 MHz!). >> I have used ping flooding from another powerful PC (CPU frequency > 2 >> GHz) and have noticed that after few seconds, the bounded latency I >> had arises up to 50 ms! My target board doesn't crash like you. >> I have spent time for understanding. The ping flooding is OK with a >> normal Linux kernel (few ms as latency in this case). I used >> wireshark to analyze the traffic and saw that my board with >> PREEMPT-RT support doesn't respond after few seconds to all ping >> requests. >> >> I've tried to put the IRQ thread of the Ethernet driver in a >> classical mode like with the standard Linux kernel through adding the >> IRQ_NODELAY flag with with request_irq() in the driver. My boards >> boots but crashs on the first ping because treatment is always done >> by the soft IRQ sirq-net-rx (this is this soft IRQ thread that causes >> your crash). >> The NIOS II has no ftrace support yet so no tool for studying >> latencies is available... >> >> I've done some researchs on the net on this problem and found the >> presentation "INTERRUPTS CONSIDERED HARMFUL" from Peter Chubb and >> Yang Song >> (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.156.9914&rep=rep1&type=pdf). >> >> >> The paper presents the same testing environment like you and me: a >> target board under PREEMPT-RT and a Ethernet traffic generator that >> can generates a huge traffic load. They use cyclictest too.With heavy >> traffic, latency from cyclictest goes up to 50 ms (like me)! By >> analyzing traces (with ftrace), they saw that the soft IRQ >> sirq-net-rx takes too time for responding in case of heavy traffic >> load. The solution they have found was to modify the Ethernet driver >> (e1000) with no soft IRQ. >> I know now the source of my problem and can't have a realistic >> response time to ping flooding with a traffic generator that >> saturates the target board under PREEMPT-RT. 
In this case, the >> Ethernet driver must be revisited. >> You may have the same problem with another consequence: crash. Have >> you tried to ping flood just one Ethernet interface with heavy traffic? >> For latency measurement, I just use hackbench >> (http://devresources.linuxfoundation.org/craiger/hackbench/), stress >> (http://weather.ou.edu/~apw/projects/stress/) tools and dd commands. >> My latency time with cyclictest is bounded with heavy CPU load (min= >> 300µs max<1400 µs CPU@100 MHz) and know that I can have realistic >> response time in case of heavy Ethernet traffic (my NIOS II board has >> not enough CPU power in this case). > read: > ...know that I CAN'T have realistic response time in case of heavy > Ethernet traffic (my NIOS II board has not enough CPU power in this > case). > > Sorry. > Pat. >> >> Pat. >> >> >>> This is my first time using the RT patches, so I am not sure how to go >>> about resolving this. Any tips would be greatly appreciated. >>> >>> [ 201.514962] BUG: unable to handle kernel paging request at a0282044 >>> [ 201.516020] IP: [<c108d664>] free_block+0x4f/0xe5 >>> [ 201.516020] *pde = 00000000 >>> [ 201.516020] Oops: 0002 [#1] PREEMPT >>> [ 201.516020] last sysfs file: /sys/module/vt/parameters/default_utf8 >>> [ 201.516020] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb >>> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base >>> [ 201.516020] >>> [ 201.516020] Pid: 6, comm: sirq-net-rx/0 Tainted: G W >>> 2.6.33.6-rt28 #4 SL8/SL8 >>> [ 201.516020] EIP: 0060:[<c108d664>] EFLAGS: 00010202 CPU: 0 >>> [ 201.516020] EIP is at free_block+0x4f/0xe5 >>> [ 201.516020] EAX: d6d75060 EBX: de682500 ECX: 00000004 EDX: a0282040 >>> [ 201.516020] ESI: de682020 EDI: de431340 EBP: de40e5c0 ESP: de44bd74 >>> [ 201.516020] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 >>> preempt:00000000 >>> [ 201.516020] Process sirq-net-rx/0 (pid: 6, ti=de44a000 >>> task=de420490 task.ti=de44a000) >>> [ 201.516020] Stack: >>> [ 201.516020] 00000003 
00000000 0000001b de406688 00000001 de431340 >>> 00000000 de406660 >>> [ 201.516020]<0> 0000001b c108d835 00000000 de44bdc8 de44bdc8 >>> ddbd2060 de40e5c0 de431364 >>> [ 201.516020]<0> 00000000 de40e5c0 ddbd2060 ddbd2060 c108d581 >>> 00000000 00000000 d6e78620 >>> [ 201.516020] Call Trace: >>> [ 201.516020] [<c108d835>] ? __cache_free+0x7a/0xae >>> [ 201.516020] [<c108d581>] ? kmem_cache_free+0x1c/0x58 >>> [ 201.516020] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5 >>> [ 201.516020] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476 >>> [ 201.516020] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f >>> [ 201.516020] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523 >>> [ 201.516020] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160 >>> [ 201.516020] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d >>> [ 201.516020] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9 >>> [ 201.516020] [<c118d368>] ? e100_poll+0x172/0x37c >>> [ 201.516020] [<c11af94c>] ? net_rx_action+0x53/0x100 >>> [ 201.516020] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da >>> [ 201.516020] [<c1027648>] ? run_ksoftirqd+0x0/0x1da >>> [ 201.516020] [<c1036d2d>] ? kthread+0x52/0x57 >>> [ 201.516020] [<c1036cdb>] ? kthread+0x0/0x57 >>> [ 201.516020] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10 >>> [ 201.516020] Code: 24 0c 8b 1c 82 89 d8 e8 34 fc ff ff 89 c6 e8 18 >>> f9 ff ff 85 c0 75 04 0f 0b eb fe 8b 76 1c 8b 44 24 28 8b 16 8b 7c 85 >>> 4c 8b 46 04<89> 42 04 89 10 2b 5e 0c c7 06 00 01 10 00 c7 46 04 00 02 >>> 20 00 >>> [ 201.516020] EIP: [<c108d664>] free_block+0x4f/0xe5 SS:ESP >>> 0068:de44bd74 >>> [ 201.516020] CR2: 00000000a0282044 >>> [ 201.908587] ---[ end trace d28d8d35cd5a7130 ]--- >>> >>> [ 201.920053] ------------[ cut here ]------------ >>> [ 201.924018] kernel BUG at kernel/rtmutex.c:831! 
>>> [ 201.924018] invalid opcode: 0000 [#2] PREEMPT >>> [ 201.924018] last sysfs file: /sys/module/vt/parameters/default_utf8 >>> [ 201.924018] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb >>> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base >>> [ 201.924018] >>> [ 201.924018] Pid: 6, comm: sirq-net-rx/0 Tainted: G D W >>> 2.6.33.6-rt28 #4 SL8/SL8 >>> [ 201.924018] EIP: 0060:[<c122ca6e>] EFLAGS: 00010046 CPU: 0 >>> [ 201.924018] EIP is at rt_spin_lock_slowlock+0x35/0x155 >>> [ 201.924018] EAX: de420490 EBX: 00000292 ECX: 00000000 EDX: de420490 >>> [ 201.924018] ESI: c122ca39 EDI: c1321160 EBP: 00000000 ESP: de44bba8 >>> [ 201.924018] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 >>> preempt:00000001 >>> [ 201.924018] Process sirq-net-rx/0 (pid: 6, ti=de44a000 >>> task=de420490 task.ti=de44a000) >>> [ 201.924018] Stack: >>> [ 201.924018] 00000030 00000046 de44bbd0 c102784a c1003c19 de120c7c >>> de226b3c de40a600 >>> [ 201.924018]<0> 00000000 c1002db0 de120c7c 00000000 c1322c40 >>> de226b3c c1321160 c122ca39 >>> [ 201.924018]<0> de120c64 00000000 c104582b de44bc08 de40e7a0 >>> c108d08a de120c7c c108d576 >>> [ 201.924018] Call Trace: >>> [ 201.924018] [<c102784a>] ? irq_exit+0x28/0x32 >>> [ 201.924018] [<c1003c19>] ? do_IRQ+0x61/0x71 >>> [ 201.924018] [<c1002db0>] ? common_interrupt+0x30/0x38 >>> [ 201.924018] [<c122ca39>] ? rt_spin_lock_slowlock+0x0/0x155 >>> [ 201.924018] [<c104582b>] ? rt_spin_lock_fastlock+0x52/0x55 >>> [ 201.924018] [<c108d08a>] ? _slab_irq_disable+0xd/0x15 >>> [ 201.924018] [<c108d576>] ? kmem_cache_free+0x11/0x58 >>> [ 201.924018] [<c109f603>] ? destroy_inode+0x1c/0x2b >>> [ 201.924018] [<c109eefe>] ? iput+0x47/0x49 >>> [ 201.924018] [<c109cfd1>] ? d_kill+0x2d/0x47 >>> [ 201.924018] [<c109d195>] ? __shrink_dcache_sb+0x1aa/0x247 >>> [ 201.924018] [<c109d4c0>] ? shrink_dcache_parent+0x26/0xd7 >>> [ 201.924018] [<c10c59f9>] ? proc_flush_task+0x7d/0x165 >>> [ 201.924018] [<c1024445>] ? 
release_task+0x18/0x2af >>> [ 201.924018] [<c102570c>] ? do_exit+0x4dd/0x547 >>> [ 201.924018] [<c1004d16>] ? oops_end+0x7f/0x83 >>> [ 201.924018] [<c1015165>] ? no_context+0x10c/0x115 >>> [ 201.924018] [<c10153ad>] ? do_page_fault+0x0/0x28f >>> [ 201.924018] [<c1015361>] ? bad_area_nosemaphore+0xa/0xc >>> [ 201.924018] [<c122d2fb>] ? error_code+0x6b/0x70 >>> [ 201.924018] [<c108d664>] ? free_block+0x4f/0xe5 >>> [ 201.924018] [<c108d835>] ? __cache_free+0x7a/0xae >>> [ 201.924018] [<c108d581>] ? kmem_cache_free+0x1c/0x58 >>> [ 201.924018] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5 >>> [ 201.924018] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476 >>> [ 201.924018] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f >>> [ 201.924018] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523 >>> [ 201.924018] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160 >>> [ 201.924018] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d >>> [ 201.924018] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9 >>> [ 201.924018] [<c118d368>] ? e100_poll+0x172/0x37c >>> [ 201.924018] [<c11af94c>] ? net_rx_action+0x53/0x100 >>> [ 201.924018] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da >>> [ 201.924018] [<c1027648>] ? run_ksoftirqd+0x0/0x1da >>> [ 201.924018] [<c1036d2d>] ? kthread+0x52/0x57 >>> [ 201.924018] [<c1036cdb>] ? kthread+0x0/0x57 >>> [ 201.924018] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10 >>> [ 201.924018] Code: 44 24 2c 00 00 00 00 9c 5b fa b8 01 00 00 00 e8 >>> 8d f5 de ff 89 f8 e8 fd 83 e1 ff 8b 47 10 8b 15 d8 02 31 c1 83 e0 fc >>> 39 d0 75 04<0f> 0b eb fe 8b 02 e8 e0 82 e1 ff 89 c5 8b 35 d8 02 31 c1 >>> 8b 46 >>> [ 201.924018] EIP: [<c122ca6e>] rt_spin_lock_slowlock+0x35/0x155 >>> SS:ESP 0068:de44bba8 >>> [ 201.924018] ---[ end trace d28d8d35cd5a7131 ]--- >>> [ 201.924018] Fixing recursive fault but reboot is needed! 
>>> [ 202.672902] sched: RT throttling activated >>> -- >>> To unsubscribe from this list: send the line "unsubscribe >>> linux-rt-users" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> > > -- Patrice Kadionik. F6KQH / F4CUQ ----------- +----------------------------------------------------------------------+ +"Tout doit etre aussi simple que possible, pas seulement plus simple" + +----------------------------------------------------------------------+ + Patrice Kadionik http://www.enseirb-matmeca.fr/~kadionik + + IMS Laboratory http://www.ims-bordeaux.fr/ + + ENSEIRB-MATMECA http://www.enseirb-matmeca.fr + + PO BOX 99 fax : +33 5.56.37.20.23 + + 33402 TALENCE Cedex voice : +33 5.56.84.23.47 + + FRANCE mailto:patrice.kadionik@ims-bordeaux.fr + +----------------------------------------------------------------------+ -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
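The arithmetic behind the RT throttling explanation in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name rt_throttle_budget is hypothetical, and the values are the ones quoted from sched_rt_period_us and sched_rt_runtime_us in the message. It shows that the default settings leave a window of up to 50 ms per second in which RT threads are not scheduled, which is at least the same order of magnitude as the 50 ms cyclictest latency reported (that link is an inference, not something the thread proves).

```python
def rt_throttle_budget(period_us: int, runtime_us: int):
    """Return (rt_share, throttled_us) for one RT scheduling period.

    period_us  -- value of /proc/sys/kernel/sched_rt_period_us
    runtime_us -- value of /proc/sys/kernel/sched_rt_runtime_us
                  (-1 means RT throttling is disabled)
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if runtime_us < 0:
        # -1 disables throttling: RT threads may use the whole period.
        return 1.0, 0
    runtime_us = min(runtime_us, period_us)
    return runtime_us / period_us, period_us - runtime_us


# The three configurations mentioned in the message above.
for period, runtime in [(1_000_000, 950_000), (2_000_000, 1_900_000), (1_000_000, -1)]:
    share, gap = rt_throttle_budget(period, runtime)
    print(f"period={period} us runtime={runtime} us -> "
          f"RT share {share:.0%}, up to {gap / 1000:.0f} ms throttled per period")
```

Note that doubling both values keeps the RT share at 95% but doubles the period, which matches the observation that the bursts moved from every second to every 2 seconds.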
[parent not found: <AANLkTi=tPSeXTZkjPPm_MGmmOx2fZhryOkajgssv0EsX@mail.gmail.com>]
* Re: 2.6.33.6-rt28 kernel oops while stressing network [not found] ` <AANLkTi=tPSeXTZkjPPm_MGmmOx2fZhryOkajgssv0EsX@mail.gmail.com> @ 2010-08-11 16:53 ` John Culvertson 2010-08-13 17:37 ` John Culvertson 2010-08-27 10:33 ` Thomas Gleixner 0 siblings, 2 replies; 15+ messages in thread From: John Culvertson @ 2010-08-11 16:53 UTC (permalink / raw) To: linux-rt-users I updated to 2.6.33.7-rt29, and I am seeing similar symptoms. [ 2120.781166] BUG: unable to handle kernel paging request at c11cd497 [ 2120.784018] IP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85 [ 2120.784018] *pde = 1d7f6063 *pte = 011cd161 [ 2120.784018] Oops: 0003 [#1] PREEMPT [ 2120.784018] last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/firmware/0000:00:11.0/loading [ 2120.784018] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb ehci_hcd aes_i586 aes_generic usbcore geode_aes nls_base [ 2120.784018] [ 2120.784018] Pid: 6, comm: sirq-net-rx/0 Tainted: G W 2.6.33.7-rt29 #2 SL8/SL8 [ 2120.784018] EIP: 0060:[<c11d5ce2>] EFLAGS: 00010287 CPU: 0 [ 2120.784018] EIP is at tcp_set_skb_tso_segs+0x33/0x85 [ 2120.784018] EAX: c11cd48f EBX: de78e7e0 ECX: 000005a8 EDX: 00000000 [ 2120.784018] ESI: dd73d1a0 EDI: 0d0005a8 EBP: de78e7e0 ESP: de44bdc4 [ 2120.784018] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 preempt:00000000 [ 2120.784018] Process sirq-net-rx/0 (pid: 6, ti=de44a000 task=de420490 task.ti=de44a000) [ 2120.784018] Stack: [ 2120.784018] 0006f21b de78e7e0 000005a8 dd73d1a0 de78e7e0 c11d5ee3 000005a8 dd73d1a0 [ 2120.784018] <0> 00000004 c11d79cf 00000000 000005a8 00000002 00000001 00000001 000005a8 [ 2120.784018] <0> 00000000 dd73d1a0 de44be2c 00000000 c11d60ec dd73d1a0 000005a8 00000020 [ 2120.784018] Call Trace: [ 2120.784018] [<c11d5ee3>] ? tcp_init_tso_segs+0x31/0x41 [ 2120.784018] [<c11d79cf>] ? tcp_write_xmit+0x35a/0x70a [ 2120.784018] [<c11d60ec>] ? tcp_established_options+0x1c/0x8d [ 2120.784018] [<c11d6198>] ? tcp_current_mss+0x3b/0x56 [ 2120.784018] [<c11d7d9d>] ? 
__tcp_push_pending_frames+0x1e/0x50 [ 2120.784018] [<c11d4451>] ? tcp_data_snd_check+0x1c/0xe6 [ 2120.784018] [<c11d4c7e>] ? tcp_rcv_established+0xbe/0x476 [ 2120.784018] [<c11da9d7>] ? tcp_v4_do_rcv+0x129/0x28f [ 2120.784018] [<c11dbfeb>] ? tcp_v4_rcv+0x339/0x523 [ 2120.784018] [<c11c3b22>] ? ip_local_deliver_finish+0xf9/0x160 [ 2120.784018] [<c11c39bd>] ? ip_rcv_finish+0x28a/0x29d [ 2120.784018] [<c11acf24>] ? netif_receive_skb+0x1c2/0x1e9 [ 2120.784018] [<c118d3d0>] ? e100_poll+0x172/0x37c [ 2120.784018] [<c11af9c3>] ? net_rx_action+0x53/0x100 [ 2120.784018] [<c1027767>] ? run_ksoftirqd+0xfb/0x1da [ 2120.784018] [<c102766c>] ? run_ksoftirqd+0x0/0x1da [ 2120.784018] [<c1036d51>] ? kthread+0x52/0x57 [ 2120.784018] [<c1036cff>] ? kthread+0x0/0x57 [ 2120.784018] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10 [ 2120.784018] Code: 83 ec 04 8b 7a 50 39 cf 76 1b 8b 80 38 01 00 00 c1 e0 10 89 c2 23 96 34 01 00 00 39 c2 75 06 f6 43 64 0c 75 26 8b 83 9c 00 00 00 <66> c7 40 08 01 00 8b 83 9c 00 00 00 66 c7 40 06 00 00 8b 83 9c [ 2120.784018] EIP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85 SS:ESP 0068:de44bdc4 [ 2120.784018] CR2: 00000000c11cd497 [ 2120.784018] ---[ end trace f11850323396760e ]--- [ 2121.268090] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2121.268112] IP: [<c103bdc3>] exit_creds+0x9/0x51 [ 2121.268150] *pde = 00000000 [ 2121.268166] Oops: 0000 [#2] PREEMPT [ 2121.268182] last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/firmware/0000:00:11.0/loading [ 2121.268200] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb ehci_hcd aes_i586 aes_generic usbcore geode_aes nls_base [ 2121.268250] [ 2121.268271] Pid: 12, comm: sirq-rcu/0 Tainted: G D W 2.6.33.7-rt29 #2 SAM-L8/SAM-L8 [ 2121.268292] EIP: 0060:[<c103bdc3>] EFLAGS: 00010287 CPU: 0 [ 2121.268313] EIP is at exit_creds+0x9/0x51 [ 2121.268330] EAX: 00000000 EBX: de420490 ECX: 00000000 EDX: c122c9c2 [ 2121.268349] ESI: de4208ac EDI: c131e960 EBP: 00000002 ESP: de459f68 [ 2121.268371] 
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 preempt:00000000 [ 2121.268393] Process sirq-rcu/0 (pid: 12, ti=de458000 task=de44e900 task.ti=de458000) [ 2121.268407] Stack: [ 2121.268417] de420490 c1021df9 00000000 c10580c0 c131e760 de4208ac fffffdff c1313d50 [ 2121.268451] <0> 00000200 00000000 c10581a4 c1027767 00000031 de423f64 de459fb4 c1313d50 [ 2121.268487] <0> c102766c c1036d51 00000000 00000000 de459fb8 de459fb8 de459fc0 de459fc0 [ 2121.268526] Call Trace: [ 2121.268558] [<c1021df9>] ? __put_task_struct+0x50/0x65 [ 2121.268584] [<c10580c0>] ? __rcu_process_callbacks+0x163/0x21a [ 2121.268611] [<c10581a4>] ? rcu_process_callbacks+0x2d/0x2e [ 2121.268641] [<c1027767>] ? run_ksoftirqd+0xfb/0x1da [ 2121.268667] [<c102766c>] ? run_ksoftirqd+0x0/0x1da [ 2121.268695] [<c1036d51>] ? kthread+0x52/0x57 [ 2121.268722] [<c1036cff>] ? kthread+0x0/0x57 [ 2121.268747] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10 [ 2121.268761] Code: c0 84 c0 74 08 8b 43 60 e8 aa 0b 00 00 8b 43 5c e8 85 1d ff ff a1 4c 44 3b c1 89 da 5b e9 bb 17 05 00 53 89 c3 8b 80 00 02 00 00 <8b> 00 8b 83 fc 01 00 00 c7 83 fc 01 00 00 00 00 00 00 e8 b5 fd [ 2121.268950] EIP: [<c103bdc3>] exit_creds+0x9/0x51 SS:ESP 0068:de459f68 [ 2121.268978] CR2: 0000000000000000 [ 2121.268994] ---[ end trace f11850323396760f ]--- On Tue, Aug 10, 2010 at 7:19 AM, John Kacur <jkacur@redhat.com> wrote: > On Mon, Aug 9, 2010 at 10:10 PM, John Culvertson <jculvertson@gmail.com> wrote: >> Hello, >> >> I am trying to use the RT patches on an x86 industrial computer. I am >> getting intermittent network hangs and kernel crashes when I load the >> network with netperf. The unpatched kernel does not exhibit these >> problems. The kernel is 2.6.33.6 patched with rt28. >> >> The computer has an AMD LX800 processor and two Intel 82559 10/100 PCI >> Ethernet controllers. I have only seen the kernel crashes when >> running netperf on both ports simultaneously. 
>> >> This is my first time using the RT patches, so I am not sure how to go >> about resolving this. Any tips would be greatly appreciated. >> >> [ 201.514962] BUG: unable to handle kernel paging request at a0282044 >> [ 201.516020] IP: [<c108d664>] free_block+0x4f/0xe5 >> [ 201.516020] *pde = 00000000 >> [ 201.516020] Oops: 0002 [#1] PREEMPT >> [ 201.516020] last sysfs file: /sys/module/vt/parameters/default_utf8 >> [ 201.516020] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb >> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base >> [ 201.516020] >> [ 201.516020] Pid: 6, comm: sirq-net-rx/0 Tainted: G W >> 2.6.33.6-rt28 #4 SL8/SL8 >> [ 201.516020] EIP: 0060:[<c108d664>] EFLAGS: 00010202 CPU: 0 >> [ 201.516020] EIP is at free_block+0x4f/0xe5 >> [ 201.516020] EAX: d6d75060 EBX: de682500 ECX: 00000004 EDX: a0282040 >> [ 201.516020] ESI: de682020 EDI: de431340 EBP: de40e5c0 ESP: de44bd74 >> [ 201.516020] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 preempt:00000000 >> [ 201.516020] Process sirq-net-rx/0 (pid: 6, ti=de44a000 >> task=de420490 task.ti=de44a000) >> [ 201.516020] Stack: >> [ 201.516020] 00000003 00000000 0000001b de406688 00000001 de431340 >> 00000000 de406660 >> [ 201.516020] <0> 0000001b c108d835 00000000 de44bdc8 de44bdc8 >> ddbd2060 de40e5c0 de431364 >> [ 201.516020] <0> 00000000 de40e5c0 ddbd2060 ddbd2060 c108d581 >> 00000000 00000000 d6e78620 >> [ 201.516020] Call Trace: >> [ 201.516020] [<c108d835>] ? __cache_free+0x7a/0xae >> [ 201.516020] [<c108d581>] ? kmem_cache_free+0x1c/0x58 >> [ 201.516020] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5 >> [ 201.516020] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476 >> [ 201.516020] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f >> [ 201.516020] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523 >> [ 201.516020] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160 >> [ 201.516020] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d >> [ 201.516020] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9 >> [ 201.516020] [<c118d368>] ? 
e100_poll+0x172/0x37c >> [ 201.516020] [<c11af94c>] ? net_rx_action+0x53/0x100 >> [ 201.516020] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da >> [ 201.516020] [<c1027648>] ? run_ksoftirqd+0x0/0x1da >> [ 201.516020] [<c1036d2d>] ? kthread+0x52/0x57 >> [ 201.516020] [<c1036cdb>] ? kthread+0x0/0x57 >> [ 201.516020] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10 >> [ 201.516020] Code: 24 0c 8b 1c 82 89 d8 e8 34 fc ff ff 89 c6 e8 18 >> f9 ff ff 85 c0 75 04 0f 0b eb fe 8b 76 1c 8b 44 24 28 8b 16 8b 7c 85 >> 4c 8b 46 04 <89> 42 04 89 10 2b 5e 0c c7 06 00 01 10 00 c7 46 04 00 02 >> 20 00 >> [ 201.516020] EIP: [<c108d664>] free_block+0x4f/0xe5 SS:ESP 0068:de44bd74 >> [ 201.516020] CR2: 00000000a0282044 >> [ 201.908587] ---[ end trace d28d8d35cd5a7130 ]--- >> >> [ 201.920053] ------------[ cut here ]------------ >> [ 201.924018] kernel BUG at kernel/rtmutex.c:831! >> [ 201.924018] invalid opcode: 0000 [#2] PREEMPT >> [ 201.924018] last sysfs file: /sys/module/vt/parameters/default_utf8 >> [ 201.924018] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb >> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base >> [ 201.924018] >> [ 201.924018] Pid: 6, comm: sirq-net-rx/0 Tainted: G D W >> 2.6.33.6-rt28 #4 SL8/SL8 >> [ 201.924018] EIP: 0060:[<c122ca6e>] EFLAGS: 00010046 CPU: 0 >> [ 201.924018] EIP is at rt_spin_lock_slowlock+0x35/0x155 >> [ 201.924018] EAX: de420490 EBX: 00000292 ECX: 00000000 EDX: de420490 >> [ 201.924018] ESI: c122ca39 EDI: c1321160 EBP: 00000000 ESP: de44bba8 >> [ 201.924018] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 preempt:00000001 >> [ 201.924018] Process sirq-net-rx/0 (pid: 6, ti=de44a000 >> task=de420490 task.ti=de44a000) >> [ 201.924018] Stack: >> [ 201.924018] 00000030 00000046 de44bbd0 c102784a c1003c19 de120c7c >> de226b3c de40a600 >> [ 201.924018] <0> 00000000 c1002db0 de120c7c 00000000 c1322c40 >> de226b3c c1321160 c122ca39 >> [ 201.924018] <0> de120c64 00000000 c104582b de44bc08 de40e7a0 >> c108d08a de120c7c c108d576 >> [ 201.924018] Call 
Trace: >> [ 201.924018] [<c102784a>] ? irq_exit+0x28/0x32 >> [ 201.924018] [<c1003c19>] ? do_IRQ+0x61/0x71 >> [ 201.924018] [<c1002db0>] ? common_interrupt+0x30/0x38 >> [ 201.924018] [<c122ca39>] ? rt_spin_lock_slowlock+0x0/0x155 >> [ 201.924018] [<c104582b>] ? rt_spin_lock_fastlock+0x52/0x55 >> [ 201.924018] [<c108d08a>] ? _slab_irq_disable+0xd/0x15 >> [ 201.924018] [<c108d576>] ? kmem_cache_free+0x11/0x58 >> [ 201.924018] [<c109f603>] ? destroy_inode+0x1c/0x2b >> [ 201.924018] [<c109eefe>] ? iput+0x47/0x49 >> [ 201.924018] [<c109cfd1>] ? d_kill+0x2d/0x47 >> [ 201.924018] [<c109d195>] ? __shrink_dcache_sb+0x1aa/0x247 >> [ 201.924018] [<c109d4c0>] ? shrink_dcache_parent+0x26/0xd7 >> [ 201.924018] [<c10c59f9>] ? proc_flush_task+0x7d/0x165 >> [ 201.924018] [<c1024445>] ? release_task+0x18/0x2af >> [ 201.924018] [<c102570c>] ? do_exit+0x4dd/0x547 >> [ 201.924018] [<c1004d16>] ? oops_end+0x7f/0x83 >> [ 201.924018] [<c1015165>] ? no_context+0x10c/0x115 >> [ 201.924018] [<c10153ad>] ? do_page_fault+0x0/0x28f >> [ 201.924018] [<c1015361>] ? bad_area_nosemaphore+0xa/0xc >> [ 201.924018] [<c122d2fb>] ? error_code+0x6b/0x70 >> [ 201.924018] [<c108d664>] ? free_block+0x4f/0xe5 >> [ 201.924018] [<c108d835>] ? __cache_free+0x7a/0xae >> [ 201.924018] [<c108d581>] ? kmem_cache_free+0x1c/0x58 >> [ 201.924018] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5 >> [ 201.924018] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476 >> [ 201.924018] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f >> [ 201.924018] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523 >> [ 201.924018] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160 >> [ 201.924018] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d >> [ 201.924018] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9 >> [ 201.924018] [<c118d368>] ? e100_poll+0x172/0x37c >> [ 201.924018] [<c11af94c>] ? net_rx_action+0x53/0x100 >> [ 201.924018] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da >> [ 201.924018] [<c1027648>] ? run_ksoftirqd+0x0/0x1da >> [ 201.924018] [<c1036d2d>] ? 
kthread+0x52/0x57 >> [ 201.924018] [<c1036cdb>] ? kthread+0x0/0x57 >> [ 201.924018] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10 >> [ 201.924018] Code: 44 24 2c 00 00 00 00 9c 5b fa b8 01 00 00 00 e8 >> 8d f5 de ff 89 f8 e8 fd 83 e1 ff 8b 47 10 8b 15 d8 02 31 c1 83 e0 fc >> 39 d0 75 04 <0f> 0b eb fe 8b 02 e8 e0 82 e1 ff 89 c5 8b 35 d8 02 31 c1 >> 8b 46 >> [ 201.924018] EIP: [<c122ca6e>] rt_spin_lock_slowlock+0x35/0x155 >> SS:ESP 0068:de44bba8 >> [ 201.924018] ---[ end trace d28d8d35cd5a7131 ]--- >> [ 201.924018] Fixing recursive fault but reboot is needed! >> [ 202.672902] sched: RT throttling activated > > > Please upgrade to 2.6.33.7-rt29 > -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
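When comparing oopses such as the rt28 and rt29 reports in this thread, it can help to pull the function names out of the "Call Trace" lines and see which frames the crashes share. The following is a minimal sketch, not a tool used by anyone in the thread: the helper names trace_functions and common_frames are hypothetical, and the sample strings are a few lines copied from the two traces above.

```python
import re

# Matches call-trace entries such as "[<c118d368>] ? e100_poll+0x172/0x37c".
TRACE_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_functions(oops_text):
    """Return the function names found in an oops, in order of appearance."""
    return TRACE_RE.findall(oops_text)


def common_frames(oops_a, oops_b):
    """Return functions that appear in both traces, in the order of oops_a."""
    in_b = set(trace_functions(oops_b))
    result = []
    for name in trace_functions(oops_a):
        if name in in_b and name not in result:
            result.append(name)
    return result


# A few lines copied from the 2.6.33.6-rt28 and 2.6.33.7-rt29 reports.
RT28 = """\
[ 201.516020] [<c108d581>] ? kmem_cache_free+0x1c/0x58
[ 201.516020] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
[ 201.516020] [<c118d368>] ? e100_poll+0x172/0x37c
[ 201.516020] [<c11af94c>] ? net_rx_action+0x53/0x100
"""
RT29 = """\
[ 2120.784018] [<c11d79cf>] ? tcp_write_xmit+0x35a/0x70a
[ 2120.784018] [<c118d3d0>] ? e100_poll+0x172/0x37c
[ 2120.784018] [<c11af9c3>] ? net_rx_action+0x53/0x100
"""

print(common_frames(RT28, RT29))
```

On these excerpts the shared frames are e100_poll and net_rx_action, i.e. both crashes happen while the sirq-net-rx thread is polling the e100 driver, although the faulting functions themselves differ.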
* Re: 2.6.33.6-rt28 kernel oops while stressing network
2010-08-11 16:53 ` John Culvertson
@ 2010-08-13 17:37 ` John Culvertson
2010-08-13 17:56 ` Darcy Watkins
2010-08-27 10:33 ` Thomas Gleixner
1 sibling, 1 reply; 15+ messages in thread
From: John Culvertson @ 2010-08-13 17:37 UTC (permalink / raw)
To: linux-rt-users

Since it was my understanding that x86 was the most mature and stable
architecture for preempt-rt, I was surprised when I immediately
encountered problems. Is this typical when trying the patches on a
new platform? Like I mentioned before, I am a newbie with preempt-rt.

On Wed, Aug 11, 2010 at 12:53 PM, John Culvertson <jculvertson@gmail.com> wrote:
> I updated to 2.6.33.7-rt29, and I am seeing similar symptoms.
>
> [ 2120.781166] BUG: unable to handle kernel paging request at c11cd497
> [ 2120.784018] IP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85
> [ 2120.784018] *pde = 1d7f6063 *pte = 011cd161
> [ 2120.784018] Oops: 0003 [#1] PREEMPT

^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: 2.6.33.6-rt28 kernel oops while stressing network
2010-08-13 17:37 ` John Culvertson
@ 2010-08-13 17:56 ` Darcy Watkins
[not found] ` <AANLkTikty=V_==0udO9F2MxpVxwuLzyOQZt0ha5=VC3y@mail.gmail.com>
0 siblings, 1 reply; 15+ messages in thread
From: Darcy Watkins @ 2010-08-13 17:56 UTC (permalink / raw)
To: John Culvertson, linux-rt-users

Hi John,

I use Fedora 13 which has 2.6.33.6 as the kernel (without RT).

My machine has three net i/f in it. Two PCI net cards with Realtek
chipset and Intel PRO built into the mainboard's chipset.

When I installed Fedora using the netboot USB flash drive, it insisted
on using one of the Realtek interfaces for Internet connection so that
is my eth0. All fine.

More recently, I activated the other two net i/f for private LAN to
target HW test network. I set one to 10.0.0.1 and the other to
192.168.101.4 and connected them to the target network. Note that they
were both connected to the same network switch.

Shortly after that, the system froze. After reboot it would run for a
while and then freeze. I unplugged the net i/f based on the Intel PRO
and all has been fine since.

I mention all this because you once mentioned you were using two Intel
net i/f. It may not even be RT related.

I suggest you try (not in any particular order, but each on its own)...

- running with only one net i/f connected
- building a side-by-side vanilla kernel 2.6.33.7 (without RT patch)
  and running your two net i/f without RT
- using different net i/f cards (say based on Realtek or something
  other than Intel) try running it with RT

If you see your system behavior change related to any of these, it
possibly may not be RT patch related (or it could be tied to a specific
driver).

Regards,

Darcy

-----Original Message-----
From: linux-rt-users-owner@vger.kernel.org
[mailto:linux-rt-users-owner@vger.kernel.org] On Behalf Of John
Culvertson
Sent: Friday, August 13, 2010 10:38 AM
To: linux-rt-users@vger.kernel.org
Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network

Since it was my understanding that x86 was the most mature and stable
architecture for preempt-rt, I was surprised when I immediately
encountered problems. Is this typical when trying the patches on a
new platform? Like I mentioned before, I am a newbie with preempt-rt.

On Wed, Aug 11, 2010 at 12:53 PM, John Culvertson
<jculvertson@gmail.com> wrote:
> I updated to 2.6.33.7-rt29, and I am seeing similar symptoms.
>
> [ 2120.781166] BUG: unable to handle kernel paging request at c11cd497
> [ 2120.784018] IP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85
> [ 2120.784018] *pde = 1d7f6063 *pte = 011cd161
> [ 2120.784018] Oops: 0003 [#1] PREEMPT
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <AANLkTikty=V_==0udO9F2MxpVxwuLzyOQZt0ha5=VC3y@mail.gmail.com>]
* Re: 2.6.33.6-rt28 kernel oops while stressing network
[not found] ` <AANLkTikty=V_==0udO9F2MxpVxwuLzyOQZt0ha5=VC3y@mail.gmail.com>
@ 2010-08-13 18:07 ` John Culvertson
2010-08-13 20:17 ` Sven-Thorsten Dietrich
[not found] ` <D61182AC8012EA4EBC531B3AF23BE1099C86D6@tranzeo-mail2.12stewart.tranzeo.com>
1 sibling, 1 reply; 15+ messages in thread
From: John Culvertson @ 2010-08-13 18:07 UTC (permalink / raw)
To: linux-rt-users

Thanks for the suggestions. I have tried the unpatched 2.6.33.7
kernel, and the problem does not occur. The hardware is a single
board industrial computer with the network controllers onboard, so I
cannot easily try different NICs. I have not seen the problem occur
with only one port in use, but I have not tested that long enough to
be positive.

One thing that may be a little odd about this computer is that both
Ethernet controllers (Intel 82559) share the same PCI interrupt.
Interrupt sharing should be OK, but since adjacent PCI slots in normal
PCs generally use different interrupts, it may not occur often in
other systems.

On Fri, Aug 13, 2010 at 1:56 PM, Darcy Watkins <DWatkins@tranzeo.com> wrote:
> Hi John,
>
> I use Fedora 13 which has 2.6.33.6 as the kernel (without RT).
>
> My machine has three net i/f in it. Two PCI net cards with Realtek
> chipset and Intel PRO built into the mainboard's chipset.
>
> When I installed Fedora using the netboot USB flash drive, it insisted
> on using one of the Realtek interfaces for Internet connection so that
> is my eth0. All fine.
>
> More recently, I activated the other two net i/f for private LAN to
> target HW test network. I set one to 10.0.0.1 and the other to
> 192.168.101.4 and connected them to the target network. Note that they
> were both connected to the same network switch.
>
> Shortly after that, the system froze. After reboot it would run for a
> while and then freeze. I unplugged the net i/f based on the Intel PRO
> and all has been fine since.
>
> I mention all this because you once mentioned you were using two Intel
> net i/f. It may not even be RT related.
>
> I suggest you try (not in any particular order, but each on its own)...
>
> - running with only one net i/f connected
> - building a side-by-side vanilla kernel 2.6.33.7 (without RT patch)
> and running your two net i/f without RT
> - using different net i/f cards (say based on Realtek or something
> other than Intel) try running it with RT
>
> If you see your system behavior change related to any of these, it
> possibly may not be RT patch related (or it could be tied to a specific
> driver).
>
> Regards,
>
> Darcy
>
> -----Original Message-----
> From: linux-rt-users-owner@vger.kernel.org
> [mailto:linux-rt-users-owner@vger.kernel.org] On Behalf Of John
> Culvertson
> Sent: Friday, August 13, 2010 10:38 AM
> To: linux-rt-users@vger.kernel.org
> Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network
>
> Since it was my understanding that x86 was the most mature and stable
> architecture for preempt-rt, I was surprised when I immediately
> encountered problems. Is this typical when trying the patches on a
> new platform? Like I mentioned before, I am a newbie with preempt-rt.
>
> On Wed, Aug 11, 2010 at 12:53 PM, John Culvertson
> <jculvertson@gmail.com> wrote:
>> I updated to 2.6.33.7-rt29, and I am seeing similar symptoms.
>>
>> [ 2120.781166] BUG: unable to handle kernel paging request at c11cd497
>> [ 2120.784018] IP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85
>> [ 2120.784018] *pde = 1d7f6063 *pte = 011cd161
>> [ 2120.784018] Oops: 0003 [#1] PREEMPT
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 15+ messages in thread
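The remark above about both Ethernet controllers sharing one PCI interrupt can be checked from /proc/interrupts, where devices sharing a line are listed comma-separated at the end of the row. Below is a rough sketch of such a check. Everything in it is an assumption for illustration: the function name shared_irqs is hypothetical, the SAMPLE text is made up (it is not output from the board in this thread), and the parser assumes device names contain no spaces.

```python
def shared_irqs(interrupts_text):
    """Map IRQ number -> list of device names for IRQs with more than one device.

    Expects text in the style of /proc/interrupts. Assumes device names
    contain no whitespace and that sharing devices are comma-separated.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        irq = head.strip()
        if not sep or not irq.isdigit():
            continue  # skip the CPU header and named rows such as NMI
        parts = [p.strip() for p in rest.split(",")]
        if len(parts) < 2 or not parts[0].split():
            continue  # only one device (or nothing) on this line
        # The first chunk still holds the counters and the controller name;
        # the device name is its last whitespace-separated token.
        first_device = parts[0].split()[-1]
        shared[int(irq)] = [first_device] + parts[1:]
    return shared


# Hypothetical sample, NOT taken from the machine discussed in this thread.
SAMPLE = """\
           CPU0
  0:      12345    XT-PIC-XT        timer
  1:          2    XT-PIC-XT        i8042
 11:     987654    XT-PIC-XT        eth0, eth1
NMI:          0   Non-maskable interrupts
"""

print(shared_irqs(SAMPLE))
```

On a real system one would pass the contents of /proc/interrupts instead of SAMPLE; an entry listing both Ethernet interfaces on the same IRQ would confirm the sharing described above.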
* Re: 2.6.33.6-rt28 kernel oops while stressing network
2010-08-13 18:07 ` John Culvertson
@ 2010-08-13 20:17 ` Sven-Thorsten Dietrich
2010-08-13 21:20 ` John Culvertson
0 siblings, 1 reply; 15+ messages in thread
From: Sven-Thorsten Dietrich @ 2010-08-13 20:17 UTC (permalink / raw)
To: John Culvertson; +Cc: linux-rt-users

On 08/13/2010 11:07 AM, John Culvertson wrote:
> Thanks for the suggestions. I have tried the unpatched 2.6.33.7
> kernel, and the problem does not occur. The hardware is a single
> board industrial computer with the network controllers onboard, so I
> cannot easily try different NICs. I have not seen the problem occur
> with only one port in use, but I have not tested that long enough to
> be positive.
>
> One thing that may be a little odd about this computer is that both
> Ethernet controllers (Intel 82559) share the same PCI interrupt.
> Interrupt sharing should be OK, but since adjacent PCI slots in normal
> PCs generally use different interrupts, it may not occur often in
> other systems.
>

Does it reproduce when you turn off PREEMPT_RT, but leave all the IRQ
threading options enabled?

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: 2.6.33.6-rt28 kernel oops while stressing network 2010-08-13 20:17 ` Sven-Thorsten Dietrich @ 2010-08-13 21:20 ` John Culvertson 2010-08-13 22:57 ` Sven-Thorsten Dietrich 0 siblings, 1 reply; 15+ messages in thread From: John Culvertson @ 2010-08-13 21:20 UTC (permalink / raw) To: Sven-Thorsten Dietrich, linux-rt-users I changed from PREEMPT_RT to PREEMPT_DESKTOP, and now it will not boot. I get the following error, then it hangs: [ 9.025745] BUG: sleeping function called from invalid context at mm/slab.c:3266 [ 9.044859] pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 0, pid: 16, name: events/0 [ 9.064192] Pid: 16, comm: events/0 Not tainted 2.6.33.7-rt29 #3 [ 9.076674] Call Trace: [ 9.085224] [<c108ed63>] ? kmem_cache_alloc+0x1b/0x99 [ 9.096780] [<c11a8aef>] ? __alloc_skb+0x2e/0x10a [ 9.108079] [<c1209db2>] ? alloc_skb+0x9/0xb [ 9.118763] [<c120a5fd>] ? inet6_rt_notify+0x2f/0xb3 [ 9.130285] [<c120d630>] ? fib6_add+0x21e/0x38e [ 9.141487] [<c120ac83>] ? __ip6_ins_rt+0x23/0x35 [ 9.152743] [<c120536d>] ? addrconf_add_mroute+0x6c/0x72 [ 9.164855] [<c12061f8>] ? addrconf_add_dev+0x3d/0x49 [ 9.176495] [<c12070ed>] ? addrconf_notify+0x4f6/0x6d7 [ 9.188213] [<c114ddff>] ? extract_entropy+0x45/0xfe [ 9.199753] [<c11c18a2>] ? need_resched+0x11/0x1a [ 9.210686] [<c11c1d97>] ? rt_do_flush+0x26/0x105 [ 9.221572] [<c103b3dc>] ? notifier_call_chain+0x2a/0x52 [ 9.233426] [<c103b418>] ? raw_notifier_call_chain+0x9/0xc [ 9.244968] [<c11af6ff>] ? netdev_state_change+0x18/0x29 [ 9.256198] [<c11b88d7>] ? linkwatch_do_dev+0x9e/0xa7 [ 9.267208] [<c11b8b07>] ? __linkwatch_run_queue+0xd4/0x108 [ 9.278611] [<c11b8b58>] ? linkwatch_event+0x1d/0x22 [ 9.289437] [<c10351e2>] ? worker_thread+0xe1/0x15e [ 9.299891] [<c11b8b3b>] ? linkwatch_event+0x0/0x22 [ 9.310157] [<c1037574>] ? autoremove_wake_function+0x0/0x2d [ 9.321305] [<c1035101>] ? worker_thread+0x0/0x15e [ 9.331182] [<c103737b>] ? kthread+0x52/0x57 [ 9.340335] [<c1037329>] ? kthread+0x0/0x57 [ 9.349248] [<c1002dfe>] ? 
kernel_thread_helper+0x6/0x10 On Fri, Aug 13, 2010 at 4:17 PM, Sven-Thorsten Dietrich <sven@thebigcorporation.com> wrote: > On 08/13/2010 11:07 AM, John Culvertson wrote: >> >> Thanks for the suggestions. I have tried the unpatched 2.6.33.7 >> kernel, and the problem does not occur. The hardware is a single >> board industrial computer with the network controllers onboard, so I >> cannot easily try different NICs. I have not seen the problem occur >> with only one port in use, but I have not tested that long enough to >> be positive. >> >> One thing that may be a little odd about this computer is that both >> Ethernet controllers (Intel 82559) share the same PCI interrupt. >> Interrupt sharing should be OK, but since adjacent PCI slots in normal >> PCs generally use different interrupts, it may not occur often in >> other systems. >> > > Does it reproduce when you turn off PREEMPT_RT, but leave all the IRQ > threading options enabled? > > > -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: 2.6.33.6-rt28 kernel oops while stressing network 2010-08-13 21:20 ` John Culvertson @ 2010-08-13 22:57 ` Sven-Thorsten Dietrich 2010-08-16 17:16 ` John Culvertson 0 siblings, 1 reply; 15+ messages in thread From: Sven-Thorsten Dietrich @ 2010-08-13 22:57 UTC (permalink / raw) To: John Culvertson; +Cc: linux-rt-users On Fri, 2010-08-13 at 17:20 -0400, John Culvertson wrote: > I changed from PREEMPT_RT to PREEMPT_DESKTOP, and now it will not > boot. I get the following error, then it hangs: LOL. Sounds like there is a little more work to be done on the .33 release. I think you are onto something however - although I am not at the moment blessed with the time to dig for you - linkwatch_event - is that part of e1000? If so, that's where I'd start digging. What happens if you first disable PREEMPT_HARDIRQs, and then next also disable PREEMPT_SOFTIRQs? I would assume after the latter it would work just fine. Regards, Sven > > [ 9.025745] BUG: sleeping function called from invalid context at > mm/slab.c:3266 > [ 9.044859] pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 0, pid: 16, > name: events/0 > [ 9.064192] Pid: 16, comm: events/0 Not tainted 2.6.33.7-rt29 #3 > [ 9.076674] Call Trace: > [ 9.085224] [<c108ed63>] ? kmem_cache_alloc+0x1b/0x99 > [ 9.096780] [<c11a8aef>] ? __alloc_skb+0x2e/0x10a > [ 9.108079] [<c1209db2>] ? alloc_skb+0x9/0xb > [ 9.118763] [<c120a5fd>] ? inet6_rt_notify+0x2f/0xb3 > [ 9.130285] [<c120d630>] ? fib6_add+0x21e/0x38e > [ 9.141487] [<c120ac83>] ? __ip6_ins_rt+0x23/0x35 > [ 9.152743] [<c120536d>] ? addrconf_add_mroute+0x6c/0x72 > [ 9.164855] [<c12061f8>] ? addrconf_add_dev+0x3d/0x49 > [ 9.176495] [<c12070ed>] ? addrconf_notify+0x4f6/0x6d7 > [ 9.188213] [<c114ddff>] ? extract_entropy+0x45/0xfe > [ 9.199753] [<c11c18a2>] ? need_resched+0x11/0x1a > [ 9.210686] [<c11c1d97>] ? rt_do_flush+0x26/0x105 > [ 9.221572] [<c103b3dc>] ? notifier_call_chain+0x2a/0x52 > [ 9.233426] [<c103b418>] ? raw_notifier_call_chain+0x9/0xc > [ 9.244968] [<c11af6ff>] ? 
netdev_state_change+0x18/0x29 > [ 9.256198] [<c11b88d7>] ? linkwatch_do_dev+0x9e/0xa7 > [ 9.267208] [<c11b8b07>] ? __linkwatch_run_queue+0xd4/0x108 > [ 9.278611] [<c11b8b58>] ? linkwatch_event+0x1d/0x22 > [ 9.289437] [<c10351e2>] ? worker_thread+0xe1/0x15e > [ 9.299891] [<c11b8b3b>] ? linkwatch_event+0x0/0x22 > [ 9.310157] [<c1037574>] ? autoremove_wake_function+0x0/0x2d > [ 9.321305] [<c1035101>] ? worker_thread+0x0/0x15e > [ 9.331182] [<c103737b>] ? kthread+0x52/0x57 > [ 9.340335] [<c1037329>] ? kthread+0x0/0x57 > [ 9.349248] [<c1002dfe>] ? kernel_thread_helper+0x6/0x10 > > > On Fri, Aug 13, 2010 at 4:17 PM, Sven-Thorsten Dietrich > <sven@thebigcorporation.com> wrote: > > On 08/13/2010 11:07 AM, John Culvertson wrote: > >> > >> Thanks for the suggestions. I have tried the unpatched 2.6.33.7 > >> kernel, and the problem does not occur. The hardware is a single > >> board industrial computer with the network controllers onboard, so I > >> cannot easily try different NICs. I have not seen the problem occur > >> with only one port in use, but I have not tested that long enough to > >> be positive. > >> > >> One thing that may be a little odd about this computer is that both > >> Ethernet controllers (Intel 82559) share the same PCI interrupt. > >> Interrupt sharing should be OK, but since adjacent PCI slots in normal > >> PCs generally use different interrupts, it may not occur often in > >> other systems. > >> > > > > Does it reproduce when you turn off PREEMPT_RT, but leave all the IRQ > > threading options enabled? > > > > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: 2.6.33.6-rt28 kernel oops while stressing network 2010-08-13 22:57 ` Sven-Thorsten Dietrich @ 2010-08-16 17:16 ` John Culvertson 2010-08-16 17:22 ` Sven-Thorsten Dietrich 0 siblings, 1 reply; 15+ messages in thread From: John Culvertson @ 2010-08-16 17:16 UTC (permalink / raw) To: Sven-Thorsten Dietrich; +Cc: linux-rt-users linkwatch_event is defined in net/core/link_watch.c. It appears to work fine if I disable PREEMPT_HARDIRQs. Thanks for the feedback. My main objective with this platform at the moment is to learn about preempt-rt and evaluate its stability and suitability for use. On Fri, Aug 13, 2010 at 6:57 PM, Sven-Thorsten Dietrich <thebigcorporation@gmail.com> wrote: > On Fri, 2010-08-13 at 17:20 -0400, John Culvertson wrote: >> I changed from PREEMPT_RT to PREEMPT_DESKTOP, and now it will not >> boot. I get the following error, then it hangs: > > LOL. Sounds like there is a little more work to be done on the .33 > release. > > I think you are onto something however - although I am not at the moment > blessed with the time to dig for you - > > linkwatch_event - is that part of e1000? > > If so, that's where I'd start digging. > > What happens if you first disable PREEMPT_HARDIRQs, > and then next also disable PREEMPT_SOFTIRQs? > > I would assume after the latter it would work just fine. > > Regards, > > Sven > >> >> [ 9.025745] BUG: sleeping function called from invalid context at >> mm/slab.c:3266 >> [ 9.044859] pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 0, pid: 16, >> name: events/0 >> [ 9.064192] Pid: 16, comm: events/0 Not tainted 2.6.33.7-rt29 #3 >> [ 9.076674] Call Trace: >> [ 9.085224] [<c108ed63>] ? kmem_cache_alloc+0x1b/0x99 >> [ 9.096780] [<c11a8aef>] ? __alloc_skb+0x2e/0x10a >> [ 9.108079] [<c1209db2>] ? alloc_skb+0x9/0xb >> [ 9.118763] [<c120a5fd>] ? inet6_rt_notify+0x2f/0xb3 >> [ 9.130285] [<c120d630>] ? fib6_add+0x21e/0x38e >> [ 9.141487] [<c120ac83>] ? __ip6_ins_rt+0x23/0x35 >> [ 9.152743] [<c120536d>] ? 
addrconf_add_mroute+0x6c/0x72 >> [ 9.164855] [<c12061f8>] ? addrconf_add_dev+0x3d/0x49 >> [ 9.176495] [<c12070ed>] ? addrconf_notify+0x4f6/0x6d7 >> [ 9.188213] [<c114ddff>] ? extract_entropy+0x45/0xfe >> [ 9.199753] [<c11c18a2>] ? need_resched+0x11/0x1a >> [ 9.210686] [<c11c1d97>] ? rt_do_flush+0x26/0x105 >> [ 9.221572] [<c103b3dc>] ? notifier_call_chain+0x2a/0x52 >> [ 9.233426] [<c103b418>] ? raw_notifier_call_chain+0x9/0xc >> [ 9.244968] [<c11af6ff>] ? netdev_state_change+0x18/0x29 >> [ 9.256198] [<c11b88d7>] ? linkwatch_do_dev+0x9e/0xa7 >> [ 9.267208] [<c11b8b07>] ? __linkwatch_run_queue+0xd4/0x108 >> [ 9.278611] [<c11b8b58>] ? linkwatch_event+0x1d/0x22 >> [ 9.289437] [<c10351e2>] ? worker_thread+0xe1/0x15e >> [ 9.299891] [<c11b8b3b>] ? linkwatch_event+0x0/0x22 >> [ 9.310157] [<c1037574>] ? autoremove_wake_function+0x0/0x2d >> [ 9.321305] [<c1035101>] ? worker_thread+0x0/0x15e >> [ 9.331182] [<c103737b>] ? kthread+0x52/0x57 >> [ 9.340335] [<c1037329>] ? kthread+0x0/0x57 >> [ 9.349248] [<c1002dfe>] ? kernel_thread_helper+0x6/0x10 >> >> >> On Fri, Aug 13, 2010 at 4:17 PM, Sven-Thorsten Dietrich >> <sven@thebigcorporation.com> wrote: >> > On 08/13/2010 11:07 AM, John Culvertson wrote: >> >> >> >> Thanks for the suggestions. I have tried the unpatched 2.6.33.7 >> >> kernel, and the problem does not occur. The hardware is a single >> >> board industrial computer with the network controllers onboard, so I >> >> cannot easily try different NICs. I have not seen the problem occur >> >> with only one port in use, but I have not tested that long enough to >> >> be positive. >> >> >> >> One thing that may be a little odd about this computer is that both >> >> Ethernet controllers (Intel 82559) share the same PCI interrupt. >> >> Interrupt sharing should be OK, but since adjacent PCI slots in normal >> >> PCs generally use different interrupts, it may not occur often in >> >> other systems. 
>> >> >> > >> > Does it reproduce when you turn off PREEMPT_RT, but leave all the IRQ >> > threading options enabled? >> > >> > >> > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
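When comparing builds with PREEMPT_RT, PREEMPT_DESKTOP, and the IRQ threading options switched on or off, it helps to record exactly which preemption options each kernel was built with. The sketch below is an illustration only: the helper name preempt_opts is made up, and it assumes the options discussed above show up in the build's .config with a CONFIG_PREEMPT prefix (for example CONFIG_PREEMPT_HARDIRQS), which should be verified against the tree in use.

```shell
# List the preemption-related options in a kernel .config, including the
# ones that are explicitly disabled ("# CONFIG_... is not set").
# preempt_opts is a hypothetical helper name; the default path assumes
# the command is run from the top of the kernel build tree.
preempt_opts() {
    grep -E '^(# )?CONFIG_PREEMPT' "${1:-.config}"
}
```

Saving the output next to each test result (for instance, preempt_opts > preempt-config.txt) makes it easier to tell later which combination booted and which one oopsed.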
* Re: 2.6.33.6-rt28 kernel oops while stressing network
2010-08-16 17:16 ` John Culvertson
@ 2010-08-16 17:22 ` Sven-Thorsten Dietrich
0 siblings, 0 replies; 15+ messages in thread
From: Sven-Thorsten Dietrich @ 2010-08-16 17:22 UTC (permalink / raw)
To: John Culvertson; +Cc: linux-rt-users
On 08/16/2010 10:16 AM, John Culvertson wrote:
> linkwatch_event is defined in net/core/link_watch.c.
>
> It appears to work fine if I disable PREEMPT_HARDIRQs.
>
> Thanks for the feedback. My main objective with this platform at the
> moment is to learn about preempt-rt and evaluate its stability and
> suitability for use.
>
Cool - that makes sense - like I said, I am slammed with other stuff,
but hopefully one of the heavyweights on this will have a patch for you
at some point - it's definitely a bug of some species.
Cheers
Sven
> On Fri, Aug 13, 2010 at 6:57 PM, Sven-Thorsten Dietrich
> <thebigcorporation@gmail.com> wrote:
>> On Fri, 2010-08-13 at 17:20 -0400, John Culvertson wrote:
>>> I changed from PREEMPT_RT to PREEMPT_DESKTOP, and now it will not
>>> boot. I get the following error, then it hangs:
>> LOL. Sounds like there is a little more work to be done on the .33
>> release.
>>
>> I think you are onto something however - although I am not at the moment
>> blessed with the time to dig for you -
>>
>> linkwatch_event - is that part of e1000?
>>
>> If so, that's where I'd start digging.
>>
>> What happens if you first disable PREEMPT_HARDIRQs,
>> and then next also disable PREEMPT_SOFTIRQs?
>>
>> I would assume after the latter it would work just fine.
>>
>> Regards,
>>
>> Sven
>>
>>> [ 9.025745] BUG: sleeping function called from invalid context at
>>> mm/slab.c:3266
>>> [ 9.044859] pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 0, pid: 16,
>>> name: events/0
>>> [ 9.064192] Pid: 16, comm: events/0 Not tainted 2.6.33.7-rt29 #3
>>> [ 9.076674] Call Trace:
>>> [ 9.085224] [<c108ed63>] ? kmem_cache_alloc+0x1b/0x99
>>> [ 9.096780] [<c11a8aef>] ?
__alloc_skb+0x2e/0x10a >>> [ 9.108079] [<c1209db2>] ? alloc_skb+0x9/0xb >>> [ 9.118763] [<c120a5fd>] ? inet6_rt_notify+0x2f/0xb3 >>> [ 9.130285] [<c120d630>] ? fib6_add+0x21e/0x38e >>> [ 9.141487] [<c120ac83>] ? __ip6_ins_rt+0x23/0x35 >>> [ 9.152743] [<c120536d>] ? addrconf_add_mroute+0x6c/0x72 >>> [ 9.164855] [<c12061f8>] ? addrconf_add_dev+0x3d/0x49 >>> [ 9.176495] [<c12070ed>] ? addrconf_notify+0x4f6/0x6d7 >>> [ 9.188213] [<c114ddff>] ? extract_entropy+0x45/0xfe >>> [ 9.199753] [<c11c18a2>] ? need_resched+0x11/0x1a >>> [ 9.210686] [<c11c1d97>] ? rt_do_flush+0x26/0x105 >>> [ 9.221572] [<c103b3dc>] ? notifier_call_chain+0x2a/0x52 >>> [ 9.233426] [<c103b418>] ? raw_notifier_call_chain+0x9/0xc >>> [ 9.244968] [<c11af6ff>] ? netdev_state_change+0x18/0x29 >>> [ 9.256198] [<c11b88d7>] ? linkwatch_do_dev+0x9e/0xa7 >>> [ 9.267208] [<c11b8b07>] ? __linkwatch_run_queue+0xd4/0x108 >>> [ 9.278611] [<c11b8b58>] ? linkwatch_event+0x1d/0x22 >>> [ 9.289437] [<c10351e2>] ? worker_thread+0xe1/0x15e >>> [ 9.299891] [<c11b8b3b>] ? linkwatch_event+0x0/0x22 >>> [ 9.310157] [<c1037574>] ? autoremove_wake_function+0x0/0x2d >>> [ 9.321305] [<c1035101>] ? worker_thread+0x0/0x15e >>> [ 9.331182] [<c103737b>] ? kthread+0x52/0x57 >>> [ 9.340335] [<c1037329>] ? kthread+0x0/0x57 >>> [ 9.349248] [<c1002dfe>] ? kernel_thread_helper+0x6/0x10 >>> >>> >>> On Fri, Aug 13, 2010 at 4:17 PM, Sven-Thorsten Dietrich >>> <sven@thebigcorporation.com> wrote: >>>> On 08/13/2010 11:07 AM, John Culvertson wrote: >>>>> Thanks for the suggestions. I have tried the unpatched 2.6.33.7 >>>>> kernel, and the problem does not occur. The hardware is a single >>>>> board industrial computer with the network controllers onboard, so I >>>>> cannot easily try different NICs. I have not seen the problem occur >>>>> with only one port in use, but I have not tested that long enough to >>>>> be positive. 
>>>>> >>>>> One thing that may be a little odd about this computer is that both >>>>> Ethernet controllers (Intel 82559) share the same PCI interrupt. >>>>> Interrupt sharing should be OK, but since adjacent PCI slots in normal >>>>> PCs generally use different interrupts, it may not occur often in >>>>> other systems. >>>>> >>>> Does it reproduce when you turn off PREEMPT_RT, but leave all the IRQ >>>> threading options enabled? >>>> >>>> >>>> >> >> >> ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <D61182AC8012EA4EBC531B3AF23BE1099C86D6@tranzeo-mail2.12stewart.tranzeo.com>]
[parent not found: <AANLkTi=3cz2RyHPdoNRjucTozKqDmJc8sDh+hsnmhKAS@mail.gmail.com>]
* Re: 2.6.33.6-rt28 kernel oops while stressing network [not found] ` <AANLkTi=3cz2RyHPdoNRjucTozKqDmJc8sDh+hsnmhKAS@mail.gmail.com> @ 2010-08-13 19:56 ` Darcy Watkins 0 siblings, 0 replies; 15+ messages in thread From: Darcy Watkins @ 2010-08-13 19:56 UTC (permalink / raw) To: John Culvertson; +Cc: linux-rt-users [-- Attachment #1: Type: text/plain, Size: 6516 bytes --] Hi John, First of all, you will want to become familiar with the 'chrt' command (available also as an applet part of busybox). That command (plus some of the finer details in optional output from the 'ps' command) is how you will tweak and experiment on a target system. Then the attached may be of use (I attached them because they are small). rtctl is a package that originates from Red Hat. If you are using busybox shell (rather than full blown bash), then apply the attached patch to it. Extract the rpm something like... rpm2cpio rtctl-1.7-1.el5rt.src.rpm | cpio --extract --make-directories ...assuming you have rpm2cpio installed on your host. Then extract any tarballs that were bundled inside it. Then apply the attached patch to it (if not using bash on your target). Then install to your target rootfs staging area something like... install -D -m 755 rtctl $MY_STAGING_DIR_PATH/usr/sbin/rtctl install -D -m 644 rtgroups $MY_STAGING_DIR_PATH/etc/rtgroups install -D -m 755 rtctl.sysconfig $MY_STAGING_DIR_PATH/etc/sysconfig/rtctl The package includes more but that's all I've ported and tried out so far on a uclibc/buildroot style embedded Linux with RT kernel so far. Edit the rtgroups file accordingly to implement an RT policy that works for you (based on outcome of your experiments using chrt). What hasn't been implemented yet is how to hook rtctl into your system initialization scripts. The stock content from the Red Hat package is for their real time distro. You'd need to invoke it from your scripts accordingly. 
Regards, Darcy On Fri, 2010-08-13 at 12:16 -0700, John Culvertson wrote: > Thanks for the pointers. I have seen others mention adjusting the > soft irq thread priorities, etc. Can you shed any light on how to go > about doing that? Is that in the kernel configuration, or do you have > to modify the kernel source? > > On Fri, Aug 13, 2010 at 2:14 PM, Darcy Watkins <DWatkins@tranzeo.com> > wrote: > > Hi John, > > > > In the 'make menuconfig', look for and double check any config > settings related to IRQ sharing, PCI, etc. There are a lot of > tweaks. Probably enough to write a PhD dissertation about. > > > > The other thing, if it only happens under high network stress, you > may want to check into tweaking the real time priorities of your > kernel threads, user space program threads and even IRQ threads. RT > kernels tend to treat priorities more strictly so it is possible for a > high priority thread to hog the CPU and deplete resources before a > lower priority thread processes them and frees up the resources. > > > > Regards, > > > > Darcy > > > > > > -----Original Message----- > > From: John Culvertson [mailto:jculvertson@gmail.com] > > Sent: Friday, August 13, 2010 11:06 AM > > To: Darcy Watkins > > Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network > > > > Thanks for the suggestions. I have tried the unpatched 2.6.33.7 > > kernel, and the problem does not occur. The hardware is a single > > board industrial computer with the network controllers onboard, so I > > cannot easily try different NICs. I have not seen the problem occur > > with only one port in use, but I have not tested that long enough to > > be positive. > > > > One thing that may be a little odd about this computer is that both > > Ethernet controllers (Intel 82559) share the same PCI interrupt. > > Interrupt sharing should be OK, but since adjacent PCI slots in > normal > > PCs generally use different interrupts, it may not occur often in > > other systems. 
> > > > On Fri, Aug 13, 2010 at 1:56 PM, Darcy Watkins > <DWatkins@tranzeo.com> wrote: > >> Hi John, > >> > >> I use Fedora 13 which has 2.6.33.6 as the kernel (without RT). > >> > >> My machine has three net i/f in it. Two PCI net cards with Realtek > >> chipset and Intel PRO built into the mainboard's chipset. > >> > >> When I installed Fedora using the netboot USB flash drive, it > insisted > >> on using one of the Realtek interfaces for Internet connection so > that > >> is my eth0. All fine. > >> > >> More recently, I activated the other two net i/f for private LAN to > >> target HW test network. I set one to 10.0.0.1 and the other to > >> 192.168.101.4 and connected them to the target network. Note that > they > >> were both connected to the same network switch. > >> > >> Shortly after that, the system froze. After reboot it would run > for a > >> while and then freeze. I unplugged the net i/f based on the Intel > PRO > >> and all has been fine since. > >> > >> I mention all this because you once mentioned you were using two > Intel > >> net i/f. It may not even be RT related. > >> > >> I suggest you try (not in any particular order, but each on its > own)... > >> > >> - running with only one net i/f connected > >> - building a side-by-side vanilla kernel 2.6.33.7 (without RT > patch) > >> and running your two net i/f without RT > >> - using different net i/f cards (say based on Realtek or > something > >> other than Intel) try running it with RT > >> > >> If you see your system behavior change related to any of these, it > >> possibly may not be RT patch related (or it could be tied to a > specific > >> driver). 
> >> > >> Regards, > >> > >> Darcy > >> > >> -----Original Message----- > >> From: linux-rt-users-owner@vger.kernel.org > >> [mailto:linux-rt-users-owner@vger.kernel.org] On Behalf Of John > >> Culvertson > >> Sent: Friday, August 13, 2010 10:38 AM > >> To: linux-rt-users@vger.kernel.org > >> Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network > >> > >> Since it was my understanding that x86 was the most mature and > stable > >> architecture for preempt-rt, I was surprised when I immediately > >> encountered problems. Is this typical when trying the patches on a > >> new platform? Like I mentioned before, I am a newbie with > preempt-rt. > >> > >> On Wed, Aug 11, 2010 at 12:53 PM, John Culvertson > >> <jculvertson@gmail.com> wrote: > >>> I updated to 2.6.33.7-rt29, and I am seeing similar symptoms. > >>> > >>> [ 2120.781166] BUG: unable to handle kernel paging request at > c11cd497 > >>> [ 2120.784018] IP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85 > >>> [ 2120.784018] *pde = 1d7f6063 *pte = 011cd161 > >>> [ 2120.784018] Oops: 0003 [#1] PREEMPT > >> -- > >> To unsubscribe from this list: send the line "unsubscribe > >> linux-rt-users" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > > > [-- Attachment #2: 10-rtctl-crossbuild-tsf.patch --] [-- Type: text/x-patch, Size: 3743 bytes --] Index: rtctl-1.7/rtctl =================================================================== --- rtctl-1.7.orig/rtctl +++ rtctl-1.7/rtctl @@ -1,4 +1,4 @@ -#!/bin/bash +#!/bin/sh usage () { @@ -26,67 +26,62 @@ shift GROUPNAME="" +ALL_GROUPS=`awk '/^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ { split($0, parts, ":") ; print parts[1] }' ${RTGROUPFILE}` + +group_properties_of() +{ + local grouprec=`awk '/^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:[*a-fA-F0-9]+:.+$/ { split($0, parts, ":") ; if (parts[1] == groupname) print }' groupname=$1 ${RTGROUPFILE}` + if [ -n "$grouprec" ] ; then + # 5 field record format + 
GROUP_AFFINITY=`echo $grouprec | cut -d ':' -f 4` + GROUP_REGEX=`echo $grouprec | cut -d ':' -f 5` + else + grouprec=`awk '/^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ { split($0, parts, ":") ; if (parts[1] == groupname) print }' groupname=$1 ${RTGROUPFILE}` + if [ -n "$grouprec" ] ; then + # 4 field legacy record format + GROUP_AFFINITY="*" + GROUP_REGEX=`echo $grouprec | cut -d ':' -f 4` + else + return 1 + fi + fi + local gname=`echo $grouprec | cut -d ':' -f 1` + GROUP_SCHED=`echo $grouprec | cut -d ':' -f 2` + GROUP_PRIORITY=`echo $grouprec | cut -d ':' -f 3` + GROUP_PIDS=`ps -eo pid,cmd | fgrep -v $GROUP_REGEX | egrep $GROUP_REGEX | awk '{ print $1 }'` + return 0; +} + # # print the PIDs of processes belonging to ${GROUPNAME} as defined # in ${RTGROUPFILE}. # group_pids () { - ps -eo pid,cmd | awk ' - /^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ { - split($0, parts, ":") - if (parts[1] == groupname) { - nr_rules += 1 - regexp_offset = length(parts[1]) + length(parts[2]) + length(parts[3]) + 4 - if (length(parts) > 4) { - regexp_offset += length(parts[4]) + 1 - } - group_regexps[nr_rules] = substr($0,regexp_offset) - } - } - /^ *[0-9]+ .+$/ { - for (i = 1; i <= nr_rules; ++i) { - if (match($2, group_regexps[i])) { - print $1 - break - } - } - }' groupname=${GROUPNAME} ${RTGROUPFILE} - + if group_properties_of ${GROUPNAME} ; then + echo "$GROUP_PIDS" + else + return 1 + fi + return 0 } set_group_defaults () { - ps -eo pid,cmd | awk ' - /^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ { - split($0, conf, ":") - if (groupname == "" || conf[1] == groupname) { - nr_rules += 1 - group_sched[nr_rules] = conf[2] - group_prio[nr_rules] = conf[3] - regexp_offset = length(conf[1]) + length(conf[2]) + length(conf[3]) + 4 - if (length(conf) < 5) { - group_affinity[nr_rules] = "*" - } else { - regexp_offset += length(conf[4]) + 1 - group_affinity[nr_rules] = conf[4] - } - group_regexps[nr_rules] = substr($0,regexp_offset) - } - } - /^ *[0-9]+ .+$/ { - for (i = nr_rules; i >= 1; --i) { - if (match($2, 
group_regexps[i])) { - if (group_sched[i] != "*") { - print "chrt -p -" group_sched[i] " " group_prio[i] " " $1 - } - if (group_affinity[i] != "*") { - print "taskset -p " group_affinity[i] " " $1 " > /dev/null" - } - break - } - } - }' groupname=${GROUPNAME} ${RTGROUPFILE} - | sh + if group_properties_of ${GROUPNAME} ; then + for pid in $GROUP_PIDS ; do + if [ "$GROUP_SCHED" != "*" ] ; then + chrt -p -$GROUP_SCHED $GROUP_PRIORITY $pid + fi + if [ "$GROUP_AFFINITY" != "*" ] ; then + taskset -p $GROUP_AFFINITY $pid > /dev/null + fi + done + else + return 1 + fi + return 0 } @@ -149,8 +144,13 @@ case "$CMD" in [ $# -gt 1 ] && usage if [ $# -ne 0 ]; then GROUPNAME=$1 + set_group_defaults + else + for grp in $ALL_GROUPS ; do + GROUPNAME=$grp + set_group_defaults + done fi - set_group_defaults ;; "show") [-- Attachment #3: rtctl-1.7-1.el5rt.src.rpm --] [-- Type: application/x-rpm, Size: 10141 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
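To make the rtgroups record format handled by the patch above more concrete, here is a minimal sketch that turns one five-field record (name:sched:priority:affinity:regex) into the chrt command line the patched set_group_defaults would run for a given PID. The helper name rtgroup_to_chrt, the example record, and the priority value are all made up for illustration; they are not taken from the rtctl package, and the helper only prints the command rather than running it.

```shell
# Given one rtgroups-style record "name:sched:prio:affinity:regex" and a
# PID, print the chrt command that would apply the record's scheduling
# policy and priority to that PID. A sched field of "*" means "leave the
# policy alone", so nothing is printed in that case.
# rtgroup_to_chrt is a hypothetical helper name used only for this sketch.
rtgroup_to_chrt() {
    rec=$1
    pid=$2
    sched=$(printf '%s\n' "$rec" | cut -d ':' -f 2)
    prio=$(printf '%s\n' "$rec" | cut -d ':' -f 3)
    if [ "$sched" != "*" ]; then
        printf 'chrt -p -%s %s %s\n' "$sched" "$prio" "$pid"
    fi
}
```

For example, rtgroup_to_chrt 'softirq-net-rx:f:60:*:sirq-net-rx' 6 prints chrt -p -f 60 6, which would put PID 6 under SCHED_FIFO at priority 60. Printing the command first gives a chance to review it before experimenting on a live target.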
* Re: 2.6.33.6-rt28 kernel oops while stressing network
2010-08-11 16:53 ` John Culvertson
2010-08-13 17:37 ` John Culvertson
@ 2010-08-27 10:33 ` Thomas Gleixner
1 sibling, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2010-08-27 10:33 UTC (permalink / raw)
To: John Culvertson; +Cc: linux-rt-users
John,
On Wed, 11 Aug 2010, John Culvertson wrote:
> I updated to 2.6.33.7-rt29, and I am seeing similar symptoms.
>
> [ 2120.781166] BUG: unable to handle kernel paging request at c11cd497
> [ 2120.784018] IP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85
can you please try to reproduce the problem with the function tracer
enabled? Make sure that /proc/sys/kernel/ftrace_dump_on_oops is set
to 1.
When the bug reproduces, the kernel will spill out the trace over the
serial console, which will take a while but should show us what's
going on.
Thanks,
tglx
^ permalink raw reply [flat|nested] 15+ messages in thread
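The tracer setup requested above could be scripted roughly as follows. This is a sketch under stated assumptions, not a procedure given in the thread: it assumes the kernel was built with the function tracer (CONFIG_FUNCTION_TRACER) and that debugfs is mounted at /sys/kernel/debug. The helper name enable_oops_trace and its optional path-prefix argument are made up; the prefix exists only so the function can be dry-run against a scratch directory.

```shell
# Select the function tracer and ask the kernel to dump the trace buffer
# to the console when an oops occurs. Run as root on the target.
# Assumes debugfs is mounted at /sys/kernel/debug.
# The optional first argument is a path prefix (a hypothetical
# convenience for dry runs); leave it empty on a real system.
enable_oops_trace() {
    prefix=${1:-}
    echo function > "$prefix/sys/kernel/debug/tracing/current_tracer" &&
    echo 1 > "$prefix/proc/sys/kernel/ftrace_dump_on_oops"
}
```

After calling enable_oops_trace as root, the netperf load can be restarted on both ports with the serial console being logged, since the dump may run for a long time before the oops text itself appears.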
Thread overview: 15+ messages
2010-08-09 20:10 2.6.33.6-rt28 kernel oops while stressing network John Culvertson
2010-08-10 12:23 ` Patrice Kadionik
2010-08-10 13:00 ` Patrice Kadionik
2010-08-12 16:09 ` Patrice Kadionik
[not found] ` <AANLkTi=tPSeXTZkjPPm_MGmmOx2fZhryOkajgssv0EsX@mail.gmail.com>
2010-08-11 16:53 ` John Culvertson
2010-08-13 17:37 ` John Culvertson
2010-08-13 17:56 ` Darcy Watkins
[not found] ` <AANLkTikty=V_==0udO9F2MxpVxwuLzyOQZt0ha5=VC3y@mail.gmail.com>
2010-08-13 18:07 ` John Culvertson
2010-08-13 20:17 ` Sven-Thorsten Dietrich
2010-08-13 21:20 ` John Culvertson
2010-08-13 22:57 ` Sven-Thorsten Dietrich
2010-08-16 17:16 ` John Culvertson
2010-08-16 17:22 ` Sven-Thorsten Dietrich
[not found] ` <D61182AC8012EA4EBC531B3AF23BE1099C86D6@tranzeo-mail2.12stewart.tranzeo.com>
[not found] ` <AANLkTi=3cz2RyHPdoNRjucTozKqDmJc8sDh+hsnmhKAS@mail.gmail.com>
2010-08-13 19:56 ` Darcy Watkins
2010-08-27 10:33 ` Thomas Gleixner