From: Patrice Kadionik <kadionik@enseirb-matmeca.fr>
To: linux-rt-users@vger.kernel.org
Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network
Date: Tue, 10 Aug 2010 15:00:37 +0200
Message-ID: <4C614D75.1060100@enseirb-matmeca.fr>
In-Reply-To: <4C6144C7.4040609@enseirb-matmeca.fr>
On 10/08/2010 14:23, Patrice Kadionik wrote:
> On 09/08/2010 22:10, John Culvertson wrote:
>> Hello,
> Hello,
>
>> I am trying to use the RT patches on an x86 industrial computer. I am
>> getting intermittent network hangs and kernel crashes when I load the
>> network with netperf. The unpatched kernel does not exhibit these
>> problems. The kernel is 2.6.33.6 patched with rt28.
>>
>> The computer has an AMD LX800 processor and two Intel 82559 10/100 PCI
>> Ethernet controllers. I have only seen the kernel crashes when
>> running netperf on both ports simultaneously.
> I have ported PREEMPT-RT to the NIOS II architecture. NIOS II is a
> softcore processor from Altera.
> I have added hrtimer support to the NIOS II Linux port
> (http://sopc.et.ntust.edu.tw/) and can now use cyclictest.
> I have made some latency measurements (my NIOS II target board runs
> at 100 MHz!).
> I used ping flooding from another, more powerful PC (CPU frequency > 2
> GHz) and noticed that after a few seconds, the previously bounded
> latency rises up to 50 ms! Unlike yours, my target board doesn't crash.
> I spent some time trying to understand this. The ping flooding is fine
> with a normal Linux kernel (a few ms of latency in this case). I used
> Wireshark to analyze the traffic and saw that, after a few seconds, my
> board with PREEMPT-RT support no longer responds to all ping requests.
>
> I tried to make the Ethernet driver's interrupt handler run in the
> classical (non-threaded) mode, as with the standard Linux kernel, by
> adding the IRQ_NODELAY flag to request_irq() in the driver. My board
> boots but crashes on the first ping, because the processing is still
> done by the soft IRQ thread sirq-net-rx (the same soft IRQ thread that
> causes your crash).
> NIOS II has no ftrace support yet, so no tool for studying
> latencies is available...
>
> I did some research on the net about this problem and found the
> presentation "INTERRUPTS CONSIDERED HARMFUL" by Peter Chubb and Yang
> Song
> (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.156.9914&rep=rep1&type=pdf).
>
>
> The paper presents the same test environment as yours and mine: a
> target board under PREEMPT-RT and an Ethernet traffic generator that
> can generate a huge traffic load. They use cyclictest too. With heavy
> traffic, the latency from cyclictest goes up to 50 ms (like mine)! By
> analyzing traces (with ftrace), they saw that the soft IRQ sirq-net-rx
> takes too much time to respond under heavy traffic load. The
> solution they found was to modify the Ethernet driver (e1000)
> so that it does not use a soft IRQ.
> I now know the source of my problem: I can't have a realistic
> response time to ping flooding with a traffic generator that saturates
> the target board under PREEMPT-RT. In this case, the Ethernet driver
> must be revisited.
> You may have the same problem with a different consequence: a crash.
> Have you tried to ping flood just one Ethernet interface with heavy
> traffic?
> For latency measurement, I just use the hackbench
> (http://devresources.linuxfoundation.org/craiger/hackbench/) and stress
> (http://weather.ou.edu/~apw/projects/stress/) tools and dd commands.
> My latency with cyclictest is bounded under heavy CPU load (min =
> 300 µs, max < 1400 µs, CPU @ 100 MHz) and know that I can have realistic
> response time in case of heavy Ethernet traffic (my NIOS II board has
> not enough CPU power in this case).
Correction, please read the last sentence as:
...know that I CAN'T have a realistic response time in case of heavy
Ethernet traffic (my NIOS II board does not have enough CPU power in
this case).
Sorry.
Pat.
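
For readers who have not used cyclictest: the latency figures quoted in
this message are wake-up latencies, i.e. how late a periodic thread wakes
up relative to the deadline it asked for. The Python sketch below
illustrates only that measuring principle. It is not cyclictest (which is
a C program that can run its measuring thread at real-time priority), and
the function name and parameters are hypothetical, chosen for this
example. Interpreter overhead means the absolute numbers are not
comparable with cyclictest results.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=200):
    """Sleep until successive absolute deadlines and record how late
    each wake-up is, in microseconds. Illustrative only (assumed names)."""
    interval_ns = interval_us * 1000
    next_deadline = time.monotonic_ns() + interval_ns
    latencies_us = []
    for _ in range(loops):
        # Sleep for whatever time is left until the next deadline.
        remaining_ns = next_deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        # Latency = actual wake-up time minus requested wake-up time.
        now = time.monotonic_ns()
        latencies_us.append((now - next_deadline) / 1000.0)
        # Advance by a fixed period so that lateness does not shift
        # the schedule of the following deadlines.
        next_deadline += interval_ns
    return {
        "samples": len(latencies_us),
        "min": min(latencies_us),
        "avg": sum(latencies_us) / len(latencies_us),
        "max": max(latencies_us),
    }


if __name__ == "__main__":
    stats = measure_wakeup_latency(interval_us=1000, loops=200)
    print("min=%.1f us avg=%.1f us max=%.1f us"
          % (stats["min"], stats["avg"], stats["max"]))
```

Running this once on an idle system and once while the network or CPU
is loaded gives a rough feel for how the maximum grows under load, which
is the effect discussed in this thread.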
>
> Pat.
>
>
>> This is my first time using the RT patches, so I am not sure how to go
>> about resolving this. Any tips would be greatly appreciated.
>>
>> [ 201.514962] BUG: unable to handle kernel paging request at a0282044
>> [ 201.516020] IP: [<c108d664>] free_block+0x4f/0xe5
>> [ 201.516020] *pde = 00000000
>> [ 201.516020] Oops: 0002 [#1] PREEMPT
>> [ 201.516020] last sysfs file: /sys/module/vt/parameters/default_utf8
>> [ 201.516020] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb
>> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base
>> [ 201.516020]
>> [ 201.516020] Pid: 6, comm: sirq-net-rx/0 Tainted: G W
>> 2.6.33.6-rt28 #4 SL8/SL8
>> [ 201.516020] EIP: 0060:[<c108d664>] EFLAGS: 00010202 CPU: 0
>> [ 201.516020] EIP is at free_block+0x4f/0xe5
>> [ 201.516020] EAX: d6d75060 EBX: de682500 ECX: 00000004 EDX: a0282040
>> [ 201.516020] ESI: de682020 EDI: de431340 EBP: de40e5c0 ESP: de44bd74
>> [ 201.516020] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
>> preempt:00000000
>> [ 201.516020] Process sirq-net-rx/0 (pid: 6, ti=de44a000
>> task=de420490 task.ti=de44a000)
>> [ 201.516020] Stack:
>> [ 201.516020] 00000003 00000000 0000001b de406688 00000001 de431340
>> 00000000 de406660
>> [ 201.516020]<0> 0000001b c108d835 00000000 de44bdc8 de44bdc8
>> ddbd2060 de40e5c0 de431364
>> [ 201.516020]<0> 00000000 de40e5c0 ddbd2060 ddbd2060 c108d581
>> 00000000 00000000 d6e78620
>> [ 201.516020] Call Trace:
>> [ 201.516020] [<c108d835>] ? __cache_free+0x7a/0xae
>> [ 201.516020] [<c108d581>] ? kmem_cache_free+0x1c/0x58
>> [ 201.516020] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
>> [ 201.516020] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476
>> [ 201.516020] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f
>> [ 201.516020] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523
>> [ 201.516020] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160
>> [ 201.516020] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d
>> [ 201.516020] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9
>> [ 201.516020] [<c118d368>] ? e100_poll+0x172/0x37c
>> [ 201.516020] [<c11af94c>] ? net_rx_action+0x53/0x100
>> [ 201.516020] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da
>> [ 201.516020] [<c1027648>] ? run_ksoftirqd+0x0/0x1da
>> [ 201.516020] [<c1036d2d>] ? kthread+0x52/0x57
>> [ 201.516020] [<c1036cdb>] ? kthread+0x0/0x57
>> [ 201.516020] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10
>> [ 201.516020] Code: 24 0c 8b 1c 82 89 d8 e8 34 fc ff ff 89 c6 e8 18
>> f9 ff ff 85 c0 75 04 0f 0b eb fe 8b 76 1c 8b 44 24 28 8b 16 8b 7c 85
>> 4c 8b 46 04<89> 42 04 89 10 2b 5e 0c c7 06 00 01 10 00 c7 46 04 00 02
>> 20 00
>> [ 201.516020] EIP: [<c108d664>] free_block+0x4f/0xe5 SS:ESP
>> 0068:de44bd74
>> [ 201.516020] CR2: 00000000a0282044
>> [ 201.908587] ---[ end trace d28d8d35cd5a7130 ]---
>>
>> [ 201.920053] ------------[ cut here ]------------
>> [ 201.924018] kernel BUG at kernel/rtmutex.c:831!
>> [ 201.924018] invalid opcode: 0000 [#2] PREEMPT
>> [ 201.924018] last sysfs file: /sys/module/vt/parameters/default_utf8
>> [ 201.924018] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb
>> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base
>> [ 201.924018]
>> [ 201.924018] Pid: 6, comm: sirq-net-rx/0 Tainted: G D W
>> 2.6.33.6-rt28 #4 SL8/SL8
>> [ 201.924018] EIP: 0060:[<c122ca6e>] EFLAGS: 00010046 CPU: 0
>> [ 201.924018] EIP is at rt_spin_lock_slowlock+0x35/0x155
>> [ 201.924018] EAX: de420490 EBX: 00000292 ECX: 00000000 EDX: de420490
>> [ 201.924018] ESI: c122ca39 EDI: c1321160 EBP: 00000000 ESP: de44bba8
>> [ 201.924018] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
>> preempt:00000001
>> [ 201.924018] Process sirq-net-rx/0 (pid: 6, ti=de44a000
>> task=de420490 task.ti=de44a000)
>> [ 201.924018] Stack:
>> [ 201.924018] 00000030 00000046 de44bbd0 c102784a c1003c19 de120c7c
>> de226b3c de40a600
>> [ 201.924018]<0> 00000000 c1002db0 de120c7c 00000000 c1322c40
>> de226b3c c1321160 c122ca39
>> [ 201.924018]<0> de120c64 00000000 c104582b de44bc08 de40e7a0
>> c108d08a de120c7c c108d576
>> [ 201.924018] Call Trace:
>> [ 201.924018] [<c102784a>] ? irq_exit+0x28/0x32
>> [ 201.924018] [<c1003c19>] ? do_IRQ+0x61/0x71
>> [ 201.924018] [<c1002db0>] ? common_interrupt+0x30/0x38
>> [ 201.924018] [<c122ca39>] ? rt_spin_lock_slowlock+0x0/0x155
>> [ 201.924018] [<c104582b>] ? rt_spin_lock_fastlock+0x52/0x55
>> [ 201.924018] [<c108d08a>] ? _slab_irq_disable+0xd/0x15
>> [ 201.924018] [<c108d576>] ? kmem_cache_free+0x11/0x58
>> [ 201.924018] [<c109f603>] ? destroy_inode+0x1c/0x2b
>> [ 201.924018] [<c109eefe>] ? iput+0x47/0x49
>> [ 201.924018] [<c109cfd1>] ? d_kill+0x2d/0x47
>> [ 201.924018] [<c109d195>] ? __shrink_dcache_sb+0x1aa/0x247
>> [ 201.924018] [<c109d4c0>] ? shrink_dcache_parent+0x26/0xd7
>> [ 201.924018] [<c10c59f9>] ? proc_flush_task+0x7d/0x165
>> [ 201.924018] [<c1024445>] ? release_task+0x18/0x2af
>> [ 201.924018] [<c102570c>] ? do_exit+0x4dd/0x547
>> [ 201.924018] [<c1004d16>] ? oops_end+0x7f/0x83
>> [ 201.924018] [<c1015165>] ? no_context+0x10c/0x115
>> [ 201.924018] [<c10153ad>] ? do_page_fault+0x0/0x28f
>> [ 201.924018] [<c1015361>] ? bad_area_nosemaphore+0xa/0xc
>> [ 201.924018] [<c122d2fb>] ? error_code+0x6b/0x70
>> [ 201.924018] [<c108d664>] ? free_block+0x4f/0xe5
>> [ 201.924018] [<c108d835>] ? __cache_free+0x7a/0xae
>> [ 201.924018] [<c108d581>] ? kmem_cache_free+0x1c/0x58
>> [ 201.924018] [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
>> [ 201.924018] [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476
>> [ 201.924018] [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f
>> [ 201.924018] [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523
>> [ 201.924018] [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160
>> [ 201.924018] [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d
>> [ 201.924018] [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9
>> [ 201.924018] [<c118d368>] ? e100_poll+0x172/0x37c
>> [ 201.924018] [<c11af94c>] ? net_rx_action+0x53/0x100
>> [ 201.924018] [<c1027743>] ? run_ksoftirqd+0xfb/0x1da
>> [ 201.924018] [<c1027648>] ? run_ksoftirqd+0x0/0x1da
>> [ 201.924018] [<c1036d2d>] ? kthread+0x52/0x57
>> [ 201.924018] [<c1036cdb>] ? kthread+0x0/0x57
>> [ 201.924018] [<c1002dbe>] ? kernel_thread_helper+0x6/0x10
>> [ 201.924018] Code: 44 24 2c 00 00 00 00 9c 5b fa b8 01 00 00 00 e8
>> 8d f5 de ff 89 f8 e8 fd 83 e1 ff 8b 47 10 8b 15 d8 02 31 c1 83 e0 fc
>> 39 d0 75 04<0f> 0b eb fe 8b 02 e8 e0 82 e1 ff 89 c5 8b 35 d8 02 31 c1
>> 8b 46
>> [ 201.924018] EIP: [<c122ca6e>] rt_spin_lock_slowlock+0x35/0x155
>> SS:ESP 0068:de44bba8
>> [ 201.924018] ---[ end trace d28d8d35cd5a7131 ]---
>> [ 201.924018] Fixing recursive fault but reboot is needed!
>> [ 202.672902] sched: RT throttling activated
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-rt-users" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
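
A note on reading the first oops quoted above: the value after "Oops:"
is the x86 page-fault error code, and its low three bits say what kind
of access faulted. The small Python helper below decodes those bits; the
function name is hypothetical and only the three lowest bits are
handled. It also checks that the faulting address in CR2 equals EDX + 4,
which matches the marked instruction bytes "<89> 42 04" (a store of EAX
to the address EDX + 4).

```python
def decode_x86_page_fault_error(code):
    """Decode the three lowest bits of an x86 page-fault error code.
    Hypothetical helper for illustration; higher bits are ignored."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault happened in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# Values copied from the first oops in the quoted report.
OOPS_ERROR_CODE = 0x0002
EDX = 0xA0282040
CR2 = 0xA0282044

print(decode_x86_page_fault_error(OOPS_ERROR_CODE))
# {'cause': 'page not present', 'access': 'write', 'mode': 'kernel'}
print(EDX + 4 == CR2)
# True
```

So the first oops is a kernel-mode write to an unmapped address, which
agrees with the "*pde = 00000000" line in the log.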
--
Patrice Kadionik. F6KQH / F4CUQ
-----------
+----------------------------------------------------------------------+
+"Tout doit etre aussi simple que possible, pas seulement plus simple" +
+----------------------------------------------------------------------+
+ Patrice Kadionik http://www.enseirb-matmeca.fr/~kadionik +
+ IMS Laboratory http://www.ims-bordeaux.fr/ +
+ ENSEIRB-MATMECA http://www.enseirb-matmeca.fr +
+ PO BOX 99 fax : +33 5.56.37.20.23 +
+ 33402 TALENCE Cedex voice : +33 5.56.84.23.47 +
+ FRANCE mailto:patrice.kadionik@ims-bordeaux.fr +
+----------------------------------------------------------------------+