public inbox for linux-rt-users@vger.kernel.org
From: Patrice Kadionik <kadionik@enseirb-matmeca.fr>
To: linux-rt-users@vger.kernel.org
Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network
Date: Tue, 10 Aug 2010 15:00:37 +0200
Message-ID: <4C614D75.1060100@enseirb-matmeca.fr>
In-Reply-To: <4C6144C7.4040609@enseirb-matmeca.fr>

On 10/08/2010 14:23, Patrice Kadionik wrote:
> On 09/08/2010 22:10, John Culvertson wrote:
>> Hello,
> Hello,
>
>> I am trying to use the RT patches on an x86 industrial computer.  I am
>> getting intermittent network hangs and kernel crashes when I load the
>> network with netperf.  The unpatched kernel does not exhibit these
>> problems.  The kernel is 2.6.33.6 patched with rt28.
>>
>> The computer has an AMD LX800 processor and two Intel 82559 10/100 PCI
>> Ethernet controllers.  I have only seen the kernel crashes when
>> running netperf on both ports simultaneously.
> I have ported PREEMPT-RT to the NIOS II architecture. NIOS II is a
> softcore processor from Altera.
> I have added hrtimer support to the NIOS II Linux port
> (http://sopc.et.ntust.edu.tw/) and can now use cyclictest.
> I have done some latency measurements (my NIOS II target board runs
> at 100 MHz!).
> I used ping flooding from another, more powerful PC (CPU frequency > 2
> GHz) and noticed that after a few seconds the bounded latency I had
> rises to as much as 50 ms! My target board doesn't crash like yours.
> I have spent time trying to understand this. Ping flooding is fine with
> a normal Linux kernel (a few ms of latency in this case). I used
> wireshark to analyze the traffic and saw that, after a few seconds, my
> board with PREEMPT-RT support no longer responds to all ping requests.
>
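For readers who have not used cyclictest: the figure it reports is wake-up latency, i.e. a thread asks to be woken at an absolute deadline and records how late it actually runs. What follows is only a rough Python sketch of that idea, not cyclictest's implementation (cyclictest is a C program and runs its measuring thread with a real-time priority). The function name measure_wakeup_latency and its parameters are made up for illustration.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=1000):
    """Return (min, avg, max) wake-up latency in microseconds.

    Hypothetical helper: sleeps until successive absolute deadlines and
    records how far past each deadline the thread actually woke up.
    """
    interval_ns = interval_us * 1000
    deadline = time.monotonic_ns() + interval_ns
    latencies = []
    for _ in range(loops):
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # Lateness relative to the deadline that was requested.
        latencies.append((time.monotonic_ns() - deadline) / 1000.0)
        # Advance from the deadline, not from "now", so errors do not pile up.
        deadline += interval_ns
    return min(latencies), sum(latencies) / len(latencies), max(latencies)


if __name__ == "__main__":
    lo, avg, hi = measure_wakeup_latency()
    print(f"min={lo:.0f}us avg={avg:.0f}us max={hi:.0f}us")
```

Run without a real-time priority, this mostly shows ordinary scheduler jitter; it is meant to show what "latency" means in the numbers quoted in this thread, not to replace cyclictest.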
> I've tried to make the IRQ thread of the Ethernet driver behave as in
> the standard Linux kernel by adding the IRQ_NODELAY flag to
> request_irq() in the driver. My board boots but crashes on the first
> ping because processing is still done by the soft IRQ sirq-net-rx (it
> is this soft IRQ thread that causes your crash).
> The NIOS II has no ftrace support yet, so no tool for studying
> latencies is available...
>
> I've done some research on the net on this problem and found the
> presentation "INTERRUPTS CONSIDERED HARMFUL" by Peter Chubb and Yang
> Song
> (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.156.9914&rep=rep1&type=pdf).
>
>
> The paper presents the same test environment as yours and mine: a
> target board under PREEMPT-RT and an Ethernet traffic generator that
> can generate a huge traffic load. They use cyclictest too. With heavy
> traffic, latency from cyclictest goes up to 50 ms (like mine)! By
> analyzing traces (with ftrace), they saw that the soft IRQ sirq-net-rx
> takes too long to respond under heavy traffic load. The
> solution they found was to modify the Ethernet driver (e1000)
> so that it does not use the soft IRQ.
> I now know the source of my problem: I can't get a realistic
> response time to ping flooding with a traffic generator that saturates
> the target board under PREEMPT-RT. In this case, the Ethernet driver
> must be revisited.
> You may have the same problem with a different consequence: a crash.
> Have you tried to ping flood just one Ethernet interface with heavy
> traffic?
> For latency measurement, I just use the hackbench
> (http://devresources.linuxfoundation.org/craiger/hackbench/) and stress
> (http://weather.ou.edu/~apw/projects/stress/) tools and dd commands.
> My latency time with cyclictest is bounded with heavy CPU load (min= 
> 300µs  max<1400 µs CPU@100 MHz) and know that I can have realistic 
> response time in case of heavy Ethernet traffic (my NIOS II board has 
> not enough CPU power in this case).
read:
...know that I CAN'T have realistic response time in case of heavy 
Ethernet traffic (my NIOS II board has not enough CPU power in this case).

Sorry.
Pat.
>
> Pat.
>
>
>> This is my first time using the RT patches, so I am not sure how to go
>> about resolving this.  Any tips would be greatly appreciated.
>>
>> [  201.514962] BUG: unable to handle kernel paging request at a0282044
>> [  201.516020] IP: [<c108d664>] free_block+0x4f/0xe5
>> [  201.516020] *pde = 00000000
>> [  201.516020] Oops: 0002 [#1] PREEMPT
>> [  201.516020] last sysfs file: /sys/module/vt/parameters/default_utf8
>> [  201.516020] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb
>> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base
>> [  201.516020]
>> [  201.516020] Pid: 6, comm: sirq-net-rx/0 Tainted: G        W
>> 2.6.33.6-rt28 #4 SL8/SL8
>> [  201.516020] EIP: 0060:[<c108d664>] EFLAGS: 00010202 CPU: 0
>> [  201.516020] EIP is at free_block+0x4f/0xe5
>> [  201.516020] EAX: d6d75060 EBX: de682500 ECX: 00000004 EDX: a0282040
>> [  201.516020] ESI: de682020 EDI: de431340 EBP: de40e5c0 ESP: de44bd74
>> [  201.516020]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 
>> preempt:00000000
>> [  201.516020] Process sirq-net-rx/0 (pid: 6, ti=de44a000
>> task=de420490 task.ti=de44a000)
>> [  201.516020] Stack:
>> [  201.516020]  00000003 00000000 0000001b de406688 00000001 de431340
>> 00000000 de406660
>> [  201.516020]<0>  0000001b c108d835 00000000 de44bdc8 de44bdc8
>> ddbd2060 de40e5c0 de431364
>> [  201.516020]<0>  00000000 de40e5c0 ddbd2060 ddbd2060 c108d581
>> 00000000 00000000 d6e78620
>> [  201.516020] Call Trace:
>> [  201.516020]  [<c108d835>] ? __cache_free+0x7a/0xae
>> [  201.516020]  [<c108d581>] ? kmem_cache_free+0x1c/0x58
>> [  201.516020]  [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
>> [  201.516020]  [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476
>> [  201.516020]  [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f
>> [  201.516020]  [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523
>> [  201.516020]  [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160
>> [  201.516020]  [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d
>> [  201.516020]  [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9
>> [  201.516020]  [<c118d368>] ? e100_poll+0x172/0x37c
>> [  201.516020]  [<c11af94c>] ? net_rx_action+0x53/0x100
>> [  201.516020]  [<c1027743>] ? run_ksoftirqd+0xfb/0x1da
>> [  201.516020]  [<c1027648>] ? run_ksoftirqd+0x0/0x1da
>> [  201.516020]  [<c1036d2d>] ? kthread+0x52/0x57
>> [  201.516020]  [<c1036cdb>] ? kthread+0x0/0x57
>> [  201.516020]  [<c1002dbe>] ? kernel_thread_helper+0x6/0x10
>> [  201.516020] Code: 24 0c 8b 1c 82 89 d8 e8 34 fc ff ff 89 c6 e8 18
>> f9 ff ff 85 c0 75 04 0f 0b eb fe 8b 76 1c 8b 44 24 28 8b 16 8b 7c 85
>> 4c 8b 46 04<89>  42 04 89 10 2b 5e 0c c7 06 00 01 10 00 c7 46 04 00 02
>> 20 00
>> [  201.516020] EIP: [<c108d664>] free_block+0x4f/0xe5 SS:ESP 
>> 0068:de44bd74
>> [  201.516020] CR2: 00000000a0282044
>> [  201.908587] ---[ end trace d28d8d35cd5a7130 ]---
>>
>> [  201.920053] ------------[ cut here ]------------
>> [  201.924018] kernel BUG at kernel/rtmutex.c:831!
>> [  201.924018] invalid opcode: 0000 [#2] PREEMPT
>> [  201.924018] last sysfs file: /sys/module/vt/parameters/default_utf8
>> [  201.924018] Modules linked in: evdev usbhid ohci_hcd geode_rng ecb
>> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base
>> [  201.924018]
>> [  201.924018] Pid: 6, comm: sirq-net-rx/0 Tainted: G      D W
>> 2.6.33.6-rt28 #4 SL8/SL8
>> [  201.924018] EIP: 0060:[<c122ca6e>] EFLAGS: 00010046 CPU: 0
>> [  201.924018] EIP is at rt_spin_lock_slowlock+0x35/0x155
>> [  201.924018] EAX: de420490 EBX: 00000292 ECX: 00000000 EDX: de420490
>> [  201.924018] ESI: c122ca39 EDI: c1321160 EBP: 00000000 ESP: de44bba8
>> [  201.924018]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 
>> preempt:00000001
>> [  201.924018] Process sirq-net-rx/0 (pid: 6, ti=de44a000
>> task=de420490 task.ti=de44a000)
>> [  201.924018] Stack:
>> [  201.924018]  00000030 00000046 de44bbd0 c102784a c1003c19 de120c7c
>> de226b3c de40a600
>> [  201.924018]<0>  00000000 c1002db0 de120c7c 00000000 c1322c40
>> de226b3c c1321160 c122ca39
>> [  201.924018]<0>  de120c64 00000000 c104582b de44bc08 de40e7a0
>> c108d08a de120c7c c108d576
>> [  201.924018] Call Trace:
>> [  201.924018]  [<c102784a>] ? irq_exit+0x28/0x32
>> [  201.924018]  [<c1003c19>] ? do_IRQ+0x61/0x71
>> [  201.924018]  [<c1002db0>] ? common_interrupt+0x30/0x38
>> [  201.924018]  [<c122ca39>] ? rt_spin_lock_slowlock+0x0/0x155
>> [  201.924018]  [<c104582b>] ? rt_spin_lock_fastlock+0x52/0x55
>> [  201.924018]  [<c108d08a>] ? _slab_irq_disable+0xd/0x15
>> [  201.924018]  [<c108d576>] ? kmem_cache_free+0x11/0x58
>> [  201.924018]  [<c109f603>] ? destroy_inode+0x1c/0x2b
>> [  201.924018]  [<c109eefe>] ? iput+0x47/0x49
>> [  201.924018]  [<c109cfd1>] ? d_kill+0x2d/0x47
>> [  201.924018]  [<c109d195>] ? __shrink_dcache_sb+0x1aa/0x247
>> [  201.924018]  [<c109d4c0>] ? shrink_dcache_parent+0x26/0xd7
>> [  201.924018]  [<c10c59f9>] ? proc_flush_task+0x7d/0x165
>> [  201.924018]  [<c1024445>] ? release_task+0x18/0x2af
>> [  201.924018]  [<c102570c>] ? do_exit+0x4dd/0x547
>> [  201.924018]  [<c1004d16>] ? oops_end+0x7f/0x83
>> [  201.924018]  [<c1015165>] ? no_context+0x10c/0x115
>> [  201.924018]  [<c10153ad>] ? do_page_fault+0x0/0x28f
>> [  201.924018]  [<c1015361>] ? bad_area_nosemaphore+0xa/0xc
>> [  201.924018]  [<c122d2fb>] ? error_code+0x6b/0x70
>> [  201.924018]  [<c108d664>] ? free_block+0x4f/0xe5
>> [  201.924018]  [<c108d835>] ? __cache_free+0x7a/0xae
>> [  201.924018]  [<c108d581>] ? kmem_cache_free+0x1c/0x58
>> [  201.924018]  [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
>> [  201.924018]  [<c11d4bd8>] ? tcp_rcv_established+0xb0/0x476
>> [  201.924018]  [<c11da92f>] ? tcp_v4_do_rcv+0x129/0x28f
>> [  201.924018]  [<c11dbf43>] ? tcp_v4_rcv+0x339/0x523
>> [  201.924018]  [<c11c3a8a>] ? ip_local_deliver_finish+0xf9/0x160
>> [  201.924018]  [<c11c3925>] ? ip_rcv_finish+0x28a/0x29d
>> [  201.924018]  [<c11aceb4>] ? netif_receive_skb+0x1c2/0x1e9
>> [  201.924018]  [<c118d368>] ? e100_poll+0x172/0x37c
>> [  201.924018]  [<c11af94c>] ? net_rx_action+0x53/0x100
>> [  201.924018]  [<c1027743>] ? run_ksoftirqd+0xfb/0x1da
>> [  201.924018]  [<c1027648>] ? run_ksoftirqd+0x0/0x1da
>> [  201.924018]  [<c1036d2d>] ? kthread+0x52/0x57
>> [  201.924018]  [<c1036cdb>] ? kthread+0x0/0x57
>> [  201.924018]  [<c1002dbe>] ? kernel_thread_helper+0x6/0x10
>> [  201.924018] Code: 44 24 2c 00 00 00 00 9c 5b fa b8 01 00 00 00 e8
>> 8d f5 de ff 89 f8 e8 fd 83 e1 ff 8b 47 10 8b 15 d8 02 31 c1 83 e0 fc
>> 39 d0 75 04<0f>  0b eb fe 8b 02 e8 e0 82 e1 ff 89 c5 8b 35 d8 02 31 c1
>> 8b 46
>> [  201.924018] EIP: [<c122ca6e>] rt_spin_lock_slowlock+0x35/0x155
>> SS:ESP 0068:de44bba8
>> [  201.924018] ---[ end trace d28d8d35cd5a7131 ]---
>> [  201.924018] Fixing recursive fault but reboot is needed!
>> [  202.672902] sched: RT throttling activated
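The two call traces above are easier to compare once reduced to bare symbol names. As a small aid, here is a Python sketch that extracts the frames from oops text in the format shown in this log. The names FRAME_RE, parse_frames and SAMPLE are hypothetical, and the pattern is only known to fit the lines quoted here, not every kernel's oops format.

```python
import re

# Matches frames such as "[<c108d835>] ? __cache_free+0x7a/0xae".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return a list of (address, symbol, offset, size) tuples in log order."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        frames.append((int(m["addr"], 16), m["sym"],
                       int(m["off"], 16), int(m["size"], 16)))
    return frames


# Three frames copied from the first oops above.
SAMPLE = """\
[  201.516020]  [<c108d835>] ? __cache_free+0x7a/0xae
[  201.516020]  [<c108d581>] ? kmem_cache_free+0x1c/0x58
[  201.516020]  [<c11d3493>] ? tcp_ack+0x3eb/0x12f5
"""

if __name__ == "__main__":
    for addr, sym, off, size in parse_frames(SAMPLE):
        print(f"{addr:08x} {sym} (+{off:#x} of {size:#x})")
```

Applied to both traces, a listing like this makes it visible that the second trace contains the frames of the first one below error_code and do_exit, which matches the "Fixing recursive fault" message: the rtmutex BUG was hit while the kernel was handling the first fault in sirq-net-rx.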
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe 
>> linux-rt-users" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>


-- 
Patrice Kadionik. F6KQH / F4CUQ
-----------

+----------------------------------------------------------------------+
+"Everything must be as simple as possible, not just simpler"          +
+----------------------------------------------------------------------+
+ Patrice Kadionik             http://www.enseirb-matmeca.fr/~kadionik +
+ IMS Laboratory               http://www.ims-bordeaux.fr/             +
+ ENSEIRB-MATMECA              http://www.enseirb-matmeca.fr           +
+ PO BOX 99                    fax   : +33 5.56.37.20.23               +
+ 33402 TALENCE Cedex          voice : +33 5.56.84.23.47               +
+ FRANCE                       mailto:patrice.kadionik@ims-bordeaux.fr +
+----------------------------------------------------------------------+

