From mboxrd@z Thu Jan 1 00:00:00 1970 From: Patrice Kadionik Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network Date: Tue, 10 Aug 2010 15:00:37 +0200 Message-ID: <4C614D75.1060100@enseirb-matmeca.fr> References: <4C6144C7.4040609@enseirb-matmeca.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE To: linux-rt-users@vger.kernel.org Return-path: Received: from plan.enseirb.fr ([147.210.18.60]:37561 "EHLO plan.enseirb.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754923Ab0HJNAs (ORCPT ); Tue, 10 Aug 2010 09:00:48 -0400 Received: from localhost (mx [147.210.18.15]) by plan.enseirb.fr (8.13.8/8.13.8) with ESMTP id o7AD0A4X023177 for ; Tue, 10 Aug 2010 15:00:10 +0200 (MEST) Received: from plan.enseirb.fr ([147.210.18.60]) by localhost (tan.enseirb.fr [147.210.18.15]) (amavisd-new, port 10041) with LMTP id tFDXHmkVtnJM for ; Tue, 10 Aug 2010 15:00:20 +0200 (MEST) Received: from [192.168.0.1] (dispo-82-65-217-243.adsl.proxad.net [82.65.217.243]) (authenticated bits=0) from identified as kadionik by plan.enseirb.fr (8.13.8/8.13.8) with ESMTP id o7AD07t6023175 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 10 Aug 2010 15:00:08 +0200 (MEST) In-Reply-To: <4C6144C7.4040609@enseirb-matmeca.fr> Sender: linux-rt-users-owner@vger.kernel.org List-ID: Le 10/08/2010 14:23, Patrice Kadionik a =E9crit : > Le 09/08/2010 22:10, John Culvertson a =E9crit : >> Hello, > Hello, > >> I am trying to use the RT patches on an x86 industrial computer. I = am >> getting intermittent network hangs and kernel crashes when I load th= e >> network with netperf. The unpatched kernel does not exhibit these >> problems. The kernel is 2.6.33.6 patched with rt28. >> >> The computer has an AMD LX800 processor and two Intel 82559 10/100 P= CI >> Ethernet controllers. I have only seen the kernel crashes when >> running netperf on both ports simultaneously. > I have ported PREEMPT-RT to the NIOS II architecture. NIOS II is a=20 > softcore processor from Altera. > I have added to the NIOS II Linux port(http://sopc.et.ntust.edu.tw/)=20 > the hrtimer support and can now use cyclistest. > I have done some measurements for having latency (my NIOS II target=20 > boards runs at 100 MHz!). > I have used ping flooding from another powerful PC (CPU frequency > 2= =20 > GHz) and have noticed that after few seconds, the bounded latency I=20 > had arises up to 50 ms! My target board doesn't crash like you. > I have spent time for understanding. The ping flooding is OK with a=20 > normal Linux kernel (few ms as latency in this case). I used wireshar= k=20 > to analyze the traffic and saw that my board with PREEMPT-RT support=20 > doesn't respond after few seconds to all ping requests. > > I've tried to put the IRQ thread of the Ethernet driver in a classica= l=20 > mode like with the standard Linux kernel through adding the=20 > IRQ_NODELAY flag with with request_irq() in the driver. My boards=20 > boots but crashs on the first ping because treatment is always done b= y=20 > the soft IRQ sirq-net-rx (this is this soft IRQ thread that causes=20 > your crash). > The NIOS II has no ftrace support yet so no tool for studying=20 > latencies is available... > > I've done some researchs on the net on this problem and found the=20 > presentation "INTERRUPTS CONSIDERED HARMFUL" from Peter Chubb and Yan= g=20 > Song=20 > (http://citeseerx.ist.psu.edu/viewdoc/download?doi=3D10.1.1.156.9914&= rep=3Drep1&type=3Dpdf).=20 > > > The paper presents the same testing environment like you and me: a=20 > target board under PREEMPT-RT and a Ethernet traffic generator that=20 > can generates a huge traffic load. They use cyclictest too.With heavy= =20 > traffic, latency from cyclictest goes up to 50 ms (like me)! By=20 > analyzing traces (with ftrace), they saw that the soft IRQ sirq-net-r= x=20 > takes too time for responding in case of heavy traffic load. The=20 > solution they have found was to modify the Ethernet driver (e1000) =20 > with no soft IRQ. > I know now the source of my problem and can't have a realistic=20 > response time to ping flooding with a traffic generator that saturate= s=20 > the target board under PREEMPT-RT. In this case, the Ethernet driver=20 > must be revisited. > You may have the same problem with another consequence: crash. Have=20 > you tried to ping flood just one Ethernet interface with heavy traffi= c? > For latency measurement, I just use hackbench=20 > (http://devresources.linuxfoundation.org/craiger/hackbench/), stress=20 > (http://weather.ou.edu/~apw/projects/stress/) tools and dd commands.=20 > My latency time with cyclictest is bounded with heavy CPU load (min=3D= =20 > 300=B5s max<1400 =B5s CPU@100 MHz) and know that I can have realisti= c=20 > response time in case of heavy Ethernet traffic (my NIOS II board has= =20 > not enough CPU power in this case). read: =2E..know that I CAN'T have realistic response time in case of heavy=20 Ethernet traffic (my NIOS II board has not enough CPU power in this cas= e). Sorry. Pat. > > Pat. > > >> This is my first time using the RT patches, so I am not sure how to = go >> about resolving this. Any tips would be greatly appreciated. >> >> [ 201.514962] BUG: unable to handle kernel paging request at a02820= 44 >> [ 201.516020] IP: [] free_block+0x4f/0xe5 >> [ 201.516020] *pde =3D 00000000 >> [ 201.516020] Oops: 0002 [#1] PREEMPT >> [ 201.516020] last sysfs file: /sys/module/vt/parameters/default_ut= f8 >> [ 201.516020] Modules linked in: evdev usbhid ohci_hcd geode_rng ec= b >> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base >> [ 201.516020] >> [ 201.516020] Pid: 6, comm: sirq-net-rx/0 Tainted: G W >> 2.6.33.6-rt28 #4 SL8/SL8 >> [ 201.516020] EIP: 0060:[] EFLAGS: 00010202 CPU: 0 >> [ 201.516020] EIP is at free_block+0x4f/0xe5 >> [ 201.516020] EAX: d6d75060 EBX: de682500 ECX: 00000004 EDX: a02820= 40 >> [ 201.516020] ESI: de682020 EDI: de431340 EBP: de40e5c0 ESP: de44bd= 74 >> [ 201.516020] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068=20 >> preempt:00000000 >> [ 201.516020] Process sirq-net-rx/0 (pid: 6, ti=3Dde44a000 >> task=3Dde420490 task.ti=3Dde44a000) >> [ 201.516020] Stack: >> [ 201.516020] 00000003 00000000 0000001b de406688 00000001 de43134= 0 >> 00000000 de406660 >> [ 201.516020]<0> 0000001b c108d835 00000000 de44bdc8 de44bdc8 >> ddbd2060 de40e5c0 de431364 >> [ 201.516020]<0> 00000000 de40e5c0 ddbd2060 ddbd2060 c108d581 >> 00000000 00000000 d6e78620 >> [ 201.516020] Call Trace: >> [ 201.516020] [] ? __cache_free+0x7a/0xae >> [ 201.516020] [] ? kmem_cache_free+0x1c/0x58 >> [ 201.516020] [] ? tcp_ack+0x3eb/0x12f5 >> [ 201.516020] [] ? tcp_rcv_established+0xb0/0x476 >> [ 201.516020] [] ? tcp_v4_do_rcv+0x129/0x28f >> [ 201.516020] [] ? tcp_v4_rcv+0x339/0x523 >> [ 201.516020] [] ? ip_local_deliver_finish+0xf9/0x160 >> [ 201.516020] [] ? ip_rcv_finish+0x28a/0x29d >> [ 201.516020] [] ? netif_receive_skb+0x1c2/0x1e9 >> [ 201.516020] [] ? e100_poll+0x172/0x37c >> [ 201.516020] [] ? net_rx_action+0x53/0x100 >> [ 201.516020] [] ? run_ksoftirqd+0xfb/0x1da >> [ 201.516020] [] ? run_ksoftirqd+0x0/0x1da >> [ 201.516020] [] ? kthread+0x52/0x57 >> [ 201.516020] [] ? kthread+0x0/0x57 >> [ 201.516020] [] ? kernel_thread_helper+0x6/0x10 >> [ 201.516020] Code: 24 0c 8b 1c 82 89 d8 e8 34 fc ff ff 89 c6 e8 18 >> f9 ff ff 85 c0 75 04 0f 0b eb fe 8b 76 1c 8b 44 24 28 8b 16 8b 7c 85 >> 4c 8b 46 04<89> 42 04 89 10 2b 5e 0c c7 06 00 01 10 00 c7 46 04 00 = 02 >> 20 00 >> [ 201.516020] EIP: [] free_block+0x4f/0xe5 SS:ESP=20 >> 0068:de44bd74 >> [ 201.516020] CR2: 00000000a0282044 >> [ 201.908587] ---[ end trace d28d8d35cd5a7130 ]--- >> >> [ 201.920053] ------------[ cut here ]------------ >> [ 201.924018] kernel BUG at kernel/rtmutex.c:831! >> [ 201.924018] invalid opcode: 0000 [#2] PREEMPT >> [ 201.924018] last sysfs file: /sys/module/vt/parameters/default_ut= f8 >> [ 201.924018] Modules linked in: evdev usbhid ohci_hcd geode_rng ec= b >> aes_i586 ehci_hcd aes_generic usbcore geode_aes nls_base >> [ 201.924018] >> [ 201.924018] Pid: 6, comm: sirq-net-rx/0 Tainted: G D W >> 2.6.33.6-rt28 #4 SL8/SL8 >> [ 201.924018] EIP: 0060:[] EFLAGS: 00010046 CPU: 0 >> [ 201.924018] EIP is at rt_spin_lock_slowlock+0x35/0x155 >> [ 201.924018] EAX: de420490 EBX: 00000292 ECX: 00000000 EDX: de4204= 90 >> [ 201.924018] ESI: c122ca39 EDI: c1321160 EBP: 00000000 ESP: de44bb= a8 >> [ 201.924018] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068=20 >> preempt:00000001 >> [ 201.924018] Process sirq-net-rx/0 (pid: 6, ti=3Dde44a000 >> task=3Dde420490 task.ti=3Dde44a000) >> [ 201.924018] Stack: >> [ 201.924018] 00000030 00000046 de44bbd0 c102784a c1003c19 de120c7= c >> de226b3c de40a600 >> [ 201.924018]<0> 00000000 c1002db0 de120c7c 00000000 c1322c40 >> de226b3c c1321160 c122ca39 >> [ 201.924018]<0> de120c64 00000000 c104582b de44bc08 de40e7a0 >> c108d08a de120c7c c108d576 >> [ 201.924018] Call Trace: >> [ 201.924018] [] ? irq_exit+0x28/0x32 >> [ 201.924018] [] ? do_IRQ+0x61/0x71 >> [ 201.924018] [] ? common_interrupt+0x30/0x38 >> [ 201.924018] [] ? rt_spin_lock_slowlock+0x0/0x155 >> [ 201.924018] [] ? rt_spin_lock_fastlock+0x52/0x55 >> [ 201.924018] [] ? _slab_irq_disable+0xd/0x15 >> [ 201.924018] [] ? kmem_cache_free+0x11/0x58 >> [ 201.924018] [] ? destroy_inode+0x1c/0x2b >> [ 201.924018] [] ? iput+0x47/0x49 >> [ 201.924018] [] ? d_kill+0x2d/0x47 >> [ 201.924018] [] ? __shrink_dcache_sb+0x1aa/0x247 >> [ 201.924018] [] ? shrink_dcache_parent+0x26/0xd7 >> [ 201.924018] [] ? proc_flush_task+0x7d/0x165 >> [ 201.924018] [] ? release_task+0x18/0x2af >> [ 201.924018] [] ? do_exit+0x4dd/0x547 >> [ 201.924018] [] ? oops_end+0x7f/0x83 >> [ 201.924018] [] ? no_context+0x10c/0x115 >> [ 201.924018] [] ? do_page_fault+0x0/0x28f >> [ 201.924018] [] ? bad_area_nosemaphore+0xa/0xc >> [ 201.924018] [] ? error_code+0x6b/0x70 >> [ 201.924018] [] ? free_block+0x4f/0xe5 >> [ 201.924018] [] ? __cache_free+0x7a/0xae >> [ 201.924018] [] ? kmem_cache_free+0x1c/0x58 >> [ 201.924018] [] ? tcp_ack+0x3eb/0x12f5 >> [ 201.924018] [] ? tcp_rcv_established+0xb0/0x476 >> [ 201.924018] [] ? tcp_v4_do_rcv+0x129/0x28f >> [ 201.924018] [] ? tcp_v4_rcv+0x339/0x523 >> [ 201.924018] [] ? ip_local_deliver_finish+0xf9/0x160 >> [ 201.924018] [] ? ip_rcv_finish+0x28a/0x29d >> [ 201.924018] [] ? netif_receive_skb+0x1c2/0x1e9 >> [ 201.924018] [] ? e100_poll+0x172/0x37c >> [ 201.924018] [] ? net_rx_action+0x53/0x100 >> [ 201.924018] [] ? run_ksoftirqd+0xfb/0x1da >> [ 201.924018] [] ? run_ksoftirqd+0x0/0x1da >> [ 201.924018] [] ? kthread+0x52/0x57 >> [ 201.924018] [] ? kthread+0x0/0x57 >> [ 201.924018] [] ? kernel_thread_helper+0x6/0x10 >> [ 201.924018] Code: 44 24 2c 00 00 00 00 9c 5b fa b8 01 00 00 00 e8 >> 8d f5 de ff 89 f8 e8 fd 83 e1 ff 8b 47 10 8b 15 d8 02 31 c1 83 e0 fc >> 39 d0 75 04<0f> 0b eb fe 8b 02 e8 e0 82 e1 ff 89 c5 8b 35 d8 02 31 = c1 >> 8b 46 >> [ 201.924018] EIP: [] rt_spin_lock_slowlock+0x35/0x155 >> SS:ESP 0068:de44bba8 >> [ 201.924018] ---[ end trace d28d8d35cd5a7131 ]--- >> [ 201.924018] Fixing recursive fault but reboot is needed! >> [ 202.672902] sched: RT throttling activated >> --=20 >> To unsubscribe from this list: send the line "unsubscribe=20 >> linux-rt-users" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > --=20 Patrice Kadionik. F6KQH / F4CUQ ----------- +----------------------------------------------------------------------= + +"Tout doit etre aussi simple que possible, pas seulement plus simple" = + +----------------------------------------------------------------------= + + Patrice Kadionik http://www.enseirb-matmeca.fr/~kadionik = + + IMS Laboratory http://www.ims-bordeaux.fr/ = + + ENSEIRB-MATMECA http://www.enseirb-matmeca.fr = + + PO BOX 99 fax : +33 5.56.37.20.23 = + + 33402 TALENCE Cedex voice : +33 5.56.84.23.47 = + + FRANCE mailto:patrice.kadionik@ims-bordeaux.fr = + +----------------------------------------------------------------------= + -- To unsubscribe from this list: send the line "unsubscribe linux-rt-user= s" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html