From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Subject: Re: [Bugme-new] [Bug 6682] New: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU
Date: Mon, 19 Jun 2006 15:20:10 -0700
Message-ID: <20060619152010.5c648b97.akpm@osdl.org>
References: <200606122311.k5CNBMEx007518@fire-2.osdl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Dipankar Sarma, "Paul E. McKenney", "bugme-daemon@kernel-bugs.osdl.org", alchemyx@uznam.net.pl
Return-path:
Received: from smtp.osdl.org ([65.172.181.4]:9410 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S964950AbWFSWR2 (ORCPT ); Mon, 19 Jun 2006 18:17:28 -0400
To: netdev@vger.kernel.org
In-Reply-To: <200606122311.k5CNBMEx007518@fire-2.osdl.org>
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6682
>
> Summary: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU
> Kernel Version: 2.6.15.6
> Status: NEW
> Severity: normal
> Owner: acme@conectiva.com.br
> Submitter: alchemyx@uznam.net.pl
>
>
> Most recent kernel where this bug did not occur: (unknown)
> Distribution: Gentoo
> Hardware Environment: 2x Xeon 2.66, 1 GB RAM, NICs: 2 x e1000, and one
> dual-port e100. Based on Intel E7501 architecture (2U rack Intel chassis).
> Software Environment: quagga 0.98.6
> Problem Description: ksoftirqd/0 takes 100% of CPU. Further investigation
> shows no sign of a network flood or anything similar (and 2 of the 3 NICs
> are e1000 with NAPI). Occasionally there are "BUG: soft lockup detected on
> CPU#0!" messages.
>
>
> Steps to reproduce:
>
> There is no simple way to reproduce. I think that everything started when we
> attached a second provider with BGP support. We are using quagga, which
> injects about 186 000 routes into the kernel.
> When running for a while (at least a few hours, sometimes a day) we get 100%
> usage on ksoftirqd/0 and the following messages in the logs:
>
> BUG: soft lockup detected on CPU#0!
>
> Pid: 6506, comm: zebra
> EIP: 0060:[] CPU: 0
> EIP is at _spin_lock+0x7/0xf
> EFLAGS: 00000286 Not tainted (2.6.15.6)
> EAX: f6203180 EBX: e6fbf000 ECX: 00000000 EDX: f6bec000
> ESI: f6203000 EDI: eddb4b80 EBP: fffffff4 DS: 007b ES: 007b
> CR0: 8005003b CR2: aca6dff0 CR3: 361ad000 CR4: 000006d0
> [] dev_queue_xmit+0xe0/0x203
> [] ip_output+0x1e1/0x237
> [] ip_forward+0x181/0x1df
> [] ip_rcv+0x40c/0x485
> [] netif_receive_skb+0x12f/0x165
> [] e1000_clean_rx_irq+0x389/0x410 [e1000]
> [] e1000_clean+0x94/0x12f [e1000]
> [] net_rx_action+0x69/0xf0
> [] __do_softirq+0x55/0xbd
> [] do_softirq+0x2d/0x31
> [] local_bh_enable+0x5a/0x65
> [] rt_run_flush+0x5f/0x80
> [] fn_hash_insert+0x352/0x39f
> [] inet_rtm_newroute+0x57/0x62
> [] rtnetlink_rcv_msg+0x1a8/0x1cb
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] netlink_rcv_skb+0x3a/0x8b
> [] netlink_run_queue+0x42/0xc3
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] rtnetlink_rcv+0x22/0x40
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] netlink_data_ready+0x17/0x54
> [] netlink_sendskb+0x1f/0x39
> [] netlink_sendmsg+0x27b/0x28c
> [] sock_sendmsg+0xce/0xe9
> [] __wake_up+0x27/0x3b
> [] copy_to_user+0x38/0x42
> [] copy_from_user+0x3a/0x60
> [] copy_from_user+0x3a/0x60
> [] autoremove_wake_function+0x0/0x3a
> [] verify_iovec+0x49/0x7f
> [] sys_sendmsg+0x152/0x1a8
> [] do_sync_read+0xb8/0xeb
> [] copy_to_user+0x38/0x42
> [] autoremove_wake_function+0x0/0x3a
> [] getrusage+0x34/0x43
> [] inotify_dentry_parent_queue_event+0x29/0x7c
> [] copy_from_user+0x3a/0x60
> [] sys_socketcall+0x167/0x180
> [] sysenter_past_esp+0x54/0x75
>
> BUG: soft lockup detected on CPU#0!
>
> Pid: 6506, comm: zebra
> EIP: 0060:[] CPU: 0
> EIP is at u32_classify+0x52/0x170 [cls_u32]
> EFLAGS: 00000206 Not tainted (2.6.15.6)
> EAX: e2fbd020 EBX: f48649c0 ECX: 00000010 EDX: 29b09d5a
> ESI: f48649ec EDI: 00000001 EBP: e2fbd020 DS: 007b ES: 007b
> CR0: 8005003b CR2: 08154004 CR3: 361ad000 CR4: 000006d0
> [] ipt_do_table+0x2de/0x2fd [ip_tables]
> [] ip_nat_fn+0x177/0x185 [iptable_nat]
> [] ip_refrag+0x23/0x5f [ip_conntrack]
> [] tc_classify+0x2c/0x3f
> [] htb_classify+0x14b/0x1dd [sch_htb]
> [] htb_enqueue+0x1d/0x13a [sch_htb]
> [] dev_queue_xmit+0xe4/0x203
> [] ip_output+0x1e1/0x237
> [] ip_forward+0x181/0x1df
> [] ip_rcv+0x40c/0x485
> [] netif_receive_skb+0x12f/0x165
> [] e1000_clean_rx_irq+0x389/0x410 [e1000]
> [] e1000_clean+0x94/0x12f [e1000]
> [] net_rx_action+0x69/0xf0
> [] __do_softirq+0x55/0xbd
> [] do_softirq+0x2d/0x31
> [] local_bh_enable+0x5a/0x65
> [] rt_run_flush+0x5f/0x80
> [] fn_hash_insert+0x352/0x39f
> [] inet_rtm_newroute+0x57/0x62
> [] rtnetlink_rcv_msg+0x1a8/0x1cb
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] netlink_rcv_skb+0x3a/0x8b
> [] netlink_run_queue+0x42/0xc3
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] rtnetlink_rcv+0x22/0x40
> [] rtnetlink_rcv_msg+0x0/0x1cb
> [] netlink_data_ready+0x17/0x54
> [] netlink_sendskb+0x1f/0x39
> [] netlink_sendmsg+0x27b/0x28c
> [] sock_sendmsg+0xce/0xe9
> [] __wake_up+0x27/0x3b
> [] copy_from_user+0x3a/0x60
> [] copy_from_user+0x3a/0x60
> [] autoremove_wake_function+0x0/0x3a
> [] verify_iovec+0x49/0x7f
> [] sys_sendmsg+0x152/0x1a8
> [] do_sync_read+0xb8/0xeb
> [] getrusage+0x34/0x43
> [] update_wall_time+0xa/0x32
> [] do_timer+0x33/0xa9
> [] copy_from_user+0x3a/0x60
> [] sys_socketcall+0x167/0x180
> [] sysenter_past_esp+0x54/0x75
>
> It happens on 2.6.15.6. Tonight I will try 2.6.16.16 with FIB_TRIE instead of
> FIB_HASH. I am submitting this bug under network but I am not completely sure
> if it belongs here.

This also happens on 2.6.16.6.