From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Tue, 11 Mar 2008 11:14:24 -0700 (PDT)
Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29])
    by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m2BIDd6O030370
    for ; Tue, 11 Mar 2008 11:13:41 -0700
Received: from gw03.mail.saunalahti.fi (localhost [127.0.0.1])
    by cuda.sgi.com (Spam Firewall) with ESMTP id 51805686BC3
    for ; Tue, 11 Mar 2008 11:14:09 -0700 (PDT)
Received: from gw03.mail.saunalahti.fi (gw03.mail.saunalahti.fi [195.197.172.111])
    by cuda.sgi.com with ESMTP id 8MOq9IKVG8T4hzs5
    for ; Tue, 11 Mar 2008 11:14:09 -0700 (PDT)
Received: from uunet198.aac.fi (uunet198.aac.fi [193.64.61.198])
    by gw03.mail.saunalahti.fi (Postfix) with ESMTP id 74B682169E9
    for ; Tue, 11 Mar 2008 20:14:05 +0200 (EET)
Message-ID: <47D6CBE9.8090905@iki.fi>
Date: Tue, 11 Mar 2008 20:14:01 +0200
From: Erkki Lintunen
MIME-Version: 1.0
Subject: Re: an occational trouble with xfs file system which xfs_repair 2.7.14 has been able to fix
References: <47D52BE5.6010706@iki.fi> <47D5383E.50201@sandeen.net>
In-Reply-To: <47D5383E.50201@sandeen.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: xfs@oss.sgi.com

Hi,

on 10.3.2008 15:31 Eric Sandeen wrote:
> Erkki Lintunen wrote:
>> the cp -al commands haven't. Most of the time the cp -al process has D
>> status.
>
>> What else information I could provide in addition to those requested in FAQ?
>
> When you get a process in the D state, do echo t > /proc/sysrq-trigger
> to get backtraces of all processes; or echo w to get all blocked processes.

Thanks for the tip. Unfortunately I couldn't get my hands on the system
before the message below appeared on the console and the system was
rebooted via SysRq today. According to the log, the script had stopped
at cp -al again, and in the same tree.
My wild guess is that the script had no reason to talk to the network at
the time of the kernel soft lockup, and no other services were seeing
network traffic either. I upgraded the kernel to 2.6.24.3, ran
xfs_repair 2.9.7 on the xfs file system, and will rest the case until
the next run.

Best regards,
Erkki

BUG: soft lockup - CPU#0 stuck for 11s! [bond0:1207]

Pid: 1207, comm: bond0 Not tainted (2.6.24.2-i686-net #1)
EIP: 0060:[] EFLAGS: 00000286 CPU: 0
EIP is at _spin_lock+0x5/0x10
EAX: cf925134 EBX: 00000002 ECX: 00000001 EDX: cf92505c
ESI: cc023d40 EDI: cf9f1c80 EBP: cee70000 ESP: cf655d8c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b4d2cffc CR3: 0f78b000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [] ad_rx_machine+0x1c/0x3c0 [bonding]
 [] elv_queue_empty+0x24/0x30
 [] ide_do_request+0x65/0x360 [ide_core]
 [] bond_3ad_lacpdu_recv+0x9f/0xb0 [bonding]
 [] netif_receive_skb+0x2cb/0x3c0
 [] e100_rx_indicate+0x100/0x180 [e100]
 [] irq_exit+0x52/0x80
 [] do_IRQ+0x3e/0x80
 [] as_put_io_context+0x48/0x70
 [] e100_rx_clean+0x105/0x140 [e100]
 [] e100_poll+0x22/0x80 [e100]
 [] net_rx_action+0x18d/0x1d0
 [] e100_disable_irq+0x3d/0x60 [e100]
 [] e100_intr+0x8e/0xc0 [e100]
 [] __do_softirq+0xd4/0xf0
 [] do_softirq+0x38/0x40
 [] irq_exit+0x75/0x80
 [] do_IRQ+0x3e/0x80
 [] common_interrupt+0x23/0x28
 [] ad_rx_machine+0xd6/0x3c0 [bonding]
 [] lock_timer_base+0x27/0x60
 [] __mod_timer+0x7e/0xa0
 [] bond_3ad_state_machine_handler+0xc4/0x180 [bonding]
 [] bond_mii_monitor+0x0/0xc0 [bonding]
 [] bond_3ad_state_machine_handler+0x0/0x180 [bonding]
 [] run_workqueue+0x5b/0x110
 [] worker_thread+0xcd/0x100
 [] autoremove_wake_function+0x0/0x50
 [] finish_task_switch+0x2f/0x80
 [] autoremove_wake_function+0x0/0x50
 [] worker_thread+0x0/0x100
 [] kthread+0x6b/0x70
 [] kthread+0x0/0x70
 [] kernel_thread_helper+0x7/0x10
 =======================
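
The tip quoted earlier in this message (write w to /proc/sysrq-trigger
while a process sits in the D state) could in principle be scripted, so
that the backtraces get logged even when nobody is at the console. The
following is only a minimal sketch under stated assumptions, not
something from the original thread: the function names
(parse_stat_state, blocked_processes, dump_blocked_tasks) are made up
for illustration, the process state is assumed to be readable from
/proc/<pid>/stat, and writing to /proc/sysrq-trigger is assumed to
require root and an enabled SysRq.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    blocked = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return blocked  # no procfs here (hypothetical non-Linux host)
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


def dump_blocked_tasks(trigger="/proc/sysrq-trigger"):
    """Ask the kernel to log backtraces of blocked tasks (assumes root)."""
    with open(trigger, "w") as f:
        f.write("w")
```

A wrapper around the backup script might poll blocked_processes() every
few seconds and call dump_blocked_tasks() once cp shows up in the list;
the traces would then land in the kernel log. Writing t instead of w
would log backtraces of all processes, as the quoted tip describes.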