* an occasional trouble with an xfs file system which xfs_repair 2.7.14 has been able to fix
@ 2008-03-10 12:39 Erkki Lintunen
2008-03-10 13:31 ` Eric Sandeen
0 siblings, 1 reply; 4+ messages in thread
From: Erkki Lintunen @ 2008-03-10 12:39 UTC (permalink / raw)
To: xfs
Hi,
can you help me a bit with my troublesome ~700GB xfs filesystem?
The file system has held several dir trees since it was created somewhere
around 2004-2005. It has been written to daily since it was created, and
it has been expanded a few times with xfs_growfs. It has experienced the
same symptom 2-4 times already.
The symptom is that one of the dir trees gets locked up about once a year,
and it is always the same tree. I can't remember when the symptom was
first experienced or what happened at the time. I believe the system ran
a 2.6.17.x kernel at one point in its lifetime, but xfs_repair (at least
the latest version) ought to fix the dir lock problem, shouldn't it?
The filesystem is used for backups by a script that runs rsync, cp -al
and rm -fr. When the trouble begins, the cp -al command starts to take
several hours and hundreds of megabytes of memory. rm -fr of a subtree
also takes considerably longer than removing a subtree in another, bigger
tree in the same filesystem, but the rm commands have always finished,
which the cp -al commands haven't. Most of the time the cp -al process is
in D state.
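For context, a minimal sketch of such a hardlink-rotation backup scheme
(the function name, paths and retention count below are illustrative
assumptions, not the actual script):

```shell
#!/bin/sh
# Hypothetical hardlink-rotation backup, reconstructed from the
# rsync / cp -al / rm -fr steps described above. All names here are
# examples only.
rotate_snapshots() {            # usage: rotate_snapshots SRC VOL KEEP
    src=$1; vol=$2; keep=$3
    # the rm -fr step: drop the oldest snapshot
    rm -fr "$vol/snap.$keep"
    # shift snap.N-1 -> snap.N for the remaining snapshots
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -d "$vol/snap.$prev" ] && mv "$vol/snap.$prev" "$vol/snap.$i"
        i=$prev
    done
    # the cp -al step: hardlink-copy the newest snapshot (the step
    # that hangs in the report above)
    [ -d "$vol/snap.0" ] && cp -al "$vol/snap.0" "$vol/snap.1"
    # the rsync step: refresh snap.0 from the live tree; only files
    # that actually changed get new inodes, the rest stay shared
    mkdir -p "$vol/snap.0"
    rsync -a --delete "$src/" "$vol/snap.0/"
}
```

The cp -al makes every snapshot look like a full copy while unchanged
files share a single inode, which is why the directory tree accumulates
very high link counts over the years.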
I have managed to repair the file system with xfs_repair 2.7.14, but not
with 2.6.20, which ships in Debian Sarge. Now I tried the latest
xfs_repair and it didn't fix the problem - at least not on the first run
without any options.
For example, the latest backup had to be interrupted, and the time
command showed the following:
real 1342m7.316s
user 1m4.152s
sys 14m5.109s
I have an xfs_metadump of the filesystem taken right after the
interruption. It is 3.9G uncompressed and 1.6G compressed with bzip2 -9.
I have now run xfs_repair 2.7.14 on the file system and will wait a day
before I see whether it was able to fix the problem this time as well.
What other information could I provide in addition to that requested in
the FAQ?
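For reference, a metadata image like the one mentioned above can be
captured and shipped roughly as follows (a sketch only; the device and
file names are examples, not the actual ones):

```shell
#!/bin/sh
# Sketch only: device and file names are examples. xfs_metadump copies
# metadata only (no file contents) and obfuscates file names by default
# (-o disables that), which is what you want before sharing an image on
# a public list. Run it with the filesystem unmounted.
capture_metadump() {   # usage: capture_metadump DEVICE IMAGE
    xfs_metadump "$1" "$2" && bzip2 -9 "$2"        # leaves IMAGE.bz2
}
restore_metadump() {   # usage: restore_metadump IMAGE.bz2 TARGET_FILE
    bunzip2 -k "$1" && xfs_mdrestore "${1%.bz2}" "$2"
}
# example invocation (hypothetical paths):
# capture_metadump /dev/vg00/backup-volA /var/tmp/volA.metadump
```

A developer can then xfs_mdrestore the image to a sparse file and run
xfs_repair against it without needing access to the original volume.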
plastic:~# grep backup-volA /etc/fstab
/dev/vg00/backup-volA /site/backup-volA xfs defaults 0 1
plastic:~# df -ml /backup/volA/.
Filesystem 1M-blocks Used Available Use% Mounted on
/site/backup-volA 692688 647328 45361 94% /backup/volA
plastic:~# ./xfs_repair -V
xfs_repair version 2.9.7
plastic:~# /usr/local/sbin/xfs_repair -V
xfs_repair version 2.7.14
plastic:~# /sbin/xfs_repair -V
xfs_repair version 2.6.20
plastic:~# dmesg |tail -n 3
Filesystem "dm-0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-0
Ending clean XFS mount for filesystem: dm-0
plastic:~# uname -a
Linux plastic 2.6.24.2-i686-net #1 SMP Tue Feb 12 17:42:16 EET 2008 i686 GNU/Linux
plastic:~# xfs_info /site/backup-volA
meta-data=/site/backup-volA      isize=256    agcount=39, agsize=4559936 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=177360896, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
# diff between output of xfs_repair 2.9.7 (screenlog.0) and
# xfs_repair 2.7.14 (screenlog.1)
--- screenlog.0 2008-03-10 10:32:13.000000000 +0200
+++ screenlog.1 2008-03-10 14:04:00.000000000 +0200
@@ -1,3 +1,9 @@
+ - scan filesystem freespace and inode maps...
+ - found root inode chunk
+Phase 3 - for each AG...
+ - scan and clear agi unlinked lists...
+ - process known inodes and perform inode discovery...
+ - agno = 0
- agno = 1
- agno = 2
- agno = 3
@@ -39,6 +45,9 @@
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
+ - clear lost+found (if it exists) ...
+ - clearing existing "lost+found" inode
+ - marking entry "lost+found" to be deleted
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
@@ -83,103 +92,13 @@
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- - traversing filesystem ...
- - traversal finished ...
- - moving disconnected inodes to lost+found ...
+ - ensuring existence of lost+found directory
+ - traversing filesystem starting at / ...
+rebuilding directory inode 128
+ - traversal finished ...
+ - traversing all unattached subtrees ...
+ - traversals finished ...
+ - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Best regards,
Erkki
* Re: an occasional trouble with an xfs file system which xfs_repair 2.7.14 has been able to fix
From: Eric Sandeen @ 2008-03-10 13:31 UTC (permalink / raw)
To: erkki.lintunen; +Cc: xfs
Erkki Lintunen wrote:
...
> commands in a script. When the trouble begins cp -al command starts to
> take several hours and hundreds of megs memory. rm -fr of a subtree also
> takes considerably longer than rm a subtree in another bigger tree in
> the same filesystem, but the rm commands have always finished, which
> the cp -al commands haven't. Most of the time the cp -al process has D
> status.
...
> What other information could I provide in addition to that requested in the FAQ?
When you get a process in the D state, do echo t > /proc/sysrq-trigger
to get backtraces of all processes; or echo w to get all blocked processes.
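That procedure can be sketched as follows (the log file name is just an
example; it needs root and a kernel built with CONFIG_MAGIC_SYSRQ):

```shell
#!/bin/sh
# Run as root while the cp -al process sits in D state. The log path is
# an example only; sysrq must also be enabled via /proc/sys/kernel/sysrq.
if [ -w /proc/sysrq-trigger ]; then
    echo w > /proc/sysrq-trigger       # backtraces of blocked (D-state) tasks
    # echo t > /proc/sysrq-trigger     # or: backtraces of all tasks (verbose)
    dmesg | tail -n 500 > /tmp/sysrq-w.log   # capture the result for the list
fi
```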
-Eric
* Re: an occasional trouble with an xfs file system which xfs_repair 2.7.14 has been able to fix
From: Erkki Lintunen @ 2008-03-11 18:14 UTC (permalink / raw)
To: xfs
Hi,
on 10.3.2008 15:31 Eric Sandeen wrote:
> Erkki Lintunen wrote:
>> the cp -al commands haven't. Most of the time the cp -al process has D
>> status.
>
>> What other information could I provide in addition to that requested in the FAQ?
>
> When you get a process in the D state, do echo t > /proc/sysrq-trigger
> to get backtraces of all processes; or echo w to get all blocked processes.
Thanks for the tip. Unfortunately, I couldn't get my hands on the system
before the message below appeared on the console and SysRq was used to
reboot the system today.
From the log, the script had stopped at cp -al again, and in the same
tree. My wild guess is that the script shouldn't have had any reason to
talk to the network at the time of the kernel soft lockup, and there
weren't any other services handling network traffic either.
I upgraded the kernel to 2.6.24.3, ran xfs_repair 2.9.7 on the xfs file
system and left the case to rest until the next run.
Best regards,
Erkki
BUG: soft lockup - CPU#0 stuck for 11s! [bond0:1207]
Pid: 1207, comm: bond0 Not tainted (2.6.24.2-i686-net #1)
EIP: 0060:[<c0376bf5>] EFLAGS: 00000286 CPU: 0
EIP is at _spin_lock+0x5/0x10
EAX: cf925134 EBX: 00000002 ECX: 00000001 EDX: cf92505c
ESI: cc023d40 EDI: cf9f1c80 EBP: cee70000 ESP: cf655d8c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b4d2cffc CR3: 0f78b000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<d0a48d5c>] ad_rx_machine+0x1c/0x3c0 [bonding]
[<c0227f04>] elv_queue_empty+0x24/0x30
[<d0925d15>] ide_do_request+0x65/0x360 [ide_core]
[<d0a4acbf>] bond_3ad_lacpdu_recv+0x9f/0xb0 [bonding]
[<c02ed7eb>] netif_receive_skb+0x2cb/0x3c0
[<d087ce80>] e100_rx_indicate+0x100/0x180 [e100]
[<c012e022>] irq_exit+0x52/0x80
[<c010679e>] do_IRQ+0x3e/0x80
[<c0230aa8>] as_put_io_context+0x48/0x70
[<d087d005>] e100_rx_clean+0x105/0x140 [e100]
[<d087d282>] e100_poll+0x22/0x80 [e100]
[<c02edb7d>] net_rx_action+0x18d/0x1d0
[<d087b09d>] e100_disable_irq+0x3d/0x60 [e100]
[<d087d22e>] e100_intr+0x8e/0xc0 [e100]
[<c012df44>] __do_softirq+0xd4/0xf0
[<c012df98>] do_softirq+0x38/0x40
[<c012e045>] irq_exit+0x75/0x80
[<c010679e>] do_IRQ+0x3e/0x80
[<c0104bd7>] common_interrupt+0x23/0x28
[<d0a48e16>] ad_rx_machine+0xd6/0x3c0 [bonding]
[<c01319e7>] lock_timer_base+0x27/0x60
[<c0131a9e>] __mod_timer+0x7e/0xa0
[<d0a4a6b4>] bond_3ad_state_machine_handler+0xc4/0x180 [bonding]
[<d0a44af0>] bond_mii_monitor+0x0/0xc0 [bonding]
[<d0a4a5f0>] bond_3ad_state_machine_handler+0x0/0x180 [bonding]
[<c013927b>] run_workqueue+0x5b/0x110
[<c01393fd>] worker_thread+0xcd/0x100
[<c013d340>] autoremove_wake_function+0x0/0x50
[<c0121a4f>] finish_task_switch+0x2f/0x80
[<c013d340>] autoremove_wake_function+0x0/0x50
[<c0139330>] worker_thread+0x0/0x100
[<c013ce1b>] kthread+0x6b/0x70
[<c013cdb0>] kthread+0x0/0x70
[<c0104e17>] kernel_thread_helper+0x7/0x10
=======================
* Re: an occasional trouble with an xfs file system which xfs_repair 2.7.14 has been able to fix
From: Erkki Lintunen @ 2008-03-13 9:12 UTC (permalink / raw)
To: xfs
on 11.3.2008 20:14 Erkki Lintunen wrote:
> I upgraded the kernel to 2.6.24.3, ran xfs_repair 2.9.7 on the xfs file
> system and left the case to rest until the next run.
FWIW, this time either the kernel upgrade or xfs_repair 2.9.7 did provide
a fix for the occasional hiccup in the xfs filesystem.
Erkki