an occasional trouble with xfs file system which xfs_repair 2.7.14 has been able to fix

public inbox for linux-xfs@vger.kernel.org

* an occasional trouble with xfs file system which xfs_repair 2.7.14 has been able to fix
From: Erkki Lintunen @ 2008-03-10 12:39 UTC
  To: xfs

Hi,

can you help me a bit with my troublesome ~700GB xfs filesystem?

The file system has held several directory trees since it was created
sometime in 2004-2005. It has been written to daily ever since and has
been expanded a few times with xfs_growfs. It has already shown the same
symptom 2-4 times.

The symptom is that one of the directory trees locks up about once a
year, and it is always the same tree. I can't remember when the symptom
first appeared or what had happened just before. I guess the system ran
a 2.6.17.x kernel at some point in its lifetime, but xfs_repair, at
least the latest version, ought to fix the dir lock problem, shouldn't
it?

The filesystem is used for backups with rsync, cp -al and rm -fr
commands in a script. When the trouble begins, the cp -al command starts
to take several hours and hundreds of megabytes of memory. rm -fr of a
subtree also takes considerably longer than removing a subtree in
another, bigger tree in the same filesystem, but the rm commands have
always finished, which the cp -al commands haven't. Most of the time the
cp -al process is in D state.
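
The backup script itself is not part of this report. As a rough sketch
only, a hard-link rotation built from the same three commands often
looks like the following. The daily.N directory names, the keep count
and both function names are assumptions made for illustration, not the
layout used on this system.

```shell
# Sketch of a hard-link snapshot rotation (hypothetical layout:
# $root/daily.0 is the newest snapshot, $root/daily.$keep the oldest).

rotate_snapshots() {
    root=$1
    keep=${2:-7}

    # Drop the oldest snapshot; this is the "rm -fr of a subtree" step.
    rm -fr "$root/daily.$keep"

    # Shift daily.1 .. daily.(keep-1) up by one position.
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -d "$root/daily.$prev" ]; then
            mv "$root/daily.$prev" "$root/daily.$i"
        fi
        i=$prev
    done

    # Hard-link copy of the newest snapshot: directories are created anew,
    # but every file shares its inode with the original.
    if [ -d "$root/daily.0" ]; then
        cp -al "$root/daily.0" "$root/daily.1"
    fi
}

sync_newest() {
    src=$1
    root=$2
    # By default rsync replaces a changed file with a new inode, so the
    # hard-linked copies in older snapshots keep their old contents.
    # Requires rsync to be installed; not called in this sketch.
    mkdir -p "$root/daily.0"
    rsync -a --delete "$src/" "$root/daily.0/"
}
```

A nightly run would call rotate_snapshots and then sync_newest. In such
a scheme both cp -al and rm -fr touch almost nothing but directory
entries and inode link counts, so they are metadata-heavy operations.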

I have managed to repair the file system with xfs_repair 2.7.14, but
not with 2.6.20, which comes with Debian Sarge. Now I tried the latest
xfs_repair (2.9.7) and it didn't fix the problem, at least not on the
first run without any options.

For example, the latest backup had to be interrupted, and the time
command showed the following:

real    1342m7.316s
user    1m4.152s
sys     14m5.109s
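
Put in proportion, these figures mean the run lasted about 22 hours of
wall-clock time while using only about 15 minutes of CPU, so the
process spent nearly all of its time waiting rather than computing. A
small sketch of that arithmetic follows; the helper names to_seconds
and cpu_percent are made up for illustration.

```shell
# Convert a time(1)-style value such as "1342m7.316s" to seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Share of wall-clock time spent on the CPU: (user + sys) / real * 100.
cpu_percent() {
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

# Arguments: real, user, sys as printed by time.
cpu_percent 1342m7.316s 1m4.152s 14m5.109s   # prints 1.1
```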

I took an xfs_metadump of the filesystem right after the interruption.
Its size is 3.9G uncompressed and 1.6G compressed with bzip2 -9. I have
now run xfs_repair 2.7.14 on the file system and will have to wait a day
to see whether it was able to fix the problem this time as well.

What other information could I provide in addition to what the FAQ
requests?

plastic:~# grep backup-volA /etc/fstab
/dev/vg00/backup-volA   /site/backup-volA       xfs     defaults        0       1

plastic:~# df -ml /backup/volA/.
Filesystem           1M-blocks      Used Available Use% Mounted on
/site/backup-volA       692688    647328     45361  94% /backup/volA

plastic:~# ./xfs_repair -V
xfs_repair version 2.9.7
plastic:~# /usr/local/sbin/xfs_repair -V
xfs_repair version 2.7.14
plastic:~# /sbin/xfs_repair -V
xfs_repair version 2.6.20

plastic:~# dmesg |tail -n 3
Filesystem "dm-0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-0
Ending clean XFS mount for filesystem: dm-0

plastic:~# uname -a
Linux plastic 2.6.24.2-i686-net #1 SMP Tue Feb 12 17:42:16 EET 2008 i686 GNU/Linux

plastic:~# xfs_info /site/backup-volA
meta-data=/site/backup-volA      isize=256    agcount=39, agsize=4559936 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=177360896, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

# diff between output of xfs_repair 2.9.7 (screenlog.0) and
# xfs_repair 2.7.14 (screenlog.1)
--- screenlog.0	2008-03-10 10:32:13.000000000 +0200
+++ screenlog.1	2008-03-10 14:04:00.000000000 +0200
@@ -1,3 +1,9 @@
+        - scan filesystem freespace and inode maps...
+        - found root inode chunk
+Phase 3 - for each AG...
+        - scan and clear agi unlinked lists...
+        - process known inodes and perform inode discovery...
+        - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
@@ -39,6 +45,9 @@
         - process newly discovered inodes...
 Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
+        - clear lost+found (if it exists) ...
+        - clearing existing "lost+found" inode
+        - marking entry "lost+found" to be deleted
         - check for inodes claiming duplicate blocks...
         - agno = 0
         - agno = 1
@@ -83,103 +92,13 @@
         - reset superblock...
 Phase 6 - check inode connectivity...
         - resetting contents of realtime bitmap and summary inodes
-        - traversing filesystem ...
-        - traversal finished ...
-        - moving disconnected inodes to lost+found ...
+        - ensuring existence of lost+found directory
+        - traversing filesystem starting at / ...
+rebuilding directory inode 128
+        - traversal finished ...
+        - traversing all unattached subtrees ...
+        - traversals finished ...
+        - moving disconnected inodes to lost+found ...
 Phase 7 - verify and correct link counts...

Best regards,
Erkki

* Re: an occasional trouble with xfs file system which xfs_repair 2.7.14 has been able to fix
From: Eric Sandeen @ 2008-03-10 13:31 UTC
  To: erkki.lintunen; +Cc: xfs

Erkki Lintunen wrote:
...

> commands in a script. When the trouble begins, the cp -al command starts
> to take several hours and hundreds of megabytes of memory. rm -fr of a
> subtree also takes considerably longer than removing a subtree in
> another, bigger tree in the same filesystem, but the rm commands have
> always finished, which the cp -al commands haven't. Most of the time the
> cp -al process is in D state.

...

> What other information could I provide in addition to what the FAQ
> requests?

When you get a process in the D state, do echo t > /proc/sysrq-trigger
to get backtraces of all processes; or echo w to get all blocked processes.
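
As a sketch of how that could be done from a shell: the first helper
below keeps only processes whose state starts with D, and the last one
asks the kernel for the blocked-task dump. The function names are made
up for illustration. dump_blocked_tasks must run as root, and its
output goes to the kernel log, so it is read back with dmesg.

```shell
# Keep the header line plus every process whose STAT column starts with D
# (uninterruptible sleep). Expects "ps -eo pid,stat,comm" style input.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List the processes currently in D state.
list_dstate() {
    ps -eo pid,stat,comm | filter_dstate
}

# Ask the kernel to log backtraces of all blocked tasks, then show the
# tail of the kernel log. Must run as root; not called in this sketch.
dump_blocked_tasks() {
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100
}
```

Running list_dstate while the cp -al is stuck shows which processes to
look for in the dump.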

-Eric

* Re: an occasional trouble with xfs file system which xfs_repair 2.7.14 has been able to fix
From: Erkki Lintunen @ 2008-03-11 18:14 UTC
  To: xfs


Hi,

on 10.3.2008 15:31 Eric Sandeen wrote:
> Erkki Lintunen wrote:
>> always finished, which the cp -al commands haven't. Most of the time the
>> cp -al process is in D state.
>
>> What other information could I provide in addition to what the FAQ
>> requests?
> 
> When you get a process in the D state, do echo t > /proc/sysrq-trigger
> to get backtraces of all processes; or echo w to get all blocked processes.

Thanks for the tip. Unfortunately I couldn't get my hands on the system
before the message below appeared on the console and the system was
rebooted with SysRq today.

According to the log, the script had again stalled at cp -al, and in the
same tree. My wild guess is that the script had no reason to talk to the
network at the time of the kernel soft lockup, nor were any other
services generating network traffic.

I upgraded the kernel to 2.6.24.3, ran xfs_repair 2.9.7 on the xfs file
system, and will rest the case until the next run.

Best regards,
Erkki


BUG: soft lockup - CPU#0 stuck for 11s! [bond0:1207]

Pid: 1207, comm: bond0 Not tainted (2.6.24.2-i686-net #1)
EIP: 0060:[<c0376bf5>] EFLAGS: 00000286 CPU: 0
EIP is at _spin_lock+0x5/0x10
EAX: cf925134 EBX: 00000002 ECX: 00000001 EDX: cf92505c
ESI: cc023d40 EDI: cf9f1c80 EBP: cee70000 ESP: cf655d8c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b4d2cffc CR3: 0f78b000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
  [<d0a48d5c>] ad_rx_machine+0x1c/0x3c0 [bonding]
  [<c0227f04>] elv_queue_empty+0x24/0x30
  [<d0925d15>] ide_do_request+0x65/0x360 [ide_core]
  [<d0a4acbf>] bond_3ad_lacpdu_recv+0x9f/0xb0 [bonding]
  [<c02ed7eb>] netif_receive_skb+0x2cb/0x3c0
  [<d087ce80>] e100_rx_indicate+0x100/0x180 [e100]
  [<c012e022>] irq_exit+0x52/0x80
  [<c010679e>] do_IRQ+0x3e/0x80
  [<c0230aa8>] as_put_io_context+0x48/0x70
  [<d087d005>] e100_rx_clean+0x105/0x140 [e100]
  [<d087d282>] e100_poll+0x22/0x80 [e100]
  [<c02edb7d>] net_rx_action+0x18d/0x1d0
  [<d087b09d>] e100_disable_irq+0x3d/0x60 [e100]
  [<d087d22e>] e100_intr+0x8e/0xc0 [e100]
  [<c012df44>] __do_softirq+0xd4/0xf0
  [<c012df98>] do_softirq+0x38/0x40
  [<c012e045>] irq_exit+0x75/0x80
  [<c010679e>] do_IRQ+0x3e/0x80
  [<c0104bd7>] common_interrupt+0x23/0x28
  [<d0a48e16>] ad_rx_machine+0xd6/0x3c0 [bonding]
  [<c01319e7>] lock_timer_base+0x27/0x60
  [<c0131a9e>] __mod_timer+0x7e/0xa0
  [<d0a4a6b4>] bond_3ad_state_machine_handler+0xc4/0x180 [bonding]
  [<d0a44af0>] bond_mii_monitor+0x0/0xc0 [bonding]
  [<d0a4a5f0>] bond_3ad_state_machine_handler+0x0/0x180 [bonding]
  [<c013927b>] run_workqueue+0x5b/0x110
  [<c01393fd>] worker_thread+0xcd/0x100
  [<c013d340>] autoremove_wake_function+0x0/0x50
  [<c0121a4f>] finish_task_switch+0x2f/0x80
  [<c013d340>] autoremove_wake_function+0x0/0x50
  [<c0139330>] worker_thread+0x0/0x100
  [<c013ce1b>] kthread+0x6b/0x70
  [<c013cdb0>] kthread+0x0/0x70
  [<c0104e17>] kernel_thread_helper+0x7/0x10
  =======================

* Re: an occasional trouble with xfs file system which xfs_repair 2.7.14 has been able to fix
From: Erkki Lintunen @ 2008-03-13  9:12 UTC
  To: xfs



on 11.3.2008 20:14 Erkki Lintunen wrote:
> I upgraded the kernel to 2.6.24.3, ran xfs_repair 2.9.7 on the xfs file
> system, and will rest the case until the next run.

FWIW, this time either the kernel upgrade or xfs_repair 2.9.7 did fix
the occasional hiccup in the xfs fs.

Erkki
