From: Jay Ashworth <jra@baylink.com>
To: xfs@oss.sgi.com
Subject: XFS recovery resumes...
Date: Sun, 18 Aug 2013 17:38:56 -0400 (EDT)
Message-ID: <11558272.4016.1376861936621.JavaMail.root@benjamin.baylink.com>
In-Reply-To: <21672216.3390.1376260599697.JavaMail.root@benjamin.baylink.com>
I'm trying to dedupe the two large XFS filesystems on which I have DVR
recordings, so that I can walk around amongst the available HDDs and create
new filesystems under everything.
Every time I rm a file, the filesystem blows up, and the driver shuts it
down.
Some background:
At the moment, I have 2 devices, /dev/sdd1 mounted on /appl/media4, and
/dev/sda1 mounted on /appl/media5, and a large script, created by hand-
hacking the output of a perl dupe finder script.
The large script was mangled so that it would remove anything that was a
dupe from media4, unless the copy on media5 was an unnamed file in lost+found
and the copy on media4 still had its name. In that case, I removed the file
on media5, and then moved the named copy from media4 to media5.
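The selection rule above could be expressed as one small step per duplicate pair. This is a hypothetical sketch, not the actual hand-hacked script. It assumes the dupe finder yields one (media4 path, media5 path) pair per duplicate, that an "unlabeled" file is one xfs_repair left in lost+found under a numeric name, and that the moved file keeps its media4-relative path on media5. `is_unlabeled` and `commands_for_pair` are illustrative names.

```python
import shlex

MEDIA4 = "/appl/media4"
MEDIA5 = "/appl/media5"

def is_unlabeled(path: str) -> bool:
    # Assumption: "unlabeled" means xfs_repair parked the file directly in
    # lost+found under a numeric (inode-number) name.
    parent, _, name = path.rpartition("/")
    return parent.endswith("/lost+found") and name.isdigit()

def commands_for_pair(path4: str, path5: str) -> list[str]:
    """Return shell commands for one duplicate pair (hypothetical helper)."""
    if is_unlabeled(path5) and not is_unlabeled(path4):
        # Keep the named copy: drop the nameless one on media5, then move
        # the named file across, assuming the same relative path on media5.
        dest = MEDIA5 + path4[len(MEDIA4):]
        return [f"rm {shlex.quote(path5)}",
                f"mv {shlex.quote(path4)} {shlex.quote(dest)}"]
    # Default case: the media4 copy is the redundant one.
    return [f"rm {shlex.quote(path4)}"]
```

Quoting with `shlex.quote` keeps recording titles that contain spaces or quotes from breaking the generated script.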
After the hand-hacking on the script, I sorted it to do all the rm's first,
and then all the mv's, to make sure free space went up before it went down.
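Sorting a generated script that way can be done with a stable sort keyed on the command name. A minimal sketch, assuming one command per line; `rm_before_mv` is a hypothetical helper, and any other line (for example a `#!/bin/sh` header) is kept at the top.

```python
def rm_before_mv(lines: list[str]) -> list[str]:
    """Stable-sort a command script so every rm runs before any mv."""
    def phase(line: str) -> int:
        # First word of the line: rm -> 0, mv -> 1, anything else -> -1.
        word = line.split(maxsplit=1)[0] if line.strip() else ""
        return {"rm": 0, "mv": 1}.get(word, -1)
    # sorted() is stable, so lines keep their relative order within a phase.
    return sorted(lines, key=phase)
```

Because the sort is stable, commands keep their original relative order inside each group.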
And, of course, when I ran the script, it caused the XFS driver to cough and
die, leading to error 5s and gnashing of teeth.
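"Error 5" is a raw Linux errno value. Once XFS has shut a filesystem down it generally fails further operations with it, which is what the repeated xfs_log_force lines in the logs below show. A quick way to decode such numbers (a sketch; `describe_errno` is a hypothetical helper):

```python
import errno
import os

def describe_errno(code: int) -> str:
    # errno.errorcode maps a number to its symbolic name;
    # os.strerror returns the C library's message text for it.
    return f"{errno.errorcode[code]} ({os.strerror(code)})"

# On a glibc-based Linux system this prints: EIO (Input/output error)
print(describe_errno(5))
```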
I unmounted media5, remounted it (which worked), and unmounted it again to
run xfs_repair -n. That found one inode that was pointing somewhere bogus
(and I apologize that I can't copy that in; I was running under screen, and
it doesn't cooperate with scrollback well). I ran an xfs_repair without -n,
and it found and fixed the one error without complaint.
I mounted and unmounted it successfully (nothing notable in dmesg), and reran
xfs_repair -n, which, this time, ran without any problems reported.
So I remounted the filesystem, and again tried to run the script.
And again, it tripped something, and the filesystem unmounted, and here's the
dmesg output from the first and second trips:
First time:
[169324.654803] XFS (sdd1): Ending clean mount
[1278872.471310] ccbc0000: 41 42 54 42 00 00 00 04 df ff ff ff ff ff ff ff ABTB............
[1278872.471324] XFS (sda1): Internal error xfs_btree_check_sblock at line 119 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.4.47/linux-3.4/fs/xfs/xfs_btree.c. Caller 0xe3caf3a5
[1278872.471328]
[1278872.471334] Pid: 16696, comm: rm Not tainted 3.4.47-2.38-default #1
[1278872.471338] Call Trace:
[1278872.471368] [<c0205349>] try_stack_unwind+0x199/0x1b0
[1278872.471382] [<c02041c7>] dump_trace+0x47/0xf0
[1278872.471391] [<c02053ab>] show_trace_log_lvl+0x4b/0x60
[1278872.471398] [<c02053d8>] show_trace+0x18/0x20
[1278872.471409] [<c06825ba>] dump_stack+0x6d/0x72
[1278872.471534] [<e3c826ed>] xfs_corruption_error+0x5d/0x90 [xfs]
[1278872.471650] [<e3cae9f4>] xfs_btree_check_sblock+0x74/0x100 [xfs]
[1278872.471834] [<e3caf3a5>] xfs_btree_read_buf_block.constprop.24+0x95/0xb0 [xfs]
[1278872.472007] [<e3caf423>] xfs_btree_lookup_get_block+0x63/0xc0 [xfs]
[1278872.472207] [<e3cb251a>] xfs_btree_lookup+0x9a/0x460 [xfs]
[1278872.472379] [<e3c9576a>] xfs_alloc_fixup_trees+0x27a/0x370 [xfs]
[1278872.472510] [<e3c97b63>] xfs_alloc_ag_vextent_size+0x523/0x670 [xfs]
[1278872.472647] [<e3c9874f>] xfs_alloc_ag_vextent+0x9f/0x100 [xfs]
[1278872.472781] [<e3c9899a>] xfs_alloc_fix_freelist+0x1ea/0x450 [xfs]
[1278872.472915] [<e3c98cd5>] xfs_free_extent+0xd5/0x160 [xfs]
[1278872.473052] [<e3ca9f4e>] xfs_bmap_finish+0x15e/0x1b0 [xfs]
[1278872.473214] [<e3cc47e9>] xfs_itruncate_extents+0x159/0x2f0 [xfs]
[1278872.473422] [<e3c92ff5>] xfs_inactive+0x335/0x4a0 [xfs]
[1278872.473516] [<c0337e84>] evict+0x84/0x150
[1278872.473530] [<c032ea22>] do_unlinkat+0x102/0x160
[1278872.473546] [<c069331c>] sysenter_do_call+0x12/0x28
[1278872.473578] [<b779b430>] 0xb779b42f
[1278872.473583] XFS (sda1): Corruption detected. Unmount and run xfs_repair
[1278872.473599] XFS (sda1): xfs_do_force_shutdown(0x8) called from line 3732 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.4.47/linux-3.4/fs/xfs/xfs_bmap.c. Return address = 0xe3ca9f8c
[1278872.584543] XFS (sda1): Corruption of in-memory data detected. Shutting down filesystem
[1278872.584555] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[1278881.888038] XFS (sda1): xfs_log_force: error 5 returned.
[1278911.968046] XFS (sda1): xfs_log_force: error 5 returned.
[1278942.048037] XFS (sda1): xfs_log_force: error 5 returned.
[1278972.128049] XFS (sda1): xfs_log_force: error 5 returned.
[1279002.208042] XFS (sda1): xfs_log_force: error 5 returned.
[1279028.046331] XFS (sda1): xfs_log_force: error 5 returned.
[1279028.046349] XFS (sda1): xfs_do_force_shutdown(0x1) called from line 1031 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.4.47/linux-3.4/fs/xfs/xfs_buf.c. Return address = 0xe3c813c0
[1279028.060676] XFS (sda1): xfs_log_force: error 5 returned.
[1279028.067532] XFS (sda1): xfs_log_force: error 5 returned.
Here's me mounting and umounting, with the xfs_repair runs in the middle:
[1279032.147391] XFS (sda1): Mounting Filesystem
[1279032.305924] XFS (sda1): Starting recovery (logdev: internal)
[1279035.263630] XFS (sda1): Ending recovery (logdev: internal)
[1279238.566041] XFS (sda1): Mounting Filesystem
[1279238.713051] XFS (sda1): Ending clean mount
[1279286.829764] XFS (sda1): Mounting Filesystem
[1279286.982409] XFS (sda1): Ending clean mount
[1279368.607644] XFS (sda1): Mounting Filesystem
[1279368.755048] XFS (sda1): Ending clean mount
Second time:
[1279388.664986] c1516000: 41 42 54 43 00 00 00 04 df ff ff ff ff ff ff ff ABTC............
[1279388.665000] XFS (sda1): Internal error xfs_btree_check_sblock at line 119 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.4.47/linux-3.4/fs/xfs/xfs_btree.c. Caller 0xe3caf3a5
[1279388.665004]
[1279388.665010] Pid: 18452, comm: rm Not tainted 3.4.47-2.38-default #1
[1279388.665015] Call Trace:
[1279388.665045] [<c0205349>] try_stack_unwind+0x199/0x1b0
[1279388.665058] [<c02041c7>] dump_trace+0x47/0xf0
[1279388.665067] [<c02053ab>] show_trace_log_lvl+0x4b/0x60
[1279388.665075] [<c02053d8>] show_trace+0x18/0x20
[1279388.665086] [<c06825ba>] dump_stack+0x6d/0x72
[1279388.665211] [<e3c826ed>] xfs_corruption_error+0x5d/0x90 [xfs]
[1279388.665327] [<e3cae9f4>] xfs_btree_check_sblock+0x74/0x100 [xfs]
[1279388.665511] [<e3caf3a5>] xfs_btree_read_buf_block.constprop.24+0x95/0xb0 [xfs]
[1279388.665684] [<e3caf423>] xfs_btree_lookup_get_block+0x63/0xc0 [xfs]
[1279388.665856] [<e3cb251a>] xfs_btree_lookup+0x9a/0x460 [xfs]
[1279388.666029] [<e3c97691>] xfs_alloc_ag_vextent_size+0x51/0x670 [xfs]
[1279388.666163] [<e3c9874f>] xfs_alloc_ag_vextent+0x9f/0x100 [xfs]
[1279388.666298] [<e3c9899a>] xfs_alloc_fix_freelist+0x1ea/0x450 [xfs]
[1279388.666433] [<e3c98cd5>] xfs_free_extent+0xd5/0x160 [xfs]
[1279388.666571] [<e3ca9f4e>] xfs_bmap_finish+0x15e/0x1b0 [xfs]
[1279388.666734] [<e3cc47e9>] xfs_itruncate_extents+0x159/0x2f0 [xfs]
[1279388.666944] [<e3c92ff5>] xfs_inactive+0x335/0x4a0 [xfs]
[1279388.667039] [<c0337e84>] evict+0x84/0x150
[1279388.667053] [<c032ea22>] do_unlinkat+0x102/0x160
[1279388.667069] [<c069331c>] sysenter_do_call+0x12/0x28
[1279388.667100] [<b772f430>] 0xb772f42f
[1279388.667105] XFS (sda1): Corruption detected. Unmount and run xfs_repair
[1279388.667120] XFS (sda1): xfs_do_force_shutdown(0x8) called from line 3732 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.4.47/linux-3.4/fs/xfs/xfs_bmap.c. Return address = 0xe3ca9f8c
[1279388.690497] XFS (sda1): Corruption of in-memory data detected. Shutting down filesystem
[1279388.690506] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[1279398.816060] XFS (sda1): xfs_log_force: error 5 returned.
[1279428.832065] XFS (sda1): xfs_log_force: error 5 returned.
[ ... ]
It's not entirely clear to me whether this problem is a few specific corrupt
inodes, or just something in the filesystem header.
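One way to narrow that down is to decode the 16 bytes dmesg dumped just before each error. Assuming the standard on-disk header of an XFS short-form btree block (big-endian magic, level, record count, left and right sibling block numbers), "ABTB" and "ABTC" are the magic numbers of an allocation group's two free-space btrees, so both trips appear to point at free-space metadata rather than at a particular file's inode; the call traces (xfs_free_extent under xfs_itruncate_extents) fit that reading. This is a sketch for reading the dump, not a diagnosis; `decode_sblock_header` is a hypothetical helper.

```python
import struct

# Magic numbers of the two XFS free-space btrees (short-form blocks).
MAGICS = {b"ABTB": "free-space btree by block number (bnobt)",
          b"ABTC": "free-space btree by extent size (cntbt)"}
NULLAGBLOCK = 0xFFFFFFFF  # sibling pointer meaning "no sibling"

def decode_sblock_header(hexdump: str) -> dict:
    """Decode the first 16 bytes of a short-form btree block header."""
    raw = bytes.fromhex(hexdump)  # fromhex skips the spaces between bytes
    # Big-endian: magic (4 bytes), level (u16), numrecs (u16),
    # leftsib (u32), rightsib (u32).
    magic, level, numrecs, left, right = struct.unpack(">4sHHII", raw[:16])
    return {
        "tree": MAGICS.get(magic, f"unknown magic {magic!r}"),
        "level": level,
        "numrecs": numrecs,
        "leftsib": hex(left),
        "rightsib": hex(right),
        # How many bits the left sibling differs from NULLAGBLOCK by.
        "leftsib_bits_off_null": bin(left ^ NULLAGBLOCK).count("1"),
    }

print(decode_sblock_header("41 42 54 42 00 00 00 04 df ff ff ff ff ff ff ff"))
print(decode_sblock_header("41 42 54 43 00 00 00 04 df ff ff ff ff ff ff ff"))
```

Both dumps decode as leaf blocks (level 0) holding 4 records, with a right sibling of 0xffffffff (NULLAGBLOCK) and a left sibling of 0xdfffffff: not a plausible block number for any AG on these disks, and exactly one bit away from NULLAGBLOCK. That looks more like a flipped bit than a bad update, but two 16-byte samples are thin evidence.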
Kernel:
Linux duckling 3.4.47-2.38-default #1 SMP Fri May 31 20:17:40 UTC 2013 (3961086) i686 athlon i386 GNU/Linux
progs:
xfsprogs-3.1.6-9.1.2.i586
Worst case, if I can't get these to behave, I'll just beg, borrow or steal
a spare 3T and copy everything to it, and then redo the FSs on these 2
drives, but it would be easier if I could get them to settle down a
bit...
Anyone have any suggestions as to which mole I should whack next?
[ ... ]
Built xfsprogs 3.1.11 from git and ran it on /appl/media4 (/dev/sda1):
============
duckling:/appl/downloads/xfsprogs # xfs_repair /dev/sda1
Phase 1 - find and verify superblock...
Not enough RAM available for repair to enable prefetching.
This will be _slow_.
You need at least 497MB RAM to run with prefetching enabled.
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
ir_freecount/free mismatch, inode chunk 2/128, freecount 62 nfree 61
ir_freecount/free mismatch, inode chunk 3/128, freecount 36 nfree 35
xfs_allocbt_read_verify: XFS_CORRUPTION_ERROR
xfs_allocbt_read_verify: XFS_CORRUPTION_ERROR
xfs_allocbt_read_verify: XFS_CORRUPTION_ERROR
xfs_allocbt_read_verify: XFS_CORRUPTION_ERROR
xfs_allocbt_read_verify: XFS_CORRUPTION_ERROR
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
imap claims a free inode 1073742013 is in use, correcting imap and clearing inode
cleared inode 1073742013
- agno = 3
imap claims a free inode 1610612893 is in use, correcting imap and clearing inode
cleared inode 1610612893
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
__read_verify: XFS_CORRUPTION_ERROR
can't read leaf block 8388608 for directory inode 128
rebuilding directory inode 128
name create failed in ino 128 (117), filesystem may be out of space
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
============
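The two inodes xfs_repair cleared can be tied back to the two Phase 2 freecount warnings. An XFS inode number packs the AG number above an AG-relative inode number (agino), split at sb_agblklog + sb_inopblog bits. The value 29 below is an assumption (agblklog 25 for 32 AGs on a ~3 TB disk, inopblog 4 for 256-byte inodes in 4 KiB blocks), chosen because it agrees with the "agno = 2" and "agno = 3" lines around the messages; the real values live in the superblock. The sketch also assumes 64-inode chunks aligned on 64-inode boundaries. `split_ino` is a hypothetical helper.

```python
AGINO_BITS = 29        # assumption: sb_agblklog (25) + sb_inopblog (4)
INODES_PER_CHUNK = 64  # XFS allocates inodes in 64-inode chunks

def split_ino(ino: int, agino_bits: int = AGINO_BITS) -> tuple[int, int, int]:
    """Return (agno, agino, first agino of the containing inode chunk)."""
    agno = ino >> agino_bits
    agino = ino & ((1 << agino_bits) - 1)
    chunk = agino - agino % INODES_PER_CHUNK  # assumes aligned chunks
    return agno, agino, chunk

for ino in (1073742013, 1610612893):
    agno, agino, chunk = split_ino(ino)
    print(f"inode {ino}: AG {agno}, agino {agino}, chunk {agno}/{chunk}")
```

Under those assumptions inode 1073742013 lands in chunk 2/128 and inode 1610612893 in chunk 3/128, the same chunks named by the ir_freecount/free mismatch lines, so the Phase 2 and Phase 3 messages are probably describing the same two problems.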
It's not clear to me whether that actually fixed anything, but I think
I'm going to put off a second run, or a run on the other FS (which threw
more CORRUPTION errors at a later stage), until I have a better idea
what's going on...
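The Phase 6 complaint about "leaf block 8388608" is less mysterious than the number looks. Version 2 XFS directories keep their leaf (index) blocks at a fixed logical offset of 32 GiB (XFS_DIR2_LEAF_OFFSET), so with an assumed 4 KiB directory block size, block 8388608 is simply the first leaf block of directory inode 128, which on a filesystem like this is normally the root directory. A back-of-the-envelope check (`first_leaf_block` is a hypothetical helper):

```python
# Assumed constants mirroring the XFS v2 directory layout: each directory
# "section" gets 32 GiB of logical address space, and the leaf section is
# the second one (the data section comes first, at offset 0).
DIR2_SPACE_SIZE = 1 << 35
DIR2_LEAF_OFFSET = 1 * DIR2_SPACE_SIZE

def first_leaf_block(dirblksize: int = 4096) -> int:
    """Directory-block number at which the leaf section begins."""
    return DIR2_LEAF_OFFSET // dirblksize

print(first_leaf_block())  # 8388608, matching the xfs_repair message
```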
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA #natog +1 727 647 1274
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs