linux-ext4.vger.kernel.org archive mirror
* ext4 corruption
@ 2011-06-06  3:59 Micah Anderson
  2011-06-06  4:19 ` Ted Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Micah Anderson @ 2011-06-06  3:59 UTC (permalink / raw)
  To: linux-ext4

I previously wrote about a recent conversion from ext3 to ext4 (on
Debian Squeeze), which went well. However, I seem to be having problems
with the ext4 filesystem.

Yesterday, there was a file in /var/spool/postfix/defer that was giving
an i/o error:

Jun  3 15:00:14 willet postfix/qmgr[29108]: fatal: qmgr_message_alloc: 677AE298316F: remove defer 677AE298316F: Input/output error

Trying to stat the file gave the same error. On the console, I was
getting a lot of these:

[6060479.296658] EXT4-fs error (device dm-4): ext4_lookup: deleted inode referenced: 169640807
[6060482.776087] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

The system was clearly acting strangely, so I decided it was best to
touch /forcefsck and restart to clean up the filesystem.

fsck reported a couple of "Multiply-claimed block(s)" messages ("There
are 10 inodes containing multiply-claimed blocks."), and then I was
required to run fsck again. I did, and the filesystem seemed to be fine
after the second run (these fscks took hours).

After things seemed clean, I started the system back up and it began to
operate fine. I then began to see the following on the console:

[ 3201.702997] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429952(bit 3456 in group 1722)
[ 3201.714348] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429953(bit 3457 in group 1722)
[ 3201.725665] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429954(bit 3458 in group 1722)
[ 3201.737028] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429955(bit 3459 in group 1722)
[ 3201.748721] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429956(bit 3460 in group 1722)
[ 3201.760021] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429957(bit 3461 in group 1722)
[ 3201.771489] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429958(bit 3462 in group 1722)
[ 3201.782908] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429959(bit 3463 in group 1722)
[ 3201.794281] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429960(bit 3464 in group 1722)
[ 3201.805664] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429961(bit 3465 in group 1722)
[ 3201.818936] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 3202.289345] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 3202.328925] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
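
Each mb_free_blocks message above names the same block twice: once as an absolute block number and once as a bit offset within a block group. A minimal sketch of that relationship follows, assuming the common geometry of 4 KiB blocks, 32768 blocks per group, and a first data block of 0. None of these values is stated in the report (dumpe2fs -h prints the real ones), and the function names are illustrative only.

```python
# Assumed geometry, NOT taken from the report: 4 KiB blocks give
# 32768 blocks per group and a first data block of 0.
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0


def locate_block(block, blocks_per_group=BLOCKS_PER_GROUP,
                 first_data_block=FIRST_DATA_BLOCK):
    """Return (group, bit) for an absolute filesystem block number."""
    return divmod(block - first_data_block, blocks_per_group)


def block_number(group, bit, blocks_per_group=BLOCKS_PER_GROUP,
                 first_data_block=FIRST_DATA_BLOCK):
    """Inverse of locate_block: rebuild the absolute block number."""
    return first_data_block + group * blocks_per_group + bit


# First and last blocks from the double-free messages in the log.
for block in (56429952, 56429961):
    group, bit = locate_block(block)
    print(f"block {block}: group {group}, bit {bit}")
```

Under these assumptions the computed pairs agree with the log (group 1722, bits 3456 through 3465), so the ten errors cover one contiguous run of blocks in a single group. That only shows how to read the messages; it says nothing about whether the cause is the filesystem or the hardware.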

I'm concerned that this happened so quickly after a fsck resolved
issues.

The filesystem is on top of a software RAID mirror, so I failed one
member, ran S.M.A.R.T. short/long tests on the device, re-added it to
the array, waited the 8 hours for the resync, and then did the same
thing with the other member of the array. All S.M.A.R.T. tests completed
without error.

I took the machine down to add another disk to the system so I could
have more flexibility to run badblocks tests, and when the system came
back up a fsck of the partition was required. It's been running for 3
hours now, and so far it has only said "Duplicate or bad block in use!"
so I presume it is scanning the entire device for duplicate blocks. This
is what it did during the previous fsck.

Last time it took 8 hours to complete the first pass, and then it had to
do another pass after a reboot, which took 1.5-4 hours (I was asleep
when it finished). So we've been out for a number of hours now, which is
quite bad.

It's certainly possible that this is a hardware issue rather than a
filesystem one; the badblocks tests should give us more conclusive
information. I would love any additional suggestions for what we can do
to conclusively identify what the issue is.

Thanks for reading, and for any thoughts you might have!

micah


* ext4 corruption
@ 2011-02-26 10:16 Bill Huey (hui)
  2011-02-26 11:10 ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Bill Huey (hui) @ 2011-02-26 10:16 UTC (permalink / raw)
  To: linux-ext4

Maybe this is deletion related, since I was creating and destroying a
bunch of files with rsync. I don't know. I'm redoing the rsync with
checksums to see if the data is still intact. It seems like some bits of
this got corrupted, but I can't tell if it's disk or file system
related.

bill
------------------------------------

Feb 22 19:23:31 finfin kernel: [    2.819633]  sdb1
Feb 22 19:27:40 finfin kernel: [  263.857108]  sdb: sdb1
Feb 22 20:03:18 finfin kernel: [ 2402.182269] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Feb 25 03:49:40 finfin kernel: [203184.800029] EXT4-fs (sdb1): warning: mounting fs with errors, running e2fsck is recommended
Feb 25 03:49:41 finfin kernel: [203184.980730] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty metadata buffer (dev = sdb1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Feb 25 05:01:39 finfin kernel: [    2.405998]  sdb: sdb1
Feb 25 05:02:07 finfin kernel: [   82.487625] EXT4-fs (sdb1): warning: mounting fs with errors, running e2fsck is recommended
Feb 25 05:02:07 finfin kernel: [   82.657297] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Feb 25 05:09:40 finfin kernel: [    2.408385]  sdb: sdb1
Feb 25 05:14:17 finfin kernel: [    2.438605]  sdb: sdb1
Feb 25 05:16:21 finfin kernel: [  174.548463] EXT4-fs (sdb1): warning: mounting fs with errors, running e2fsck is recommended
Feb 25 05:16:21 finfin kernel: [  174.728174] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Feb 25 15:18:20 finfin kernel: [36293.209497] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40242283) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209594] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40242282) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209668] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40242281) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209739] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40242280) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209745] EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links (8)
Feb 25 15:18:20 finfin kernel: [36293.209818] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40242279) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209824] EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links (5)
Feb 25 15:18:20 finfin kernel: [36293.209892] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40242278) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209898] EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links (3)
Feb 25 15:18:20 finfin kernel: [36293.226488] EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links (125)
Feb 25 15:18:45 finfin kernel: [36317.996660] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40247537) - no data block
Feb 25 15:18:45 finfin kernel: [36317.996671] EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links (24906)
Feb 25 15:19:38 finfin kernel: [36371.180386] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40239122) - no `.' or `..'
Feb 25 15:19:38 finfin kernel: [36371.232941] EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links (3)
Feb 25 15:19:38 finfin kernel: [36371.343549] EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links (5)
Feb 25 15:19:38 finfin kernel: [36371.397308] EXT4-fs warning (device sdb1): empty_dir: bad directory (dir #40239106) - no `.' or `..'
Feb 25 18:13:27 finfin kernel: [46799.800244] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Feb 25 21:51:47 finfin kernel: [59900.021575] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
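
A log like this one is easier to triage once the EXT4-fs messages are counted by the kernel function that raised them. The following is a rough sketch of one way to do that. It joins continuation lines first, in case a mail client has wrapped the records. The regular expressions and the names join_wrapped and summarize are illustrative assumptions, not part of any existing tool, and the sample lines are copied from the log.

```python
import re
from collections import Counter

# A syslog record starts with "Mon DD HH:MM:SS "; any other non-blank
# line is treated as the wrapped continuation of the previous record.
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# Matches e.g. "EXT4-fs warning (device sdb1): empty_dir:"
EXT4_MSG = re.compile(r"EXT4-fs (warning|error) \(device (\S+)\): (\w+):")


def join_wrapped(lines):
    """Merge wrapped continuation lines back into whole records."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def summarize(lines):
    """Count EXT4-fs messages by (severity, device, function)."""
    counts = Counter()
    for record in join_wrapped(lines):
        match = EXT4_MSG.search(record)
        if match:
            counts[match.groups()] += 1
    return counts


sample = [
    "Feb 25 15:18:20 finfin kernel: [36293.209739] EXT4-fs warning (device",
    "sdb1): empty_dir: bad directory (dir #40242280) - no `.' or `..'",
    "Feb 25 15:18:20 finfin kernel: [36293.209745] EXT4-fs warning (device",
    "sdb1): ext4_rmdir: empty directory has too many links (8)",
]
for key, count in summarize(sample).items():
    print(key, count)
```

Mount notices and the JBD message do not match the pattern and are ignored, so the summary covers only the warnings and errors attributed to a specific function.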


end of thread, other threads:[~2011-06-06 17:11 UTC | newest]

Thread overview: 13+ messages
2011-06-06  3:59 ext4 corruption Micah Anderson
2011-06-06  4:19 ` Ted Ts'o
2011-06-06 17:11   ` micah anderson
  -- strict thread matches above, loose matches on Subject: below --
2011-02-26 10:16 Bill Huey (hui)
2011-02-26 11:10 ` Theodore Tso
2011-02-26 11:13   ` Bill Huey (hui)
2011-02-26 11:16     ` Bill Huey (hui)
2011-02-28  4:43       ` Ted Ts'o
2011-02-28 20:18         ` Bill Huey (hui)
2011-02-28 20:30           ` Bill Huey (hui)
2011-02-28 22:55           ` Ted Ts'o
2011-02-28 23:45             ` Bill Huey (hui)
2011-02-28 15:01     ` Eric Sandeen
