ext4 corruption

linux-ext4.vger.kernel.org archive mirror

* ext4 corruption
@ 2011-02-26 10:16 Bill Huey (hui)
  2011-02-26 11:10 ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Bill Huey (hui) @ 2011-02-26 10:16 UTC (permalink / raw)
  To: linux-ext4

Maybe this is deletion related since I was creating and destroying a
bunch of files with rsync. I don't know. I'm redoing the rsync with
checksums to see if the data is still intact. Seems like some bits of
this got corrupted, but I can't tell if it's disk or file system
related.

bill
------------------------------------

Feb 22 19:23:31 finfin kernel: [    2.819633]  sdb1
Feb 22 19:27:40 finfin kernel: [  263.857108]  sdb: sdb1
Feb 22 20:03:18 finfin kernel: [ 2402.182269] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 03:49:40 finfin kernel: [203184.800029] EXT4-fs (sdb1):
warning: mounting fs with errors, running e2fsck is recommended
Feb 25 03:49:41 finfin kernel: [203184.980730] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty
metadata buffer (dev = sdb1, blocknr = 0). There's a risk of
filesystem corruption in case of system crash.
Feb 25 05:01:39 finfin kernel: [    2.405998]  sdb: sdb1
Feb 25 05:02:07 finfin kernel: [   82.487625] EXT4-fs (sdb1): warning:
mounting fs with errors, running e2fsck is recommended
Feb 25 05:02:07 finfin kernel: [   82.657297] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 05:09:40 finfin kernel: [    2.408385]  sdb: sdb1
Feb 25 05:14:17 finfin kernel: [    2.438605]  sdb: sdb1
Feb 25 05:16:21 finfin kernel: [  174.548463] EXT4-fs (sdb1): warning:
mounting fs with errors, running e2fsck is recommended
Feb 25 05:16:21 finfin kernel: [  174.728174] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 15:18:20 finfin kernel: [36293.209497] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242283) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209594] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242282) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209668] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242281) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209739] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242280) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209745] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (8)
Feb 25 15:18:20 finfin kernel: [36293.209818] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242279) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209824] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (5)
Feb 25 15:18:20 finfin kernel: [36293.209892] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242278) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209898] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (3)
Feb 25 15:18:20 finfin kernel: [36293.226488] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (125)
Feb 25 15:18:45 finfin kernel: [36317.996660] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40247537) - no data block
Feb 25 15:18:45 finfin kernel: [36317.996671] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (24906)
Feb 25 15:19:38 finfin kernel: [36371.180386] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40239122) - no `.' or `..'
Feb 25 15:19:38 finfin kernel: [36371.232941] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (3)
Feb 25 15:19:38 finfin kernel: [36371.343549] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (5)
Feb 25 15:19:38 finfin kernel: [36371.397308] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40239106) - no `.' or `..'
Feb 25 18:13:27 finfin kernel: [46799.800244] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 21:51:47 finfin kernel: [59900.021575] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
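
The uptime stamps in the log above drop back to about 2 seconds several
times, which suggests the machine rebooted between mounts. A minimal
Python sketch for counting those boots follows; it is an illustration
only, not something posted in the thread. It assumes the syslog layout
shown above, assumes the year is 2011 (syslog omits it), and treats a
jump of more than 60 seconds between wall-clock time and kernel uptime
as a reboot. The name count_boots and the tolerance are hypothetical
choices.

```python
import re
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches e.g. "Feb 25 05:01:39 finfin kernel: [    2.405998]  sdb: sdb1"
LINE = re.compile(
    r"^(\w{3})\s+(\d+) (\d+):(\d+):(\d+) \S+ kernel: \[\s*(\d+\.\d+)\]")


def count_boots(lines, year=2011, tolerance=60.0):
    """Count boots by comparing wall-clock progress with uptime progress."""
    boots = 0
    prev = None  # (wall-clock datetime, uptime in seconds) of the last stamped line
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # wrapped continuation line without a stamp
        mon, day, hh, mm, ss, up = m.groups()
        wall = datetime(year, MONTHS[mon], int(day), int(hh), int(mm), int(ss))
        uptime = float(up)
        if prev is None:
            boots += 1
        else:
            # Within one boot, wall clock and uptime advance together.
            drift = (wall - prev[0]).total_seconds() - (uptime - prev[1])
            if drift > tolerance:
                boots += 1
        prev = (wall, uptime)
    return boots


# First line of selected entries from the log above.
sample = [
    "Feb 22 19:23:31 finfin kernel: [    2.819633]  sdb1",
    "Feb 22 20:03:18 finfin kernel: [ 2402.182269] EXT4-fs (sdb1): mounted",
    "Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty",
    "Feb 25 05:01:39 finfin kernel: [    2.405998]  sdb: sdb1",
    "Feb 25 05:02:07 finfin kernel: [   82.487625] EXT4-fs (sdb1): warning:",
    "Feb 25 05:09:40 finfin kernel: [    2.408385]  sdb: sdb1",
    "Feb 25 05:14:17 finfin kernel: [    2.438605]  sdb: sdb1",
    "Feb 25 05:16:21 finfin kernel: [  174.548463] EXT4-fs (sdb1): warning:",
    "Feb 25 15:18:20 finfin kernel: [36293.209497] EXT4-fs warning (device",
]
print(count_boots(sample))
```

On these sample lines it counts four boots. The tolerance has to absorb
the roughly 50 seconds by which early boot messages are logged late in
this log.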


* Re: ext4 corruption
  2011-02-26 10:16 ext4 corruption Bill Huey (hui)
@ 2011-02-26 11:10 ` Theodore Tso
  2011-02-26 11:13   ` Bill Huey (hui)
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2011-02-26 11:10 UTC (permalink / raw)
  To: Bill Huey; +Cc: linux-ext4


On Feb 26, 2011, at 5:16 AM, Bill Huey (hui) wrote:

> Maybe this is deletion related since I was creating and destroying a
> bunch of files with rsync. I don't know. I'm redoing the rsync with
> checksums to see if the data is still intact. Seems like some bits of
> this got corrupted, but I can't tell if it's disk or file system
> related.

I'd run e2fsck before you do anything else.   Some of the error
messages listed below, especially the ones about bad link
counts on empty directories, smell very much like the inode
table got corrupted.

-- Ted


> 
> bill
> ------------------------------------
> 
> Feb 22 19:23:31 finfin kernel: [    2.819633]  sdb1
> Feb 22 19:27:40 finfin kernel: [  263.857108]  sdb: sdb1
> Feb 22 20:03:18 finfin kernel: [ 2402.182269] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)
> Feb 25 03:49:40 finfin kernel: [203184.800029] EXT4-fs (sdb1):
> warning: mounting fs with errors, running e2fsck is recommended
> Feb 25 03:49:41 finfin kernel: [203184.980730] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)
> Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty
> metadata buffer (dev = sdb1, blocknr = 0). There's a risk of
> filesystem corruption in case of system crash.
> Feb 25 05:01:39 finfin kernel: [    2.405998]  sdb: sdb1
> Feb 25 05:02:07 finfin kernel: [   82.487625] EXT4-fs (sdb1): warning:
> mounting fs with errors, running e2fsck is recommended
> Feb 25 05:02:07 finfin kernel: [   82.657297] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)
> Feb 25 05:09:40 finfin kernel: [    2.408385]  sdb: sdb1
> Feb 25 05:14:17 finfin kernel: [    2.438605]  sdb: sdb1
> Feb 25 05:16:21 finfin kernel: [  174.548463] EXT4-fs (sdb1): warning:
> mounting fs with errors, running e2fsck is recommended
> Feb 25 05:16:21 finfin kernel: [  174.728174] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)
> Feb 25 15:18:20 finfin kernel: [36293.209497] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40242283) - no `.' or `..'
> Feb 25 15:18:20 finfin kernel: [36293.209594] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40242282) - no `.' or `..'
> Feb 25 15:18:20 finfin kernel: [36293.209668] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40242281) - no `.' or `..'
> Feb 25 15:18:20 finfin kernel: [36293.209739] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40242280) - no `.' or `..'
> Feb 25 15:18:20 finfin kernel: [36293.209745] EXT4-fs warning (device
> sdb1): ext4_rmdir: empty directory has too many links (8)
> Feb 25 15:18:20 finfin kernel: [36293.209818] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40242279) - no `.' or `..'
> Feb 25 15:18:20 finfin kernel: [36293.209824] EXT4-fs warning (device
> sdb1): ext4_rmdir: empty directory has too many links (5)
> Feb 25 15:18:20 finfin kernel: [36293.209892] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40242278) - no `.' or `..'
> Feb 25 15:18:20 finfin kernel: [36293.209898] EXT4-fs warning (device
> sdb1): ext4_rmdir: empty directory has too many links (3)
> Feb 25 15:18:20 finfin kernel: [36293.226488] EXT4-fs warning (device
> sdb1): ext4_rmdir: empty directory has too many links (125)
> Feb 25 15:18:45 finfin kernel: [36317.996660] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40247537) - no data block
> Feb 25 15:18:45 finfin kernel: [36317.996671] EXT4-fs warning (device
> sdb1): ext4_rmdir: empty directory has too many links (24906)
> Feb 25 15:19:38 finfin kernel: [36371.180386] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40239122) - no `.' or `..'
> Feb 25 15:19:38 finfin kernel: [36371.232941] EXT4-fs warning (device
> sdb1): ext4_rmdir: empty directory has too many links (3)
> Feb 25 15:19:38 finfin kernel: [36371.343549] EXT4-fs warning (device
> sdb1): ext4_rmdir: empty directory has too many links (5)
> Feb 25 15:19:38 finfin kernel: [36371.397308] EXT4-fs warning (device
> sdb1): empty_dir: bad directory (dir #40239106) - no `.' or `..'
> Feb 25 18:13:27 finfin kernel: [46799.800244] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)
> Feb 25 21:51:47 finfin kernel: [59900.021575] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)



* Re: ext4 corruption
  2011-02-26 11:10 ` Theodore Tso
@ 2011-02-26 11:13   ` Bill Huey (hui)
  2011-02-26 11:16     ` Bill Huey (hui)
  2011-02-28 15:01     ` Eric Sandeen
  0 siblings, 2 replies; 13+ messages in thread
From: Bill Huey (hui) @ 2011-02-26 11:13 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4

Theodore,

I did run fsck.ext4 on the file system. It cleared a bunch of errors,
but it's still showing various problems.

Another log of the file system. USB->SATA problems?
---------------------------------

Feb 22 19:23:31 finfin kernel: [    2.402463] sd 6:0:0:0: [sdb]
3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Feb 22 19:23:31 finfin kernel: [    2.403585] sd 6:0:0:0: [sdb] Write
Protect is off
Feb 22 19:23:31 finfin kernel: [    2.405920]  sdb:
Feb 22 19:23:31 finfin kernel: [    2.819633]  sdb1
Feb 22 19:23:31 finfin kernel: [    2.821592] sd 6:0:0:0: [sdb]
Attached SCSI disk
Feb 22 19:27:40 finfin kernel: [  263.857108]  sdb: sdb1
Feb 22 20:03:18 finfin kernel: [ 2402.182269] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 03:49:40 finfin kernel: [203184.800029] EXT4-fs (sdb1):
warning: mounting fs with errors, running e2fsck is recommended
Feb 25 03:49:41 finfin kernel: [203184.980730] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty
metadata buffer (dev = sdb1, blocknr = 0). There's a risk of
filesystem corruption in case of system crash.
Feb 25 05:01:39 finfin kernel: [    2.402538] sd 6:0:0:0: [sdb]
3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Feb 25 05:01:39 finfin kernel: [    2.403659] sd 6:0:0:0: [sdb] Write
Protect is off
Feb 25 05:01:39 finfin kernel: [    2.405998]  sdb: sdb1
Feb 25 05:01:39 finfin kernel: [    2.424879] sd 6:0:0:0: [sdb]
Attached SCSI disk
Feb 25 05:02:07 finfin kernel: [   82.487625] EXT4-fs (sdb1): warning:
mounting fs with errors, running e2fsck is recommended
Feb 25 05:02:07 finfin kernel: [   82.657297] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 05:09:40 finfin kernel: [    2.404933] sd 6:0:0:0: [sdb]
3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Feb 25 05:09:40 finfin kernel: [    2.406043] sd 6:0:0:0: [sdb] Write
Protect is off
Feb 25 05:09:40 finfin kernel: [    2.408385]  sdb: sdb1
Feb 25 05:09:40 finfin kernel: [    2.411134] sd 6:0:0:0: [sdb]
Attached SCSI disk
Feb 25 05:14:17 finfin kernel: [    2.435019] sd 6:0:0:0: [sdb]
3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Feb 25 05:14:17 finfin kernel: [    2.436139] sd 6:0:0:0: [sdb] Write
Protect is off
Feb 25 05:14:17 finfin kernel: [    2.438605]  sdb: sdb1
Feb 25 05:14:17 finfin kernel: [    2.441353] sd 6:0:0:0: [sdb]
Attached SCSI disk
Feb 25 05:16:21 finfin kernel: [  174.548463] EXT4-fs (sdb1): warning:
mounting fs with errors, running e2fsck is recommended
Feb 25 05:16:21 finfin kernel: [  174.728174] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 15:18:20 finfin kernel: [36293.209497] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242283) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209594] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242282) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209668] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242281) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209739] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242280) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209745] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (8)
Feb 25 15:18:20 finfin kernel: [36293.209818] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242279) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209824] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (5)
Feb 25 15:18:20 finfin kernel: [36293.209892] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40242278) - no `.' or `..'
Feb 25 15:18:20 finfin kernel: [36293.209898] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (3)
Feb 25 15:18:20 finfin kernel: [36293.226488] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (125)
Feb 25 15:18:45 finfin kernel: [36317.996660] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40247537) - no data block
Feb 25 15:18:45 finfin kernel: [36317.996671] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (24906)
Feb 25 15:19:38 finfin kernel: [36371.180386] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40239122) - no `.' or `..'
Feb 25 15:19:38 finfin kernel: [36371.232941] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (3)
Feb 25 15:19:38 finfin kernel: [36371.343549] EXT4-fs warning (device
sdb1): ext4_rmdir: empty directory has too many links (5)
Feb 25 15:19:38 finfin kernel: [36371.397308] EXT4-fs warning (device
sdb1): empty_dir: bad directory (dir #40239106) - no `.' or `..'
Feb 25 18:13:27 finfin kernel: [46799.800244] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 25 21:51:47 finfin kernel: [59900.021575] EXT4-fs (sdb1): mounted
filesystem with ordered data mode. Opts: (null)
Feb 26 02:19:27 finfin kernel: [75959.826438] sd 6:0:0:0: [sdb]
Unhandled error code
Feb 26 02:19:27 finfin kernel: [75959.826444] sd 6:0:0:0: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 26 02:19:27 finfin kernel: [75959.826450] sd 6:0:0:0: [sdb] CDB:
Read(10): 28 00 ac 44 00 3f 00 00 08 00


* Re: ext4 corruption
  2011-02-26 11:13   ` Bill Huey (hui)
@ 2011-02-26 11:16     ` Bill Huey (hui)
  2011-02-28  4:43       ` Ted Ts'o
  2011-02-28 15:01     ` Eric Sandeen
  1 sibling, 1 reply; 13+ messages in thread
From: Bill Huey (hui) @ 2011-02-26 11:16 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4

Just to clarify, it did clear the problems, but the continued use of
the file system resulted in more corruption. It's no longer usable. I
have to remake the file system and copy the data again.

bill


* Re: ext4 corruption
  2011-02-26 11:16     ` Bill Huey (hui)
@ 2011-02-28  4:43       ` Ted Ts'o
  2011-02-28 20:18         ` Bill Huey (hui)
  0 siblings, 1 reply; 13+ messages in thread
From: Ted Ts'o @ 2011-02-28  4:43 UTC (permalink / raw)
  To: Bill Huey (hui); +Cc: linux-ext4

On Sat, Feb 26, 2011 at 03:16:38AM -0800, Bill Huey (hui) wrote:
> Just to clarify, it did clear the problems, but the continued use of
> the file system resulted in more corruption. It's no longer usable. I
> have to remake the file system and copy the data again.

You didn't say what version of the kernel you were using, but there
haven't been any ext4 bugs that have caused this much corruption this
quickly; it sure smells like a hardware problem....

							- Ted


* Re: ext4 corruption
  2011-02-26 11:13   ` Bill Huey (hui)
  2011-02-26 11:16     ` Bill Huey (hui)
@ 2011-02-28 15:01     ` Eric Sandeen
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Sandeen @ 2011-02-28 15:01 UTC (permalink / raw)
  To: Bill Huey (hui); +Cc: Theodore Tso, linux-ext4

On 2/26/11 5:13 AM, Bill Huey (hui) wrote:
> Theodore,
> 
> I did run fsck.ext4 on the file system. It cleared a bunch of errors,
> but it's still showing various problems.
> 
> Another log of the file system. USB->SATA problems?
> ---------------------------------
> 
> Feb 22 19:23:31 finfin kernel: [    2.402463] sd 6:0:0:0: [sdb]
> 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
> Feb 22 19:23:31 finfin kernel: [    2.403585] sd 6:0:0:0: [sdb] Write
> Protect is off
> Feb 22 19:23:31 finfin kernel: [    2.405920]  sdb:
> Feb 22 19:23:31 finfin kernel: [    2.819633]  sdb1
> Feb 22 19:23:31 finfin kernel: [    2.821592] sd 6:0:0:0: [sdb]
> Attached SCSI disk
> Feb 22 19:27:40 finfin kernel: [  263.857108]  sdb: sdb1
> Feb 22 20:03:18 finfin kernel: [ 2402.182269] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)
> Feb 25 03:49:40 finfin kernel: [203184.800029] EXT4-fs (sdb1):
> warning: mounting fs with errors, running e2fsck is recommended

this must not have been post-fsck, if it still is mounting with errors...?

...

> Feb 25 21:51:47 finfin kernel: [59900.021575] EXT4-fs (sdb1): mounted
> filesystem with ordered data mode. Opts: (null)
> Feb 26 02:19:27 finfin kernel: [75959.826438] sd 6:0:0:0: [sdb]
> Unhandled error code
> Feb 26 02:19:27 finfin kernel: [75959.826444] sd 6:0:0:0: [sdb]
> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Feb 26 02:19:27 finfin kernel: [75959.826450] sd 6:0:0:0: [sdb] CDB:
> Read(10): 28 00 ac 44 00 3f 00 00 08 00

And this does indeed look like storage problems; the ext4 problems are
probably secondary.

-Eric
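
The CDB bytes in the last quoted log line can be decoded by hand. As a
rough Python sketch (an illustration, not part of the original
exchange), assuming the standard 10-byte READ(10) layout: opcode 0x28,
a 4-byte big-endian logical block address in bytes 2-5, and a 2-byte
big-endian transfer length in bytes 7-8. The helper name decode_read10
is made up for this example.

```python
import struct


def decode_read10(cdb_hex):
    """Return (lba, length_in_blocks) for a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Big-endian: opcode, flags, LBA, group number, transfer length, control.
    _opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return lba, length


lba, length = decode_read10("28 00 ac 44 00 3f 00 00 08 00")
print(lba, length)
```

For the line above this gives LBA 2890137663 and a length of 8 logical
blocks, i.e. a single 4 KiB read given the 512-byte logical blocks
reported for sdb, well inside the 3907029168 blocks the disk advertises.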


* Re: ext4 corruption
  2011-02-28  4:43       ` Ted Ts'o
@ 2011-02-28 20:18         ` Bill Huey (hui)
  2011-02-28 20:30           ` Bill Huey (hui)
  2011-02-28 22:55           ` Ted Ts'o
  0 siblings, 2 replies; 13+ messages in thread
From: Bill Huey (hui) @ 2011-02-28 20:18 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4

Whatever is in the latest Ubuntu maverick, 2.6.35 for the kernel.

The only thing that makes it not look like a storage problem
is this line that comes before the SATA error:

-----
Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty
metadata buffer (dev = sdb1, blocknr = 0). There's a risk of
filesystem corruption in case of system crash.
-----

bill


* Re: ext4 corruption
  2011-02-28 20:18         ` Bill Huey (hui)
@ 2011-02-28 20:30           ` Bill Huey (hui)
  2011-02-28 22:55           ` Ted Ts'o
  1 sibling, 0 replies; 13+ messages in thread
From: Bill Huey (hui) @ 2011-02-28 20:30 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4

...unless those two lines are linked somehow via a timeout or something...

On Mon, Feb 28, 2011 at 12:18 PM, Bill Huey (hui) <bill.huey@gmail.com> wrote:
> Whatever is in the latest Ubuntu maverick, 2.6.35 for the kernel.
>
> The only thing that makes it not look like a storage problem
> is this line that comes before the SATA error:
>
> -----
> Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty
> metadata buffer (dev = sdb1, blocknr = 0). There's a risk of
> filesystem corruption in case of system crash.
> -----


* Re: ext4 corruption
  2011-02-28 20:18         ` Bill Huey (hui)
  2011-02-28 20:30           ` Bill Huey (hui)
@ 2011-02-28 22:55           ` Ted Ts'o
  2011-02-28 23:45             ` Bill Huey (hui)
  1 sibling, 1 reply; 13+ messages in thread
From: Ted Ts'o @ 2011-02-28 22:55 UTC (permalink / raw)
  To: Bill Huey (hui); +Cc: linux-ext4

On Mon, Feb 28, 2011 at 12:18:16PM -0800, Bill Huey (hui) wrote:
> Whatever is in the latest Ubuntu maverick, 2.6.35 for the kernel.
> 
> The only thing that makes it not look like a storage problem
> is this line that comes before the SATA error:
> 
> -----
> Feb 25 04:41:26 finfin kernel: [206290.181230] JBD: Spotted dirty
> metadata buffer (dev = sdb1, blocknr = 0). There's a risk of
> filesystem corruption in case of system crash.
> -----

There are a few places where we update the superblock bypassing the
journal layer.  (For example, when we set the RO_COMPAT_LARGE_FILE
feature flag if it wasn't previously set).  Those should be cleaned
up, but it's not related to the rest of the scary-looking corruption
which you saw.  The worst that might happen is that a specific superblock
update might get lost (e.g., the RO_COMPAT_LARGE_FILE feature flag) on
a crash before we commit some other superblock change to the journal.

  	       	  	      	    	       - Ted
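
To check whether the flag mentioned above is set, the feature word can
be read straight from the superblock. A hedged Python sketch, assuming
the usual ext2/3/4 on-disk layout: primary superblock 1024 bytes into
the device, s_magic 0xEF53 at offset 0x38, s_feature_ro_compat at
offset 0x64, and RO_COMPAT_LARGE_FILE = 0x0002. The function name and
the synthetic buffer are illustrative only; on a real system the 1024
bytes would be read from the block device instead.

```python
import struct

SUPERBLOCK_OFFSET = 1024       # primary superblock starts 1024 bytes into the device
EXT_MAGIC = 0xEF53             # s_magic, little-endian u16 at offset 0x38
RO_COMPAT_LARGE_FILE = 0x0002  # bit in s_feature_ro_compat (u32 at offset 0x64)


def has_large_file(superblock):
    """Report whether RO_COMPAT_LARGE_FILE is set in a raw superblock."""
    (magic,) = struct.unpack_from("<H", superblock, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (ro_compat,) = struct.unpack_from("<I", superblock, 0x64)
    return bool(ro_compat & RO_COMPAT_LARGE_FILE)


# Synthetic superblock for demonstration; a real one would come from
# reading 1024 bytes at SUPERBLOCK_OFFSET of the device.
buf = bytearray(1024)
struct.pack_into("<H", buf, 0x38, EXT_MAGIC)
struct.pack_into("<I", buf, 0x64, RO_COMPAT_LARGE_FILE)
print(has_large_file(bytes(buf)))
```

Reading the raw device needs root and is best done on an unmounted or
read-only filesystem, since the in-memory superblock may be newer than
the copy on disk.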


* Re: ext4 corruption
  2011-02-28 22:55           ` Ted Ts'o
@ 2011-02-28 23:45             ` Bill Huey (hui)
  0 siblings, 0 replies; 13+ messages in thread
From: Bill Huey (hui) @ 2011-02-28 23:45 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4

Yeah, it was just a standard mkfs so I doubt that option was
specified. The important thing for me here was to at least let you
folks know about it so that you can determine if this is significant
or not.


Thanks

bill

On Mon, Feb 28, 2011 at 2:55 PM, Ted Ts'o <tytso@mit.edu> wrote:
> There are a few places where we update the superblock bypassing the
> journal layer.  (For example, when we set the RO_COMPAT_LARGE_FILE
> feature flag if it wasn't previously set).  Those should be cleaned
> up, but it's not related to the rest of the scary-looking corruption
> which you saw.  The worst that might happen is that a specific superblock
> update might get lost (e.g., the RO_COMPAT_LARGE_FILE feature flag) on
> a crash before we commit some other superblock change to the journal.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* ext4 corruption
@ 2011-06-06  3:59 Micah Anderson
  2011-06-06  4:19 ` Ted Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Micah Anderson @ 2011-06-06  3:59 UTC (permalink / raw)
  To: linux-ext4

I previously wrote about a recent conversion from ext3 to ext4 (on
Debian Squeeze), which went well. However, I seem to be having problems
with the ext4 filesystem.

Yesterday, there was a file in /var/spool/postfix/defer that was giving
an i/o error:

Jun  3 15:00:14 willet postfix/qmgr[29108]: fatal: qmgr_message_alloc:
677AE298316F: remove defer 677AE298316F: Input/output error

If I tried to stat it, it would give the same error. I noticed on the
console, I was getting a lot of these:

[6060479.296658] EXT4-fs error (device dm-4): ext4_lookup: deleted inode referenced: 169640807
[6060482.776087] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

The system was clearly acting strange, so I decided it was best to touch
/forcefsck and restart to clean up the filesystem.

I got a couple of "Multiply-claimed block(s)" messages ("There are 10
inodes containing multiply-claimed blocks."), and then I was required to
run fsck again, which I did, and it seemed to be fine after the second
run (these fscks took hours).

After things seemed clean, I started the system back up and it began to
operate fine. I then began to see the following on the console:

[ 3201.702997] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429952(bit 3456 in group 1722)
[ 3201.714348] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429953(bit 3457 in group 1722)
[ 3201.725665] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429954(bit 3458 in group 1722)
[ 3201.737028] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429955(bit 3459 in group 1722)
[ 3201.748721] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429956(bit 3460 in group 1722)
[ 3201.760021] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429957(bit 3461 in group 1722)
[ 3201.771489] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429958(bit 3462 in group 1722)
[ 3201.782908] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429959(bit 3463 in group 1722)
[ 3201.794281] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429960(bit 3464 in group 1722)
[ 3201.805664] EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 0's block 56429961(bit 3465 in group 1722)
[ 3201.818936] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 3202.289345] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 3202.328925] JBD: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
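
The block numbers and bitmap positions in the mb_free_blocks messages
above can be cross-checked with simple arithmetic. A small Python
sketch, assuming 4 KiB blocks (so 32768 blocks per group) and a first
data block of 0, neither of which is stated in this report:

```python
BLOCKS_PER_GROUP = 32768  # assumed: 8 bits per byte * 4096-byte bitmap block
FIRST_DATA_BLOCK = 0      # assumed: typical for 4 KiB block filesystems


def locate(block):
    """Map an absolute block number to (group, bit within the group bitmap)."""
    return divmod(block - FIRST_DATA_BLOCK, BLOCKS_PER_GROUP)


# The ten blocks reported as double-freed in the log above.
for block in range(56429952, 56429962):
    group, bit = locate(block)
    print(block, group, bit)
```

Under those assumptions block 56429952 maps to group 1722, bit 3456,
matching the log, so the ten messages cover one contiguous run of blocks
in a single group.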

I'm concerned that this happened so quickly after a fsck resolved
issues.

The filesystem is on top of a software raid mirror, so I failed one set
and ran S.M.A.R.T. short/long tests on the device, re-added it to the
array, waited the 8 hours for the resync, and then did the same thing
with the other element of the array. All SMART tests completed without
error.

I took the machine down to add another disk to the system so I could
have more flexibility to run badblocks tests, and when the
system came back up a fsck of the partition was required. It's been
running for 3 hours now, and so far it has only said "Duplicate or bad
block in use!" so I presume it is scanning the entire device for
duplicate blocks. This is what it did during the previous fsck.

Last time it took 8 hours to complete the first pass, and then it had to
do another pass after a reboot, which took 1.5-4 hrs (I was sleeping when
it finished). So we've been out for a number of hours now, which is quite
bad.

It's certainly possible that this is not a filesystem issue but a
hardware one; the badblocks tests should give us more conclusive
information. I would love any additional suggestions for what we can do
to conclusively identify what the issue is.

thanks for reading, and any thoughts you might have!

micah


* Re: ext4 corruption
  2011-06-06  3:59 Micah Anderson
@ 2011-06-06  4:19 ` Ted Ts'o
  2011-06-06 17:11   ` micah anderson
  0 siblings, 1 reply; 13+ messages in thread
From: Ted Ts'o @ 2011-06-06  4:19 UTC (permalink / raw)
  To: Micah Anderson; +Cc: linux-ext4

On Sun, Jun 05, 2011 at 11:59:34PM -0400, Micah Anderson wrote:
> 
> I previously wrote about a recent conversion from ext3 to ext4 (on
> Debian Squeeze), which went well. However, I seem to be having problems
> with the ext4 filesystem.

Are you using the 2.6.32 kernel (the Debian squeeze default)?  Try
updating to 2.6.39.1, and see if that stabilizes things.  There have
been a huge number of bug fixes since 2.6.32, and no one has been
really backporting patches to such an ancient kernel.  This is one of
the ways in which Debian Obsolete^H^H^H^H^H^H^H^H Stable can be
somewhat of a disadvantage.  Unlike the RHEL kernels, no one is
backporting ext4 bugfixes to older Debian stable kernels, and ext4 was
still getting a lot of bug fixes in the 2.6.32 days.

That being said, you're seeing some pretty severe inode *and* block
allocation bitmap problems, and that doesn't sound like anything I
remember even back in the 2.6.32 days.  It does make me wonder about
the stability of the hardware and of the software raid code...

   	      	     	      	     	 	       - Ted


* Re: ext4 corruption
  2011-06-06  4:19 ` Ted Ts'o
@ 2011-06-06 17:11   ` micah anderson
  0 siblings, 0 replies; 13+ messages in thread
From: micah anderson @ 2011-06-06 17:11 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4

On Mon, 6 Jun 2011 00:19:50 -0400, Ted Ts'o <tytso@mit.edu> wrote:
> On Sun, Jun 05, 2011 at 11:59:34PM -0400, Micah Anderson wrote:
> > 
> > I previously wrote about a recent conversion from ext3 to ext4 (on
> > Debian Squeeze), which went well. However, I seem to be having problems
> > with the ext4 filesystem.
> 
> Are you using the 2.6.32 kernel (the Debian squeeze default)?  Try

Yes, we are using 2.6.32-34squeeze1.

> updating to 2.6.39.1, and see if that stabilizes things.  There have
> been a huge number of bug fixes since 2.6.32, and no one has been
> really backporting patches to such an ancient kernel.  This is one of
> the ways in which Debian Obsolete^H^H^H^H^H^H^H^H Stable can be
> somewhat of a disadvantage.  Unlike the RHEL kernels, no one is
> backporting ext4 bugfixes to older Debian stable kernels, and ext4 was
> still getting a lot of bug fixes in the 2.6.32 days.

Well, it does seem like 2.6.32.y contains quite a number of ext4 fixes:

$ git rev-list v2.6.32..v2.6.32.41 fs/ext4 | wc -l
92

Not an insignificant amount, although it seems the latest was in
2.6.32.23 which was some time ago.

I'm sure that the debian-kernel team would welcome some help with this,
even if it is just some help determining the most important issues to
resolve. 

I would guess that RHEL is in a better position to integrate more
invasive fixes.

> That being said, you're seeing some pretty severe inode *and* block
> allocation bitmap problems, and that doesn't sound like anything I
> remember even back in the 2.6.32 days.  It does make me wonder about
> the stability of the hardware and of the software raid code...

Yeah, so once we've done some destructive badblocks tests on the drives,
we should be able to rule out drive issues at least.

micah
