linux-ext4.vger.kernel.org archive mirror
From: "George Spelvin" <linux@horizon.com>
To: linux-ext4@vger.kernel.org, tytso@mit.edu
Cc: linux@horizon.com
Subject: Re: Exciting :-( adventures in metadata checksumming
Date: 8 Aug 2012 19:42:39 -0400
Message-ID: <20120808234239.4443.qmail@science.horizon.com>
In-Reply-To: <20120808223427.26158.qmail@science.horizon.com>

> Can someone find a workaround QUICKLY?  I can't keep this FS read-only
> for long.

I thought I had figured out a great workaround: Use 1.42.4, which doesn't
know how to check checksums.

But then I discovered that it aborts and delivers a zero-length file
if there are filesystem inconsistencies, too!  So I get

e2image 1.42.4 (12-Jun-2012)
Illegal block number passed to ext2fs_mark_block_bitmap #3571066296 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #2895243190 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #3276895043 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #2488200263 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #2556839855 for in-use block map
... snip... (2671 total "Illegal block number passed" messages)
Illegal block number passed to ext2fs_mark_block_bitmap #3421917394 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #3469830505 for in-use block map
e2image: Illegal indirect block found while iterating over inode 85800474

I'm not sure this is The Right Thing To Do for a debugging tool.


The file system is a RAID-6, and repeated verifications have failed to find
RAID mismatches.

I am starting to suspect motherboard/RAM on this machine.  Already the bad
magic number error patterns looked odd to me, and I was just reminded that
we had to swap the RAM when it was first built so memtest86 would pass.
We ran it for many hours, but it *is* a consumer Intel box with no ECC.

It also has 8 GiB of RAM and acts primarily as a file server, so FS metadata
can sit and bit-rot in RAM for a very long time.

I'm going to play with "hdparm -f" and drop_caches to see if I can make
the file system problems go away with no repair other than re-reading
from disk.
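
For reference, a minimal sketch of the drop_caches half of that experiment.
It assumes Linux and root privileges; the function name and the `path`
parameter are hypothetical and exist only so the sketch can be pointed at
an ordinary file when trying it out. Writing 3 asks the kernel to drop both
the page cache and the dentry/inode caches, and the sync beforehand lets
dirty pages be written out so that more of the cache is clean and droppable.

```python
import os

def drop_caches(mode=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty data, then ask the kernel to drop clean cached pages.

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    `path` is a parameter only for illustration and testing (assumption).
    """
    os.sync()                      # write dirty pages out first
    with open(path, "w") as f:     # needs root for the real /proc file
        f.write(f"{mode}\n")
```

After this, the next metadata access has to come from disk rather than
from a possibly bit-rotted copy in RAM.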

If so, that would confirm it as not ext4's problem.  Although it *would* be
a very cool debugging feature to re-check the checksum whenever a metadata
page is discarded from the buffer cache.
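
A toy user-space sketch of that idea, not kernel code: an LRU cache of
clean blocks that remembers each block's CRC from when it was read and
re-checks it at eviction time. Everything here is hypothetical (the class
name, the `read_block` callable, the `corrupted` list), and it uses
zlib's CRC-32 for convenience rather than the crc32c that metadata_csum
uses.

```python
import zlib
from collections import OrderedDict

class CheckedCache:
    """Toy LRU cache of clean blocks; re-verifies each CRC on eviction."""

    def __init__(self, capacity, read_block):
        assert capacity >= 1
        self.capacity = capacity
        self.read_block = read_block   # hypothetical: block number -> bytes
        self.blocks = OrderedDict()    # blockno -> (bytearray, crc at read time)
        self.corrupted = []            # blocks that failed the eviction check

    def get(self, blockno):
        if blockno in self.blocks:
            self.blocks.move_to_end(blockno)       # mark as recently used
        else:
            data = bytearray(self.read_block(blockno))
            self.blocks[blockno] = (data, zlib.crc32(data))
            if len(self.blocks) > self.capacity:
                self._evict()
        return self.blocks[blockno][0]

    def _evict(self):
        # Discard the least recently used block, checking it on the way out.
        blockno, (data, crc) = self.blocks.popitem(last=False)
        if zlib.crc32(data) != crc:
            # Nobody wrote to this block through the cache, so a mismatch
            # means the in-memory copy changed behind our back.
            self.corrupted.append(blockno)
```

The check costs one extra CRC pass per discarded block, which seems
tolerable for a debugging option.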

If the checksum matched when first read in, and doesn't when a supposedly
clean page is discarded, *something* is corrupting RAM.  (If you
assume that it's a single bit flip, then you can deduce the location
from the error syndrome.)
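
To make the parenthetical concrete, here is a small sketch of locating a
single flipped bit from a CRC syndrome. It relies on the CRC being affine
over GF(2): for equal-length buffers, crc(good) XOR crc(bad) depends only
on the error pattern good XOR bad. The function names are made up for
illustration, and zlib's CRC-32 stands in for crc32c; the brute-force
table is just the simplest way to invert the syndrome, not the fastest.

```python
import zlib

def single_bit_syndromes(nbytes):
    """Map syndrome -> bit index for every single-bit error in nbytes."""
    zero_crc = zlib.crc32(bytes(nbytes))
    table = {}
    buf = bytearray(nbytes)
    for bit in range(nbytes * 8):
        byte, shift = divmod(bit, 8)       # bit 0 is the LSB of byte 0
        buf[byte] = 1 << shift
        # crc(e) ^ crc(0) cancels the init/final-XOR contribution.
        table[zlib.crc32(buf) ^ zero_crc] = bit
        buf[byte] = 0
    return table

def locate_flip(data, good_crc, table):
    """Return the flipped bit index, or None if there is no mismatch or
    the syndrome does not correspond to a single-bit error."""
    syndrome = zlib.crc32(data) ^ good_crc
    if syndrome == 0:
        return None
    return table.get(syndrome)

# Usage: flip one bit in a 512-byte buffer and find it again.
block = bytearray(bytes(range(256)) * 2)
good = zlib.crc32(block)
table = single_bit_syndromes(len(block))
block[154] ^= 1 << 2                       # bit index 154*8 + 2 = 1234
flipped = locate_flip(block, good, table)  # index of the damaged bit
```

Knowing the byte offset and bit position within the page would then hint
at whether the damage lines up with a particular DIMM or data line.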


Anyway, thanks for the help!
