Re: EXT4-fs error w/ external USB drive

From: Theodore Ts'o <tytso@mit.edu>
To: "Toralf Förster" <toralf.foerster@gmx.de>
Cc: linux-ext4@vger.kernel.org,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Eric Sandeen <sandeen@redhat.com>
Subject: Re: EXT4-fs error w/ external USB drive
Date: Thu, 25 Oct 2012 14:20:04 -0400
Message-ID: <20121025182004.GA16257@thunk.org>
In-Reply-To: <50896B42.90606@gmx.de>

On Thu, Oct 25, 2012 at 06:39:30PM +0200, Toralf Förster wrote:
> After a lot of file operations (Gentoo emerging, kernel build, git
> pulls, ...) I s2disk the system (that with the external USB drive)
> yesterday, wake it up today, rebooted it -
> and had to manually repair the file system, because the automatic fsck
> gave up.

OK, I'm going to send another patch series, which I hope you can
test to see if it reduces the rate at which this happens.

> Nevertheless there's another Linux system I have (64bit RH EL,internal
> drive), where with kernel 3.5.4-1.el6.elrepo.x86_64 EXT4 errors occurred.
> I attached the whole appropriate section of /var/log/message.

I don't have easy access to the RHEL kernel sources, and so I don't
know which patches were applied.  Specifically, I'd really like to
know if the commit represented by 14b4ed22a6 is in RHEL
3.5.4-1.el6.elrepo.  Also, I'd like to know which line number was
reflected here, which was the first EXT4-fs error:

> Sep 26 09:26:54 x kernel: EXT4-fs error (device dm-1) in ext4_new_inode:938: IO failure

This was from fs/ext4/ialloc.c line 938, and there are two
ext4_std_error() calls that this could represent, which is why having
the exact kernel sources from this RHEL kernel would be useful.  (I'd also
suggest opening a RHEL support ticket if you have a support contract,
since that way Red Hat can track this issue, and that way Eric can
count the work he's been doing on this fire drill as supporting a
customer.  :-)

What's a bit unfortunate is that there were no other error messages
before this line, so we can't know for sure what caused or returned
the -EIO error code.  I *suspect* it was this, which would indicate a
corrupted inode bitmap:

	if (insert_inode_locked(inode) < 0) {
		/*
		 * Likely a bitmap corruption causing inode to be allocated
		 * twice.
		 */
		err = -EIO;
		goto fail;
	}
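
To make that suspicion concrete, here is a toy user-space model of the
failure mode, not kernel code.  Every name in it (toy_fs, toy_new_inode,
TOY_NINODES) is made up for illustration.  It assumes the bitmap is the
only thing the allocator consults when picking an inode number, and that
a separate in-memory table plays the role of the inode cache that
insert_inode_locked() checks:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NINODES 8	/* hypothetical size, for illustration only */

/* Toy stand-ins; these are not kernel structures. */
struct toy_fs {
	bool bitmap_in_use[TOY_NINODES];	/* "on-disk" inode bitmap, may be corrupt */
	bool cached[TOY_NINODES];		/* inodes currently live in memory */
};

/*
 * Pick the first inode number the bitmap claims is free.  If that
 * number is already live in memory, the bitmap lied, so report -EIO,
 * mirroring the check quoted above.  Returns the inode number on
 * success, or -ENOSPC if the bitmap has no free entries.
 */
int toy_new_inode(struct toy_fs *fs)
{
	assert(fs != NULL);

	for (int ino = 0; ino < TOY_NINODES; ino++) {
		if (fs->bitmap_in_use[ino])
			continue;
		fs->bitmap_in_use[ino] = true;
		if (fs->cached[ino])
			return -EIO;	/* inode would be allocated twice */
		fs->cached[ino] = true;
		return ino;
	}
	return -ENOSPC;
}
```

In this model, allocating twice hands out inodes 0 and 1.  If the
bitmap entry for inode 1 is then cleared behind the allocator's back (a
stand-in for on-disk corruption), the next allocation picks 1 again and
fails with -EIO, with no earlier message to explain why.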

Do you know if this external disk could have suffered from a cable
pull, or a flaky cable, or some kind of unclean shutdown/power failure
before it rebooted?  That would be an interesting data point.  

For the future, we need to add some better error reporting for
failures such as this.  In addition, I have a recent change we made at
work that I should get upstream: it avoids allocating from a block
group once we notice a corruption.  Currently this covers only block
allocations, but I think we should do the same for inode allocations.
The goal is to minimize the chances of lost data once we notice that
the block/inode allocation bitmap can't be trusted, which matters most
when users are running with the default errors=continue instead of
errors=panic or errors=remount-ro.
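
The idea of steering allocations away from a suspect block group can be
sketched in user space as follows.  This is only an illustration under
stated assumptions, not the change mentioned above: the names
(toy_group, toy_mark_corrupt, toy_pick_group) and the simple
round-robin search from a goal group are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NGROUPS 4	/* hypothetical group count, for illustration only */

/* Toy per-group state; not the kernel's group descriptor. */
struct toy_group {
	unsigned free_blocks;	/* free count as claimed by the bitmap */
	bool bitmap_corrupt;	/* set once a corruption has been noticed */
};

/* Flag a group so that later allocations stop trusting its bitmap. */
void toy_mark_corrupt(struct toy_group *groups, int g)
{
	assert(g >= 0 && g < TOY_NGROUPS);
	groups[g].bitmap_corrupt = true;
}

/*
 * Starting at the goal group, take one block from the first group that
 * is both trusted and non-empty.  Returns the group index, or -1 if no
 * trusted group has a free block.
 */
int toy_pick_group(struct toy_group *groups, int goal)
{
	assert(goal >= 0 && goal < TOY_NGROUPS);

	for (int i = 0; i < TOY_NGROUPS; i++) {
		int g = (goal + i) % TOY_NGROUPS;

		if (groups[g].bitmap_corrupt || groups[g].free_blocks == 0)
			continue;
		groups[g].free_blocks--;
		return g;
	}
	return -1;
}
```

With errors=continue the file system keeps running after the first
error, so in this model the flag is what keeps later allocations from
drawing on a bitmap already known to be wrong; the free count of the
flagged group is simply left alone until it has been repaired.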

Speaking of which, for your production RHEL server, you might want to
seriously consider errors=panic for any critical file system volume.
This allows the file system to get corrected via e2fsck, and prevents
the server from stumbling along, possibly causing more data loss due
to file system corruption.

Regards,

						- Ted