linux-ext4.vger.kernel.org archive mirror
From: Ted Ts'o <tytso@mit.edu>
To: Wang Shaoyan <stufever@gmail.com>
Cc: Lukas Czerner <lczerner@redhat.com>,
	linux-ext4@vger.kernel.org,
	Wang Shaoyan <wangshaoyan.pt@taobao.com>, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] ext4: Set file system to read-only by I/O error threshold
Date: Tue, 21 Jun 2011 10:48:56 -0400
Message-ID: <20110621144856.GH32133@thunk.org>
In-Reply-To: <BANLkTim-1FfCZAm449zp5PpApbq027HGwYhoZBF0soZpJon9tQ@mail.gmail.com>

Ugh.  This is really, really, *really* ugly.  If you really want to
have Hadoop shut down when there are too many errors, it's much better
to expose the number of EIO errors via sysfs, and then have some kind
of system daemon do the right thing.
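
A minimal sketch of the "counter in sysfs plus a daemon" idea above.  The
sysfs path, the threshold, and the function names are all assumptions
made for illustration; nothing in this thread defines such a file.  The
point is only that the policy (how many errors are too many, and what
to do about it) can live in userspace rather than in ext4:

```python
import time
from pathlib import Path

# Hypothetical sysfs attribute exposing an EIO counter; the name is an
# assumption, not an interface that this thread confirms exists.
EIO_COUNT_PATH = Path("/sys/fs/ext4/sdb1/eio_count")
THRESHOLD = 100  # illustrative value only


def read_error_count(path):
    """Return the integer counter stored in a sysfs-style text file."""
    return int(Path(path).read_text().strip())


def over_threshold(count, threshold=THRESHOLD):
    """Userspace policy: has this disk seen too many I/O errors?"""
    return count >= threshold


def monitor(path=EIO_COUNT_PATH, interval=60, on_unhealthy=None):
    """Poll the counter; call on_unhealthy once the threshold is crossed."""
    while True:
        if over_threshold(read_error_count(path)):
            if on_unhealthy is not None:
                on_unhealthy()
            return
        time.sleep(interval)
```

The callback could, for example, tell the cluster manager to stop
scheduling work on the node, which leaves the file system itself
mounted and readable.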

But actually converting EIO's to file system errors isn't really a
good idea.  Consider that most of the time, when you get a read error
from the disk, if you rewrite that block, all will be well.  So taking
the entire disk off-line, and setting the errors fs bit won't really
help.  (a) Until the block is rewritten, the next time you try to read
it, you'll get an error, and (b) running fsck will be a waste of time,
since it will only scan the metadata blocks, and so the data block
will still have an error.

I assume you're using hadoopfs as your cluster file system, which has
redundancy at the file system level, right?  So getting an EIO won't
be the end of the world, since you can always read the data chunk from
a redundant copy, or perform a Reed-Solomon reconstruction.  In fact,
disabling the entire file system is the worst thing you can do, since
you lose access to the rest of the files, which increases network
traffic on your cluster interconnect, especially if you have to do an
R-S reconstruction.

(In fact I've recently written up a plan to turn metadata errors into
EIO's, without marking the entire file system as containing errors,
to make the file system more resilient to I/O errors --- the exact
reverse of what you're trying to do.)

For data I/O errors, what you in fact want to do is to handle them in
userspace, and just have HDFS delete the local copy of the file.  The
next time you allocate the space and rewrite the block, the disk will
do a bad block remap, and you'll be OK.
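
The userspace handling described above can be sketched roughly as
follows.  This is an illustration only, not how HDFS is implemented:
the helper names are hypothetical, and the caller is assumed to fetch
the data from another replica when None comes back.

```python
import errno
import os


def read_block_file(path):
    """Read a locally stored block file in full."""
    with open(path, "rb") as f:
        return f.read()


def read_or_discard(path, reader=read_block_file):
    """Return the file's bytes, or None after deleting it on EIO.

    Only EIO is treated as "bad local copy"; any other error is
    re-raised so the caller can decide what it means.
    """
    try:
        return reader(path)
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise
        # Drop the unreadable local copy.  Per the reasoning above, a
        # later rewrite of that space lets the disk remap the bad block.
        os.unlink(path)
        return None
```

The reader argument is there only so the EIO path can be exercised
without a failing disk.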

Now, you may want to do different things if the disk has completely
disappeared, or has completely died, so this is a case where it would
be desirable to get finer grained error reporting from the block I/O
layer --- there's a big difference between what you do for an error
caused by an unreadable block, and one caused by a disk controller
bursting into flame.  But in general, remounting the file system
read-only should be a last-resort thing, and not the first thing you
should try doing.

Regards,

						- Ted
