From: Theodore Ts'o <tytso@mit.edu>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 3/5] ext4: Mark block group as corrupt on block bitmap error
Date: Wed, 28 Aug 2013 18:26:59 -0400
Message-ID: <20130828222659.GI27079@thunk.org>
In-Reply-To: <20130723033833.GF5785@blackbox.djwong.org>

On Mon, Jul 22, 2013 at 08:38:33PM -0700, Darrick J. Wong wrote:
> On Fri, Jul 19, 2013 at 04:55:52PM -0700, Darrick J. Wong wrote:
> > When we notice a block-bitmap corruption (because of device failure or
> > something else), we should mark this group as corrupt and prevent further block
> > allocations/deallocations from it. Currently, we end up generating one error
> > message for every block in the bitmap. This potentially could make the system
> > unstable as noticed in some bugs. With this patch, the error will be printed
> > only the first time and mark the entire block group as corrupted. This prevents
> > future access allocations/deallocations from it.

Thanks, applied....

> Hmm.  I think we need to have ext4_count_free_clusters() act as though corrupt
> block groups have "zero" free blocks so that mballoc will pass the -ENOSPC
> errors back to the upper layers.  Afaict, if one doesn't do this, ext4
> encounters the situation where marking the blocks in use fails, yet the fs
> thinks there are free blocks still and ... leaves the pages dirty forever,
> instead of simply failing.

Yes, that's something we should probably add to make things more
robust in the case where we have huge numbers of corruptions (or
hardware failures) in the block bitmaps.
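
As a rough illustration of the idea (a userspace sketch, not the
kernel code): the struct and function names below are hypothetical
stand-ins, and the real ext4_count_free_clusters() works on on-disk
group descriptors rather than a plain array.  The only point is that
groups flagged as corrupt contribute zero to the free count, so the
allocator sees -ENOSPC-like conditions instead of phantom free space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group state; not the kernel's ext4_group_info. */
struct group_info {
	uint32_t free_clusters;  /* free count recorded for the group */
	bool bitmap_corrupt;     /* set once the block bitmap fails validation */
};

/* Sum the free clusters, treating corrupt groups as having none. */
static uint64_t count_usable_free_clusters(const struct group_info *groups,
					   size_t ngroups)
{
	uint64_t total = 0;

	assert(groups != NULL || ngroups == 0);
	for (size_t i = 0; i < ngroups; i++) {
		if (groups[i].bitmap_corrupt)
			continue;  /* a corrupt group counts as zero free */
		total += groups[i].free_clusters;
	}
	return total;
}
```

If every group ends up flagged, the total drops to zero and callers
that check free space before allocating would fail early rather than
leaving pages dirty.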

> Just trying this really quickly, if I blast /all/ the block groups, I see
> unstoppable errors in dmesg.

What sort of errors did you end up seeing?

> The other thing I noticed is that if one turns delalloc mode on, performs a
> live corruption of the bg descriptors, and then dd's a big file to the fs,
> there's no error reported back to userspace either in write(), sync(), or even
> umount().  Meanwhile, dmesg is getting hit with tons of corrupted-bitmap
> errors.

I'm not sure there's much we can do about this.  On the other hand,
how realistic a threat is this?  If it's happening randomly, how
likely is this to happen?  And if it's a deliberate corruption, the
attacker can probably do a lot worse.

In practice, these weren't things we were really worried about when we
were primarily worried about hardware failures, since hardware failures
are random, and so if the errors are affecting a large number of block
bitmaps, the storage device is probably completely toasted and there's
nothing we can do about it anyway.

When metadata checksums are enabled, this gets trickier, since it's
possible for a large number of metadata checksums to be corrupted in
the bg descriptors, especially if the bg descriptors get written to
while the file system is mounted.  This will smash a huge number of
checksums, and then badness will happen.  But realistically, bad
things would happen if that happened while the file system is mounted
even without checksums being enabled.  It may be that the best thing we
can do is to do some kind of rate limiting with log messages, or some
kind of heuristic where if a sufficient number of different checksums
are found to be broken, we take much more drastic action, such as
unconditionally shutting down the file system.
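
To make that heuristic concrete, here is a minimal userspace sketch.
Everything in it is an assumption for illustration: the names, the
burst limit, and the shutdown threshold are hypothetical and are not
existing ext4 interfaces or tunables.  It logs the first few failures,
suppresses the rest, and escalates to a shutdown once enough distinct
groups have been found bad.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum corruption_action { ACTION_LOG, ACTION_SUPPRESS, ACTION_SHUTDOWN };

/* Hypothetical policy state; the thresholds are illustrative only. */
struct corruption_policy {
	uint32_t log_burst;          /* messages allowed before suppressing */
	uint32_t shutdown_threshold; /* distinct bad groups that force shutdown */
	uint32_t logged;             /* messages emitted so far */
	uint32_t bad_groups;         /* distinct groups seen with bad checksums */
	bool shut_down;              /* latched once the threshold is reached */
};

/* Decide what to do about one checksum failure. */
static enum corruption_action
note_bad_checksum(struct corruption_policy *p, bool first_time_for_group)
{
	assert(p != NULL);
	if (p->shut_down)
		return ACTION_SHUTDOWN;
	if (first_time_for_group)
		p->bad_groups++;
	if (p->bad_groups >= p->shutdown_threshold) {
		p->shut_down = true;  /* too much damage; stop continuing */
		return ACTION_SHUTDOWN;
	}
	if (p->logged < p->log_burst) {
		p->logged++;
		return ACTION_LOG;
	}
	return ACTION_SUPPRESS;  /* over the burst limit; stay quiet */
}
```

A time-based rate limit would be the more usual choice for log
messages; the simple counter here just keeps the sketch
self-contained.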

The main issue here is that errors=continue is used if we want to do
some amount of recovery after certain types of file system corruption,
but what we really need is a mode where we can continue after certain
types of fs errors (especially if userspace is doing things like its
own data block checksums and has its own recovery mechanisms at the
cluster file system level).  But if things get really, really bad,
we shouldn't be trying to bull ahead in the face of errors when it's
clear it's going to be counterproductive.

> More for me to ponder....

Indeed...

					- Ted
