linux-ext4.vger.kernel.org archive mirror
From: Zheng Liu <gnehzuil.liu@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH v1 0/5] ext4: Shut down block groups when damage is detected
Date: Sun, 21 Jul 2013 22:32:26 +0800
Message-ID: <20130721143226.GA13473@gmail.com>
In-Reply-To: <20130719235532.24017.31896.stgit@blackbox.djwong.org>

On Fri, Jul 19, 2013 at 04:55:32PM -0700, Darrick J. Wong wrote:
> Right now, ext4 doesn't do quite a good enough job shutting off allocation and
> freeing activity in block groups when damage is detected, which means that ext4
> can obliviously load a corrupt bitmap, base allocation decisions off of that,
> and trash the filesystem.  We'd like to be able to freeze the block group when
> this happens, so hopefully the next fsck can repair the damage.
> 
> The first patch fixes the behavior that a corrupt bitmap can be returned to
> mballoc as if it was accurate.  The second patch is a trivial fix, and the two
> after it provide for detecting damage in either the block bitmap or the inode
> bitmap, and disabling all allocation/deallocation activity in the block group.
> The final patch changes runtime block group descriptor validation failure
> behavior to use the corruption flag to mark off the block group.
> 
> This patchset has been tested (albeit lightly) against 3.11-rc1 on x64.  I'm
> wondering about a few things -- if we detect corrupt *inodes*, should we invoke
> this mechanism as well?  Second, as I mentioned a few days ago, maybe it's time
> for block_validity to be set always, since it seems to have a low speed impact?
> Third, the block bitmap corruption flag patch is based off of Aditya Kali's
> patch that you forwarded; can a proper Signed-off-by be attached since I mostly
> just massaged that one into 3.11?
> 
> Comments and questions are, as always, welcome.

Wow, it seems I missed a very important thread [1] after several
crazy busy weeks.  There is an idea that has been on my mind for a
while, but I still have not found the time to try it.

My idea is to let the file system ignore a corrupted block.  Namely,
when we hit a corrupted block, we track it as a bad block in the bad
block inode and find another block to store the data.  The corrupted
block will never be used again.  The first step, as I see it, is to
detect a corrupted block and mark it as a bad block.  After reading the
thread and Darrick's original patch, I think Darrick's patch is a good
start.
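
To make the idea concrete, here is a minimal user-space sketch of the
flow described above: quarantine a corrupted block and pick a
replacement.  It is only an illustration under my own assumptions, not
ext4 code.  The names toy_fs, toy_mark_bad, toy_alloc and
toy_replace_corrupted are hypothetical, and the bad[] array merely
stands in for the list kept by the bad block inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS  16              /* toy device size (hypothetical) */
#define NO_BLOCK ((size_t)-1)    /* returned when no usable block is left */

/* Toy in-memory state; none of these names exist in ext4. */
struct toy_fs {
	bool in_use[NBLOCKS];    /* stand-in for the block bitmap */
	bool bad[NBLOCKS];       /* stand-in for the bad block inode's list */
};

/* Record a block as bad so the allocator never hands it out again. */
static void toy_mark_bad(struct toy_fs *fs, size_t blk)
{
	assert(blk < NBLOCKS);
	fs->bad[blk] = true;
}

/* Return the first block that is neither in use nor marked bad. */
static size_t toy_alloc(struct toy_fs *fs)
{
	for (size_t b = 0; b < NBLOCKS; b++) {
		if (!fs->in_use[b] && !fs->bad[b]) {
			fs->in_use[b] = true;
			return b;
		}
	}
	return NO_BLOCK;
}

/* On detecting corruption in blk: quarantine it, then find a replacement. */
static size_t toy_replace_corrupted(struct toy_fs *fs, size_t blk)
{
	toy_mark_bad(fs, blk);
	return toy_alloc(fs);
}
```

A caller would invoke toy_replace_corrupted() at the point where
corruption is detected and redirect the write to the returned block.
How detection works and how the real bad block inode would be updated
on disk are left out of this sketch.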

At Taobao, we have a large CDN system.  These servers act as a cache for
web sites, and the system can tolerate data loss.  So we hope that when
we detect a corrupted block, we can just ignore it and use another block
until the whole disk is corrupted or the server is dropped.

I will take a closer look at these patches later.

Thanks,
                                                - Zheng

1. http://www.spinics.net/lists/linux-ext4/msg39053.html
