Re: ext4 filesystem bad extent error review


From: Theodore Ts'o <tytso@mit.edu>
To: "Huang Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Juergens Dirk (CM-AI/PJ-CF32)" <Dirk.Juergens@de.bosch.com>
Subject: Re: ext4 filesystem bad extent error review
Date: Thu, 2 Jan 2014 13:42:11 -0500
Message-ID: <20140102184211.GC10870@thunk.org>
In-Reply-To: <AE39A478622CF340ABEC2418D74074F61FC567864C@SGPMBX05.APAC.bosch.com>

On Thu, Jan 02, 2014 at 12:59:52PM +0800, Huang Weller (CM/ESW12-CN) wrote:
> 
> We did more test which we backup the journal blocks  before we mount the test partition.
> Actually, before we mount the test partition, we use fsck.ext4 with -n option to verify whether there is any  bad extents issues available. The fsck.ext4 never found any such kind issue. And we can prove that the bad extents issue is happened after journaling replay.

Ok, so that implies that the failure is almost certainly due to
corrupted blocks in the journal.  Hence, when we replay the journal,
it causes the file system to become corrupted, because the "newer"
(and presumably, "more correct") metadata blocks found in the blocks
recorded in the journal are in fact corrupted.

BTW, you can use the logdump command in the debugfs program to look at
the journal.  The debugfs man page documents it.  Once you know which
block was corrupted (in your case it appears to be block 525), you can
dump the journal's copies of that block:

debugfs: logdump -b 525 -c

Or to see the contents of all of the blocks logged in the journal:

debugfs: logdump -ac

> 
> We  searched such error on internet, there are some one also has such issue. But there is no solution.
> This issue maybe not a big issue which it can be repaired by fsck.ext4 easily. But we have below questions:
> 1. whether this issue already been fixed in the latest kernel version?
> 2. based on the information I provided in this mail, can you help to solve this issue ?

Well, the question is how did the journal get corrupted?  It's
possible that it's caused by a kernel bug, although I'm not aware of
any such bugs being reported.

In my mind, the most likely cause is that the SD card is ignoring the
CACHE FLUSH command, or is not properly saving the SD card's Flash
Translation Layer (FTL) metadata on a power drop.  Here are some
examples of investigations into lousy SSDs that have this bug ---
and historically, SD cards have been **worse** than SSDs, because the
manufacturers have a much lower per-unit cost, so they tend to put in
even cheaper and crappier FTL systems on SD and eMMC flash.

http://lkcl.net/reports/ssd_analysis.html

https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

What I tell people who are using flash devices is that before they
start using any flash device, they should do power drop testing on the
raw device, without any file system present.  The simplest way to do
this is to write a program that writes consecutive 4k blocks, each
containing a timestamp, a sequence number, a flags word, some random
data, and a CRC-32 checksum over the timestamp, sequence number, flags
word, and random data.  As the program writes each 4k block, it rolls
the dice, and once every 64 blocks or so (i.e., pick a random number,
and see if it is divisible by 64), it sets a bit in the flags word
indicating that this block was forced out using a cache flush, and
then follows up the write of that block with a CACHE FLUSH command.
It's also best if the test program prints the blocks which have been
written with CACHE FLUSH to the serial console, and that this output
is saved by your test rig.
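
A minimal sketch of such a writer, in Python, might look like the
following.  The field layout, the names make_block() and
write_blocks(), and the use of fsync() to request the cache flush are
assumptions made for illustration and are not taken from any existing
tool.  It also writes through the page cache; a stricter test would
probably open the device with O_DIRECT, which this sketch omits.

```python
import os
import random
import struct
import time
import zlib

BLOCK_SIZE = 4096
FLAG_FLUSHED = 0x1  # set when the block is followed by a cache flush

# Assumed layout: timestamp (ns), sequence number, flags word,
# random data, then a CRC-32 over everything that precedes it.
HEADER = struct.Struct("<QQI")
CRC = struct.Struct("<I")
PAYLOAD_LEN = BLOCK_SIZE - HEADER.size - CRC.size


def make_block(seq, flushed):
    """Build one 4k block for sequence number `seq`."""
    flags = FLAG_FLUSHED if flushed else 0
    body = HEADER.pack(time.time_ns(), seq, flags) + os.urandom(PAYLOAD_LEN)
    return body + CRC.pack(zlib.crc32(body))


def write_blocks(path, count, flush_one_in=64):
    """Write `count` consecutive blocks to `path`.

    Roughly one block in `flush_one_in` is flagged and followed by
    fsync().  Returns the list of flushed sequence numbers.
    """
    flushed_seqs = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for seq in range(count):
            flushed = random.randrange(flush_one_in) == 0
            os.write(fd, make_block(seq, flushed))
            if flushed:
                # Assumption: fsync() on the device node results in a
                # CACHE FLUSH being sent to the storage device.
                os.fsync(fd)
                # On a test board, stdout would be the serial console.
                print(f"FLUSHED seq={seq}", flush=True)
                flushed_seqs.append(seq)
    finally:
        os.close(fd)
    return flushed_seqs
```

On a test board this would be pointed at the raw device node (for
example write_blocks("/dev/mmcblk0", 1000000), where the device name
is only a placeholder), which destroys whatever is stored there.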

(This is what ext4's journal does before and after writing the commit
block in the journal, and it guarantees that (a) all of the data in
the journal written up to the commit block will be available after a
power drop, and (b) that the commit block has been written to the
storage device and again, will be available after a power drop.)

Once you've written this program, set up a test rig which boots your
test board, runs the program, and then drops power to the test board
randomly.  After the power drop, examine the flash device and make
sure that all of the blocks written up to the last "commit block" are
in fact valid.
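
The check after each power drop can be sketched the same way.  This
assumes the block layout used for illustration here (timestamp,
sequence number, flags word, random data, CRC-32), and that the last
flushed sequence number is taken from the saved serial console log;
check_block() and verify() are hypothetical names.

```python
import struct
import zlib

BLOCK_SIZE = 4096
FLAG_FLUSHED = 0x1
HEADER = struct.Struct("<QQI")  # assumed: timestamp, sequence, flags
CRC = struct.Struct("<I")       # CRC-32 over the rest of the block


def check_block(block, expected_seq):
    """Return the flags word if the block is intact, else None."""
    if len(block) != BLOCK_SIZE:
        return None
    body = block[:-CRC.size]
    (stored_crc,) = CRC.unpack(block[-CRC.size:])
    if zlib.crc32(body) != stored_crc:
        return None
    _timestamp, seq, flags = HEADER.unpack(body[:HEADER.size])
    return flags if seq == expected_seq else None


def verify(path, last_flushed_seq):
    """Return the sequence numbers of bad blocks up to and including
    the last block that was logged as flushed."""
    bad = []
    with open(path, "rb") as f:
        for seq in range(last_flushed_seq + 1):
            if check_block(f.read(BLOCK_SIZE), seq) is None:
                bad.append(seq)
    return bad
```

An empty list from verify() means every block up to the last logged
flush survived; anything else is a block the device claimed to have
made durable and then lost or damaged.  A fuller checker would also
compare timestamps, so that stale blocks left over from an earlier
run are not mistaken for fresh ones.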

You will find that a surprising number of SD cards will fail this
test.  In fact, the really lousy cards will become unreadable after a
power drop.  (A fact many wedding photographers discover the hard way
when they drop their camera and the SD card flies out, and then they
find that all of their priceless, once-in-a-lifetime photos are lost
forever.)

I ****strongly**** recommend that if you are not already testing the
SD cards from your parts supplier in this way, you do so immediately,
and reject any model that is not able to guarantee that data survives
a power drop.

Good luck, and I hope this is helpful,

					- Ted

P.S.  If you do write such a program, please consider making it
available under an open source license.  If more companies did this,
it would apply pressure to the flash manufacturers to stop making such
crappy products, and while it might raise the BOM cost of products by
a penny or two, the net result would be better for everyone in the
industry.
