Re: ext4 filesystem corruption across partitions


From: Theodore Ts'o <tytso@mit.edu>
To: Devrin Talen <dct23@cornell.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: ext4 filesystem corruption across partitions
Date: Tue, 6 May 2014 15:40:06 -0400
Message-ID: <20140506194006.GC5012@thunk.org>
In-Reply-To: <20140505220130.1a256f9d@luigi>

On Mon, May 05, 2014 at 10:01:30PM -0400, Devrin Talen wrote:
> 
> 1. Run `ls -R *` in a loop from the root directory.  The root is
> mounted from partition 11 (system) on the eMMC and the ls will read
> the /cache (partition 12) and /data (partition 13) filesystems as well.

Try mounting /data read-only.  That should pretty much guarantee that
nothing can write to it through the file system.  You can also use
blktrace to capture block I/O traces for the device, and use them to
confirm that nothing was actually writing to it.
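
Before starting a run it may be worth confirming that the read-only
mount actually took effect.  A minimal Python sketch, assuming the
usual /proc/mounts layout (device, mount point, type, options, ...);
the helper names are made up for illustration:

```python
def mount_options(mount_point, mounts_text=None):
    """Return the option list for mount_point, or None if it is not mounted.

    mounts_text lets the caller pass the file contents directly; by default
    the function reads /proc/mounts.
    """
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # a later entry shadows an earlier one
    return options


def is_read_only(mount_point, mounts_text=None):
    """True if mount_point is mounted with the 'ro' option."""
    options = mount_options(mount_point, mounts_text)
    if options is None:
        raise LookupError(f"{mount_point} is not mounted")
    return "ro" in options
```

This only checks the mount flags; it says nothing about writes that
bypass the file system, which is what the blktrace capture is for.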

> 2. Write data to partition 12 via ADB (using `adb push ... /cache/`)

Instead of using ADB, I would suggest writing a test program which
writes a series of 512 byte sectors to a single large file in /cache.
At the beginning of each 512 byte sector include a 4 byte serial
number (which is incremented by one for each sector), a 4 byte testID
which is different for each run of your test program, a time stamp, a
CRC of these fields, and then fill the rest of the sector with some
text string to make it easy to recognize this pattern.  It can be
anything from 0xDEADBEEF, to a string such as "DEBUGGING RANDOM HW
BUGS REALLY SUCKS".  :-)
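
A rough Python sketch of such a writer follows.  The exact layout is
an assumption made for illustration: little-endian fields, an 8-byte
nanosecond time stamp, and CRC-32 from zlib computed over the three
preceding fields.  The function names are likewise made up.

```python
import os
import struct
import time
import zlib

SECTOR = 512
FILLER = b"DEBUGGING RANDOM HW BUGS REALLY SUCKS "
# Assumed layout: 4-byte serial, 4-byte test ID, 8-byte time stamp (ns).
HEADER = struct.Struct("<IIQ")
CRC = struct.Struct("<I")


def make_sector(serial, test_id, timestamp_ns):
    """Build one 512-byte sector: header, CRC of the header, then filler text."""
    head = HEADER.pack(serial & 0xFFFFFFFF, test_id & 0xFFFFFFFF, timestamp_ns)
    crc = CRC.pack(zlib.crc32(head) & 0xFFFFFFFF)
    pad_len = SECTOR - len(head) - len(crc)
    filler = (FILLER * (pad_len // len(FILLER) + 1))[:pad_len]
    return head + crc + filler


def write_pattern(path, test_id, count):
    """Write `count` consecutive sectors to one file, then force them to disk."""
    with open(path, "wb") as f:
        for serial in range(count):
            f.write(make_sector(serial, test_id, time.time_ns()))
        f.flush()
        os.fsync(f.fileno())
```

Something like write_pattern("/cache/pattern.bin", test_id, count)
would then stand in for the adb push; the path is only an example,
and test_id should change on every run so that stale sectors from an
earlier run can be told apart from fresh ones.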

Now try to reproduce the problem with this write load.  If you can
reproduce it, check whether the corrupted file system block in /data
shows evidence of the string that was supposed to be written into
/cache.  You can also check that the large file being written in
/cache has the expected serial numbers and checksums.
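
Both checks could be sketched in Python roughly as below.  This
assumes an illustrative sector layout (little-endian 4-byte serial,
4-byte test ID, 8-byte time stamp, CRC-32 of those fields, then the
filler string) and that stray sectors land on 512-byte boundaries;
the function names are hypothetical.

```python
import struct
import zlib

SECTOR = 512
HEADER = struct.Struct("<IIQ")  # assumed: serial, test ID, time stamp (ns)
CRC = struct.Struct("<I")
MARKER = b"DEBUGGING RANDOM HW BUGS"


def parse_sector(buf):
    """Return (serial, test_id, timestamp_ns) for a valid test sector, else None."""
    if len(buf) != SECTOR:
        return None
    head = buf[:HEADER.size]
    (stored,) = CRC.unpack_from(buf, HEADER.size)
    if zlib.crc32(head) & 0xFFFFFFFF != stored:
        return None
    if MARKER not in buf[HEADER.size + CRC.size:]:
        return None
    return HEADER.unpack(head)


def check_file(path, test_id):
    """Return indexes of sectors in the test file that are not what was written."""
    bad = []
    with open(path, "rb") as f:
        index = 0
        while True:
            buf = f.read(SECTOR)
            if not buf:
                break
            rec = parse_sector(buf)
            if rec is None or rec[0] != index or rec[1] != test_id:
                bad.append(index)
            index += 1
    return bad


def find_stray_sectors(path):
    """Scan a raw dump for test sectors; return (offset, serial, test_id) tuples."""
    hits = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            buf = f.read(SECTOR)
            if len(buf) < SECTOR:
                break
            rec = parse_sector(buf)
            if rec is not None:
                hits.append((offset, rec[0], rec[1]))
            offset += SECTOR
    return hits
```

check_file would be run against the large file in /cache, and
find_stray_sectors against a raw dump of the /data partition (or of
the corrupted blocks only).  The serial numbers of any hits show
which writes went astray.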

This will allow you to see whether the block writes are just going to
the wrong place on the SSD, or whether something stranger is going
on.  Depending on the pattern of which blocks are ending up where they
shouldn't, it might point towards different possible causes (e.g., a
flaky solder joint or a buggy flash translation layer in the eMMC
chip).

Cheers,

					- Ted
