From: Romain Izard <romain.izard.pro@gmail.com>
To: linux-mmc@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: Corruption on shutdown outside the current partition
Date: Wed, 1 Jun 2011 16:27:58 +0000 (UTC)
Message-ID: <is5pae$4ag$1@dough.gmane.org>
In-Reply-To: 1C97D3C8-E4B8-4FC5-9E02-E067DAF2EC02@dilger.ca
On 2011-05-30, Andreas Dilger <adilger@dilger.ca> wrote:
>> If I use a hardware reset method instead of the kernel syscall, by
>> triggering a watchdog with interrupts locked, or doing a power cycle
>> with a testing machine, the problem does not happen. This led me to
>> think it could be a software failure, rather than the hardware failure I
>> was expecting. After activating the traces in the mmc subsystem, I
>> finally managed to catch write commands to an area outside the partition
>> being tested, which means that the problem is really due to software.
>
> Why don't you dump a stack at that point to see what is causing the
> write? Also, blktrace might be helpful to determine what caused the block
> to be written.

I tried that. Unfortunately, because the I/O is asynchronous, the stack I
got was that of the mmc worker thread, not that of the request
originator. It was still a good first step: it gave me an error marker,
and it made me notice that the problem is much more common than I
thought. It had only stayed hidden because the stray writes fell in
unused areas of my boot partition.

Since blktrace runs in userspace, it is likely to be killed during the
reboot process and would give me only partial information. But I finally
found what I wanted: by writing 1 to /proc/sys/vm/block_dump, I can see
in the system log the original requests that led to the commands.
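
For reference, a minimal C sketch of enabling this from a program instead
of a root shell (where `echo 1 > /proc/sys/vm/block_dump` does the same
thing). It assumes root privileges and a kernel that provides
/proc/sys/vm/block_dump; write_sysctl() is a hypothetical helper, not part
of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a value to a sysctl file such as
 * /proc/sys/vm/block_dump. Returns 0 on success, -1 on failure. */
static int write_sysctl(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    if (fputs(value, f) == EOF) {
        perror(path);
        fclose(f);
        return -1;
    }
    /* fclose() flushes the stdio buffer, so a write rejected by procfs
     * may only be reported here. */
    if (fclose(f) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}
```

Calling write_sysctl("/proc/sys/vm/block_dump", "1\n") early in the
shutdown sequence would then make the kernel log each block request
together with the task that issued it.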

From what I see now, the problem seems to come from a race condition on
shutdown between pending file system operations on one side and
partition removal on the other. The partition can apparently be removed
while some pending requests are still valid; those requests are then
handled with a partition offset of 0, so their sectors are interpreted
relative to the start of the whole device. This leads to the corruption
I am observing.
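
To make the effect of a zero offset concrete, here is a simplified C
sketch of the partition-to-device remapping, assuming only that the
block layer adds the partition's start sector to each partition-relative
request. struct part_window and to_device_sector() are hypothetical
names, and the sector numbers are illustrative, not taken from the
affected board.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a partition: where it starts on the
 * whole device and how many sectors it spans. */
struct part_window {
    uint64_t start_sect; /* offset of the partition on the device */
    uint64_t nr_sects;   /* size of the partition in sectors */
};

/* Map a partition-relative sector to an absolute device sector by adding
 * the partition's start offset. */
static uint64_t to_device_sector(const struct part_window *p, uint64_t rel)
{
    assert(rel < p->nr_sects); /* request stays inside its own partition */
    return p->start_sect + rel;
}
```

With a partition starting at sector 206848, relative sector 8 maps to
device sector 206856. If the same pending request is remapped after the
offset has dropped to 0, it goes to device sector 8 instead, which lies
before the partition's start and therefore inside the first partition
(here, the boot partition).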

I have yet to figure out the sequence of events leading to this and to
find a fix, since all of it happens in a part of the kernel I'm not
familiar with.

> Another possibility (I'm not very familiar with MMC hardware, so could
> be bogus) is that the partitions don't align to the hardware/erase
> block size of the underlying device, and a "legitimate" write to one
> partition is causing a read-modify-write into a region of another
> partition, but this isn't being handled correctly?
>
I also had alignment problems, but they only affected performance, not
correctness.

Thanks for your help,
--
Romain Izard