linux-ext4.vger.kernel.org archive mirror
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org,
	Saeed Bishara <saeed@marvell.com>,
	Nicolas Pitre <nico@marvell.com>,
	linux-ext4@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"James E.J. Bottomley" <jejb@parisc-linux.org>
Subject: Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
Date: Tue, 11 May 2010 20:57:58 +1000
Message-ID: <1273575478.21352.29.camel@pasglop>
In-Reply-To: <1273569821.21352.19.camel@pasglop>

On Tue, 2010-05-11 at 19:23 +1000, Benjamin Herrenschmidt wrote:

> Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
> I tend to suspect things could be related to the infamous vivt caches. On the
> other hand, it's pretty clearly metadata or journal corruption and I'm not
> sure we ever do things that could cause aliases (such as vmap etc..) on
> these things, and they shouldn't be mapped into userspace... unless it's fsck
> itself that causes aliases to occur at the block device level ? (I do unmount
> though before I run fsck).
> 
> On the other hand, it could also be a busticated marvell SATA driver :-)
> 
> I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
> on an out of tree variant of a Marvell originated BSP, so everything is
> completely different, especially in the area of drivers for the chipset.
> 
> Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
> kids permits.
> 
> In the meantime, any hint appreciated.

Another quick test that gives more information, using a smaller (about 5GB)
partition with no md or RAID involved:

 - Boot with NFS root
 - mkfs /dev/sdb2 (no md or raid involved)
 - mount /dev/sdb2 /mnt/test
 - rsync -avx /test-stuff /mnt/test
 - cd /mnt/test
 - md5sum -c ~/test-stuff-sums.txt

That gives me a whole bunch of:

md5sum: ./usr/bin/debconf-escape: No such file or directory
./usr/bin/debconf-escape: FAILED open or read
./usr/bin/stat: OK
md5sum: ./usr/bin/chrt: No such file or directory
./usr/bin/chrt: FAILED open or read

In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
ls /mnt/test/usr/bin/chrt then I get No such file or directory.

So something is badly wrong :-)
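
To make the "listed but cannot be looked up" symptom easier to spot across a
whole directory, something like the following could be used. This is only a
rough sketch: check_lookups is a made-up helper name, it skips hidden files,
and it assumes the shell's glob expansion reads the directory the same way
ls does.

```shell
#!/bin/sh
# check_lookups DIR (hypothetical helper, not part of any package)
# Expands DIR/* (a directory read), then tries to look up each name by path.
# Prints the names that were listed but cannot be found, and returns
# non-zero if there was at least one.
check_lookups() {
    dir=$1
    missing=0
    for entry in "$dir"/*; do
        # -L keeps dangling symlinks from being reported as missing.
        if [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            # An unmatched glob is left as the literal pattern; skip it.
            case $entry in
                "$dir/*") continue ;;
            esac
            echo "listed but not found: $entry"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, check_lookups /mnt/test/usr/bin should print nothing on a
healthy filesystem.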

Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index),
it works fine. All my md5sums are correct and fsck passes.
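
When comparing the two runs it helps to confirm which one actually has
dir_index enabled. A minimal sketch, assuming the usual "Filesystem
features:" line printed by tune2fs -l; has_dir_index is a made-up helper
name:

```shell
#!/bin/sh
# has_dir_index (hypothetical helper)
# Reads `tune2fs -l DEVICE` output on stdin and succeeds only if
# dir_index appears on the "Filesystem features:" line.
has_dir_index() {
    awk '/^Filesystem features:/ {
             for (i = 3; i <= NF; i++)
                 if ($i == "dir_index") found = 1
         }
         END { exit !found }'
}
```

Usage would be along the lines of:
tune2fs -l /dev/sdb2 | has_dir_index && echo "htree enabled"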

So there's what looks like a problem specific to htrees. I don't think
it's a SATA driver problem (it doesn't smell like one, but we can't
completely dismiss the possibility yet). It could be a VIVT issue, but then
why? I don't see ext3 playing with virtual mappings, and none of that
should alias with userspace...

Or is it incorrectly accessing pages while they are being DMA'ed to or from?
I.e. accessing pages with the CPU between dma_map_* and dma_unmap_*?
That would break on a number of setups, including swiotlb on x86, so I tend
to doubt it, but who knows...

Anyways, enough for tonight.

Cheers,
Ben.

