From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org,
	Saeed Bishara <saeed@marvell.com>,
	Nicolas Pitre <nico@marvell.com>,
	linux-ext4@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"James E.J. Bottomley" <jejb@parisc-linux.org>
Subject: Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
Date: Tue, 11 May 2010 20:57:58 +1000
Message-ID: <1273575478.21352.29.camel@pasglop>
In-Reply-To: <1273569821.21352.19.camel@pasglop>

On Tue, 2010-05-11 at 19:23 +1000, Benjamin Herrenschmidt wrote:

> Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
> I tend to suspect things could be related to the infamous vivt caches. On the
> other hand, it's pretty clearly metadata or journal corruption and I'm not
> sure we ever do things that could cause aliases (such as vmap etc..) on
> these things, and they shouldn't be mapped into userspace... unless it's fsck
> itself that causes aliases to occur at the block device level ? (I do unmount
> though before I run fsck).
> 
> On the other hand, it could also be a busticated marvell SATA driver :-)
> 
> I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
> on an out of tree variant of a Marvell originated BSP, so everything is
> completely different, especially in the area of drivers for the chipset.
> 
> Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
> kids permits.
> 
> In the meantime, any hint appreciated.

Another quick test that gives more information, using a smaller (about
5GB) partition with no md or RAID involved:

 - Boot with NFS root
 - mkfs /dev/sdb2 (no md or raid involved)
 - mount /dev/sdb2 /mnt/test
 - rsync -avx /test-stuff /mnt/test
 - cd /mnt/test
 - md5sum -c ~/test-stuff-sums.txt

That gives me a whole bunch of:

md5sum: ./usr/bin/debconf-escape: No such file or directory
./usr/bin/debconf-escape: FAILED open or read
./usr/bin/stat: OK
md5sum: ./usr/bin/chrt: No such file or directory
./usr/bin/chrt: FAILED open or read
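
To get a quick count out of a long run like that, something along these
lines could help. It is only a rough sketch: the function name is made up
for illustration, and it assumes the output has exactly the line shapes
shown above ("<path>: <status>" plus the extra "md5sum: ..." diagnostics).

```python
from collections import Counter


def summarize_md5sum_output(text):
    """Count per-file results in `md5sum -c` output.

    Hypothetical helper: it assumes each result line looks like
    '<path>: <status>' and that diagnostics start with 'md5sum: '.
    """
    counts = Counter()
    unreadable = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the 'md5sum: <path>: No such file...' diagnostics.
        if not line or line.startswith("md5sum: "):
            continue
        # Split on the last ': ' so the status is whatever follows the path.
        path, sep, status = line.rpartition(": ")
        if not sep:
            continue
        counts[status] += 1
        if status == "FAILED open or read":
            unreadable.append(path)
    return counts, unreadable


if __name__ == "__main__":
    import sys

    counts, unreadable = summarize_md5sum_output(sys.stdin.read())
    for status, n in sorted(counts.items()):
        print(f"{status}: {n}")
    for path in unreadable:
        print(f"could not open: {path}")
```

It reads the checker output on stdin, so it can sit at the end of a pipe
after md5sum -c ... 2>&1.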

In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
ls /mnt/test/usr/bin/chrt then I get No such file or directory.
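
The mismatch between listing a directory and looking up a name in it can
be checked for every entry at once. A minimal sketch, assuming Python is
available on the box; the function name is hypothetical, and the script
only reports the symptom without saying anything about its cause:

```python
import os
import sys


def find_lookup_mismatches(directory):
    """Return names that os.listdir() reports but os.lstat() cannot find.

    Hypothetical helper: on a healthy filesystem with nothing modifying
    the directory concurrently, the result is expected to be empty.
    """
    mismatches = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat() does a by-name lookup without following symlinks.
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_lookup_mismatches(target):
        print(f"listed but not found by name: {name}")
```

Run against /mnt/test/usr/bin, it would print each entry that shows up in
the listing but fails with ENOENT on lookup.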

So something is badly wrong :-)

Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index),
it works fine. All my md5sums are correct and fsck passes.

So there's what looks like a problem specific to htrees. I don't think
it's a SATA driver problem (doesn't smell like it but we can't
completely dismiss the possibility yet). Could be a VIVT issue but then
why ? I don't see ext3 playing with virtual mappings and none of that
should alias with userspace...

Or is it incorrectly accessing pages while they are being DMA'ed to or
from? I.e. accessing pages with the CPU between dma_map_* and
dma_unmap_*? That would break on a number of setups, including swiotlb
on x86, so I tend to doubt it, but who knows...

Anyways, enough for tonight.

Cheers,
Ben.
