From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org,
Saeed Bishara <saeed@marvell.com>,
Nicolas Pitre <nico@marvell.com>,
linux-ext4@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
"James E.J. Bottomley" <jejb@parisc-linux.org>
Subject: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
Date: Tue, 11 May 2010 19:23:41 +1000
Message-ID: <1273569821.21352.19.camel@pasglop>
Hi folks !
So I've been adapting upstream to my little D-Link DNS323 NAS box (which
is based on a Marvell 88f5182 chipset, i.e. orion5 family) and started
hitting a problem with both ext3 and ext4 (on top of md/raid1).
First, the setup: this is a VIVT cache ARM CPU (so it -could- be a
cache issue, though that's a bit weird, as I would expect metadata to be
in normal kernel pages and thus not hit cache aliases, but then I'm no
specialist of how we deal with those critters).
Linux version 2.6.34-rc7-00019-g6c4f192-dirty (benh@pasglop) (gcc version 4.4.0 (GCC) ) #6 PREEMPT Tue May 11 17:02:04 EST 2010
CPU: Feroceon [41069260] revision 0 (ARMv5TEJ), cr=a0053177
CPU: VIVT data cache, VIVT instruction cache
There are two 1T disks using the built-in SATA chipset. There's an almost-1T
partition on each (sda1 and sdb1), which are set up as a raid1 md.
Then, I create a filesystem (I started with ext4 and mkfs'ed it back to ext3
after I started having problems, but the problems persist).
I'm booted off an nfs root and basically the problem happens when I rsync over
a pre-made root to the disks.
On a freshly mkfs'ed ext3 or 4, mounted in /mnt/raid, I rsync the thing over,
use it a bit and generally observe the first symptoms in the form of a few
EXT3-fs (md0): warning: ext3_rename: Deleting old file (34103297), 2, error=-2
EXT3-fs (md0): warning: ext3_rename: Deleting old file (34113314), 2, error=-2
EXT3-fs (md0): warning: ext3_rename: Deleting old file (34127945), 2, error=-2
in my kernel log. I had a very similar message with ext4 iirc.
I unmount the filesystem and fsck it (it takes almost 1h) and I then get
a bunch of:
Problem in HTREE directory inode 5005385: node (12) has bad max hash
Problem in HTREE directory inode 5005385: node (13) has bad max hash
Problem in HTREE directory inode 5005385: node (14) has bad max hash
Invalid HTREE directory inode 5005385 (/raid-foo/var/lib/dpkg/info). Clear HTree index<y>? yes
followed by a bunch of
Inode 4981405 ref count is 1, should be 2. Fix<y>? yes
This is reasonably reproducible: I have re-done it twice and got similar
errors each time.
Tomorrow, if time permits, I'll see if I can reproduce on a smaller partition
without md, and eventually narrow it to a smaller and more predictable set of
operations.
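One way to get a smaller and more predictable set of operations is sketched below. This is only an illustration under stated assumptions, not something taken from the report: it fills a single directory with enough entries that it is likely to outgrow one block, uses a write-then-rename pattern (the kernel warnings above came from ext3_rename), and then checks that every expected name is present with the expected content. The function names, the file naming scheme and the default count are all hypothetical.

```python
import hashlib
import os
import tempfile


def populate_and_rename(root, count=2000):
    """Create `count` files with a write-then-rename pattern.

    Returns a dict mapping each final file name to the SHA-256 of the
    data that was written to it.
    """
    expected = {}
    os.makedirs(root, exist_ok=True)
    for i in range(count):
        data = ("payload-%d\n" % i).encode() * 16
        tmp = os.path.join(root, "f%06d.tmp" % i)
        final = os.path.join(root, "f%06d.list" % i)
        with open(tmp, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.rename(tmp, final)  # exercises the directory rename path
        expected[os.path.basename(final)] = hashlib.sha256(data).hexdigest()
    return expected


def verify(root, expected):
    """Return a list of problems: missing, unexpected or corrupt names."""
    problems = []
    present = set(os.listdir(root))
    wanted = set(expected)
    for name in sorted(wanted - present):
        problems.append("missing: " + name)
    for name in sorted(present - wanted):
        problems.append("unexpected: " + name)
    for name in sorted(present & wanted):
        with open(os.path.join(root, name), "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        if digest != expected[name]:
            problems.append("corrupt: " + name)
    return problems


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the filesystem under test.
    target = os.path.join(tempfile.mkdtemp(), "stress")
    wanted = populate_and_rename(target, count=200)
    print(verify(target, wanted) or "no problems found")
```

As written, the check runs right after the writes, so it would mostly be served from the page cache. To say anything about what reached the disk, the expected map would have to be saved somewhere else and verify() run again after an unmount and remount; that step is left out of the sketch.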
Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
I tend to suspect things could be related to the infamous VIVT caches. On the
other hand, it's pretty clearly metadata or journal corruption and I'm not
sure we ever do things that could cause aliases (such as vmap etc..) on
these things, and they shouldn't be mapped into userspace... unless it's fsck
itself that causes aliases to occur at the block device level ? (I do unmount
though before I run fsck).
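For readers less familiar with the alias problem mentioned above, here is a toy model of it. It is a deliberately simplified sketch, not a description of the Feroceon hardware or of any kernel code: a write-back cache that is looked up purely by virtual address, with two virtual addresses mapped to the same physical location. The class name and the addresses are made up for illustration.

```python
class VivtCache:
    """Toy write-back cache whose lines are found by virtual address only."""

    def __init__(self, memory, page_table):
        self.memory = memory          # physical address -> value
        self.page_table = page_table  # virtual address -> physical address
        self.lines = {}               # virtual address -> (value, dirty)

    def read(self, vaddr):
        if vaddr not in self.lines:   # miss: fill the line from memory
            self.lines[vaddr] = (self.memory[self.page_table[vaddr]], False)
        return self.lines[vaddr][0]

    def write(self, vaddr, value):
        # Write-back behaviour: only the cache line changes, memory does not.
        self.lines[vaddr] = (value, True)

    def flush(self, vaddr):
        """Write a dirty line back to memory and drop it from the cache."""
        value, dirty = self.lines.pop(vaddr, (None, False))
        if dirty:
            self.memory[self.page_table[vaddr]] = value


# Two hypothetical virtual addresses that alias one physical location.
memory = {0x1000: "old"}
page_table = {0xC0001000: 0x1000, 0x40001000: 0x1000}
cache = VivtCache(memory, page_table)

cache.write(0xC0001000, "new")         # update through the first mapping
stale = cache.read(0x40001000)         # second mapping misses, reads memory
cache.flush(0xC0001000)                # memory now holds the new value
still_stale = cache.read(0x40001000)   # but the alias line is still cached
cache.flush(0x40001000)                # drop the clean alias line
fresh = cache.read(0x40001000)         # now refilled from memory
```

In this model the second mapping keeps returning the old value until the first line has been written back and the second one has been dropped, which is the kind of explicit cache maintenance an aliasing cache needs whenever the same data is reachable through two mappings.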
On the other hand, it could also be a busticated marvell SATA driver :-)
I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
on an out of tree variant of a Marvell originated BSP, so everything is
completely different, especially in the area of drivers for the chipset.
Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
kids permit.
In the meantime, any hint appreciated.
Cheers,
Ben.