From: Dave Chinner <david@fromorbit.com>
To: Janos Haar <janos.haar@netcenter.hu>
Cc: xiyou.wangcong@gmail.com, linux-kernel@vger.kernel.org,
kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org,
xfs@oss.sgi.com, axboe@kernel.dk
Subject: Re: Kernel crash in xfs_iflush_cluster (was Somebody take a look please!...)
Date: Tue, 13 Apr 2010 21:34:45 +1000
Message-ID: <20100413113445.GZ2493@dastard>
In-Reply-To: <190201cadaeb$02ec22c0$0400a8c0@dcccs>
On Tue, Apr 13, 2010 at 11:23:36AM +0200, Janos Haar wrote:
> >If you run:
> >
> >$ xfs_db -r -c "inode 474253940" -c p /dev/sdb2
> >
> >Then I can confirm whether there is corruption on disk or not.
> >Probably best to sample multiple of the inode numbers from the above
> >list of bad inodes.
>
> Here is the log:
> http://download.netcenter.hu/bughunt/20100413/debug.log
There are multiple fields in the inode that are corrupted.
I am really surprised that xfs_repair - even an old version - is not
picking up the corruption....
> The xfs_db does segmentation fault. :-)
Yup, it probably ran off into la-la land chasing corrupted
extent pointers.
> Btw memory corruption:
> In the beginning of March, one of my bets was memory problem too, but
> the server was offline for 7 days, and all the time runs the
> memtest86 on the hw, and passed all the 8GB 74 times without any bit
> error.
> I don't think it is memory problem, additionally the server can
> create big size .tar.gz files without crc problem.
Ok.
> If i force my mind to think to hw memory problem, i can think only
> for the raid card's cache memory, which i can't test with memtest86.
> Or the cache of the HDD's pcb...
Yes, it could be something like that, too, but the only way to test
it is to swap out the card....
> In the other hand, i have seen more people reported memory
> corruption about these kernel versions, can we check this and surely
> select which is the problem? (hw or sw)?
I haven't heard of any significant memory corruption problems in
2.6.32 or 2.6.33, but it is a possibility given the nature of the
corruption. However, it may have only happened once and be completely
unreproducible.
I'd suggest fixing the existing corruption first, and then seeing if
it re-appears. If it does reappear, then we know there's a
reproducible problem we need to dig out....
> I mean, if i am right, the hw memory problem makes only 1-2 bit
> corruption seriously, and the sw page handling problem makes bad
> memory pages, no?
RAM ECC guarantees correction of single bit errors and detection of
double bit errors (which cause the kernel to panic, IIRC). I can't
tell you what happens when larger errors occur, though...
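
To make the single-bit/double-bit distinction concrete, here is a toy
sketch of a SECDED (single error correct, double error detect) code:
an extended Hamming code over 4 data bits. This is an illustration
only. Real ECC memory typically protects much wider words in hardware,
and the 4-bit width and the names encode() and decode() are assumptions
made for this example, not anything taken from a memory controller or
the kernel.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of 0/1).

    Index 0 holds the overall parity bit; indexes 1, 2 and 4 hold the
    Hamming parity bits; indexes 3, 5, 6 and 7 hold the data bits.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        for i in range(1, 8):
            if i != p and (i & p):
                code[p] ^= code[i]
    # Overall parity bit makes the total number of set bits even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); nibble is None if uncorrectable."""
    code = list(code)
    # The syndrome is the XOR of the indexes of all set bits. It is 0
    # for a valid codeword and equals the index of a single flipped bit.
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i
    overall = sum(code) % 2

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: assume exactly one bit flipped and fix it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

With this sketch, any one flipped bit decodes as "corrected" with the
original data, and any two flipped bits decode as "uncorrectable".
Three flips are where it goes wrong: flipping positions 1, 2 and 3
leaves a syndrome of 0 with odd parity, so decode() reports
"corrected" while returning data with the low bit inverted. That is
the kind of silent miscorrection that can happen once an error is
larger than the code was designed for.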
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com