From: "Janos Haar" <janos.haar@netcenter.hu>
To: Dave Chinner <david@fromorbit.com>
Cc: xiyou.wangcong@gmail.com, linux-kernel@vger.kernel.org,
kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org,
xfs@oss.sgi.com, axboe@kernel.dk
Subject: Re: Kernel crash in xfs_iflush_cluster (was Somebody take a look please!...)
Date: Thu, 15 Apr 2010 12:23:26 +0200
Message-ID: <24dd01cadc85$b1d9ea10$0400a8c0@dcccs>
In-Reply-To: 20100415092330.GU2493@dastard
----- Original Message -----
From: "Dave Chinner" <david@fromorbit.com>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <xiyou.wangcong@gmail.com>; <linux-kernel@vger.kernel.org>;
<kamezawa.hiroyu@jp.fujitsu.com>; <linux-mm@kvack.org>; <xfs@oss.sgi.com>;
<axboe@kernel.dk>
Sent: Thursday, April 15, 2010 11:23 AM
Subject: Re: Kernel crash in xfs_iflush_cluster (was Somebody take a look
please!...)
> On Thu, Apr 15, 2010 at 09:00:49AM +0200, Janos Haar wrote:
>> Dave,
>>
>> The corruption + crash reproduced. (unfortunately)
>>
>> http://download.netcenter.hu/bughunt/20100413/messages-15
>>
>> Apr 14 01:06:33 alfa kernel: XFS mounting filesystem sdb2
>>
>> This was the point of the xfs_repair more times.
>
> OK, the inodes that are corrupted are different, so there's still
> something funky going on here. I still would suggest replacing the
> RAID controller to rule that out as the cause.
This was not a cheap card and I can't replace it, because I have only one, and
the owner has already decided that I need to replace the entire server on
Saturday.
I have only 2 days to get useful debug information while the server is online.
This is bad for testing too, because the workload will disappear, and we
will need to figure out some way to reproduce the problem offline...
>
> FWIW, do you have any other servers with similar h/w, s/w and
> workloads? If so, are they seeing problems?
This is a web-based game, which generates a lot of small files on the
corrupted filesystem, and as far as I can see, the corruption happens only
when writing, not when reading.
I can copy big gz files across the partitions multiple times, compare them,
and test their CRCs, and there is a cron tester which tests 12GB gz
files hourly, and none of this finds any problem. This tells me the corruption
happens only when writing, and not in the file contents, but in the FS.
This makes a RAID card problem less likely, am I right? :-)
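The hourly read check described above could be sketched roughly as follows. This is only an illustration, not the actual cron tester: the function names `gzip_is_intact` and `check_tree` are made up here, and it assumes Python 3 is available on the machine. Reading a gzip stream to the end makes the library verify the CRC32 stored in the trailer, so a clean pass means the file contents read back intact.

```python
import gzip
import zlib
from pathlib import Path


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file decompresses fully and its CRC32 matches.

    Hypothetical helper: the data is read in chunks and thrown away, so
    large archives do not need to fit in memory.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or CRC mismatch, EOFError a truncated
        # stream, zlib.error damaged compressed data.
        return False


def check_tree(root):
    """Return the .gz files under root that fail verification (sorted)."""
    return [p for p in sorted(Path(root).rglob("*.gz")) if not gzip_is_intact(p)]
```

An empty list from `check_tree()` only says that file data reads back correctly; it says nothing about the state of the filesystem's own structures, which is the distinction drawn above.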
Additionally, in the last 3 days I have tried 2 times to cp -aR the entire
partition to another one, and both times the corruption appeared ON THE SOURCE
and finally the kernel crashed.
step 1. repair
step 2. run the game (files generated...)
step 3. start copying the partition's data in the background
step 4. corruption reported by the kernel
step 5. kernel crashed during write
Can this be a race between read and write?
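If the problem really is triggered by reading the tree while small files are being created, an offline load along these lines might be a starting point for reproducing it. This is a rough sketch under that assumption only: the helper names, file counts, and sizes are invented for illustration and do not come from the real game, and it needs Python 3.8 or newer for `dirs_exist_ok`.

```python
import os
import shutil
import threading
from pathlib import Path


def write_small_files(root, count, size=512):
    """Create `count` small files with random content, spread over 16 subdirs."""
    for i in range(count):
        subdir = Path(root) / f"d{i % 16:02d}"
        subdir.mkdir(parents=True, exist_ok=True)
        (subdir / f"f{i:06d}.dat").write_bytes(os.urandom(size))


def copy_while_writing(src, dst, count=1000):
    """Copy src to dst while a second thread keeps creating files in src.

    Files created after the copier has already listed a directory are
    simply missed; the point is the overlapping read and write load,
    not a complete copy.
    """
    Path(src).mkdir(parents=True, exist_ok=True)
    writer = threading.Thread(target=write_small_files, args=(src, count))
    writer.start()
    try:
        shutil.copytree(src, dst, dirs_exist_ok=True)
    finally:
        writer.join()
```

Pointing `src` at a scratch XFS partition and raising `count` would give a load of the same shape as steps 2 and 3; whether it is heavy enough to trigger the corruption is unknown.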
Btw I have 2 servers with this game; the differences are these:
- The game's language
- The HW structure is similar, but all the parts are from totally different
brands, except the Intel CPU. :-)
- The workload is lower on the stable server
- The stable server is not selected for replacement. :-)
The important matches:
- The base OS is FC6 on both
- The actual kernel on the stable server is 2.6.28.10
(This kernel started to crash at the beginning of March on the server we are
working on.)
- The FS and the internal structure are the same
>
> Can you recompile the kernel with CONFIG_XFS_DEBUG enabled and
> reboot into it before you repair and remount the filesystem again?
Yes, of course!
I will do it now, we have 2 days left to get useful info....
> (i.e. so that we know that we have started with a clean filesystem
> and the debug kernel) I'm hoping that this will catch the corruption
> much sooner, perhaps before it gets to disk. Note that this will
> cause the machine to panic when corruption is detected, and it is
> much,much more careful about checking in memory structures so there
> is a CPU overhead involved as well.
Not a problem.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com