From: "Janos Haar" <janos.haar@netcenter.hu>
To: Dave Chinner <david@fromorbit.com>
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
	linux-mm@kvack.org, xiyou.wangcong@gmail.com,
	kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: Kernel crash in xfs_iflush_cluster (was Somebody take a look please!...)
Date: Fri, 16 Apr 2010 10:01:10 +0200
Message-ID: <295901cadd3a$fbeb1650$0400a8c0@dcccs>
In-Reply-To: 20100415092330.GU2493@dastard


----- Original Message ----- 
From: "Dave Chinner" <david@fromorbit.com>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <xiyou.wangcong@gmail.com>; <linux-kernel@vger.kernel.org>; 
<kamezawa.hiroyu@jp.fujitsu.com>; <linux-mm@kvack.org>; <xfs@oss.sgi.com>; 
<axboe@kernel.dk>
Sent: Thursday, April 15, 2010 11:23 AM
Subject: Re: Kernel crash in xfs_iflush_cluster (was Somebody take a look 
please!...)


> On Thu, Apr 15, 2010 at 09:00:49AM +0200, Janos Haar wrote:
>> Dave,
>>
>> The corruption + crash reproduced. (unfortunately)
>>
>> http://download.netcenter.hu/bughunt/20100413/messages-15
>>
>> Apr 14 01:06:33 alfa kernel: XFS mounting filesystem sdb2
>>
>> This was the point of the xfs_repair more times.
>
> OK, the inodes that are corrupted are different, so there's still
> something funky going on here. I still would suggest replacing the
> RAID controller to rule that out as the cause.

News:

(A reminder of the current state:
xfs_repair fixed the fs, then the kernel reported the corruption again and
crashed; I wrote the previous letter to report this.)

Yesterday I stopped the service and ran xfs_repair (new version only)
on 2 FS, but they were clean!
(This suggests to me that the reported corruption was only in memory, or that
the kernel repaired it on reboot.)
(XFS debug had been turned on before this.)
This morning I have more messages in the syslog from sdb2 again.
At this point, I don't know what to think.

http://download.netcenter.hu/bughunt/20100413/messages-16

Regards,
Janos


>
> FWIW, do you have any other servers with similar h/w, s/w and
> workloads? If so, are they seeing problems?
>
> Can you recompile the kernel with CONFIG_XFS_DEBUG enabled and
> reboot into it before you repair and remount the filesystem again?
> (i.e. so that we know that we have started with a clean filesystem
> and the debug kernel) I'm hoping that this will catch the corruption
> much sooner, perhaps before it gets to disk. Note that this will
> cause the machine to panic when corruption is detected, and it is
> much,much more careful about checking in memory structures so there
> is a CPU overhead involved as well.
>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


