public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Sagar Borikar <Sagar_Borikar@pmc-sierra.com>
Cc: xfs@oss.sgi.com
Subject: Re: Xfs Access to block zero  exception and system crash
Date: Thu, 26 Jun 2008 17:02:15 +1000
Message-ID: <20080626070215.GI11558@disturbed>
In-Reply-To: <340C71CD25A7EB49BFA81AE8C839266701323BE8@BBY1EXM10.pmc_nt.nt.pmc-sierra.bc.ca>

[please wrap your replies at 72 columns]

On Wed, Jun 25, 2008 at 11:46:59PM -0700, Sagar Borikar wrote:
> >> with 2.6.18 kernel,128 MB of RAM, MIPS architecture and XFS version 
> >> 2.8.11.
> 
> > [...]
> 
> >> Can anyone let me know what could be the probable cause of this issue.
> 
> > they are all from  corrupted extent btrees.  There are many
> > possible causes of this that we've fixed over the past years
> > since 2.6.18 was released. Indeed, we are currently discussing
> > fixes for a bunch of problems that lead to corrupted extent
> > btrees and problems like this. I'd suggest that you should
> > probably start with a more recent kernel, make sure you have a
> > serial console and set the xfs_error_level to 11 so that it
> > gives as much information as possible on the console when the
> > error is hit.  If that doesn't give a stack trace, then you
> > need to set the xfs_panic_mask to crash the machine on block
> > zero accesses and report the stack traces that it outputs...
>  
> Yes, I went through the changes between 2.6.18 and 2.6.24 and there
> are quite a few. But as this is a production system in the field,
> it's not viable to upgrade the kernel.

Well, you're pretty much on your own then :/
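
For reference, the two settings from the quoted advice could be applied
along these lines. This is only a sketch under assumptions: it assumes
the knobs are exposed as /proc/sys/fs/xfs/error_level and
/proc/sys/fs/xfs/panic_mask, and that the panic tag for block zero
accesses is XFS_PTAG_FSBLOCK_ZERO with the value 0x80. Check
fs/xfs/xfs_error.h in your own tree, since the tag and the range the
kernel accepts may differ on an old kernel. The helper name and its
parameters are made up for illustration.

```python
from pathlib import Path

# Assumed value of XFS_PTAG_FSBLOCK_ZERO; verify against fs/xfs/xfs_error.h.
XFS_PTAG_FSBLOCK_ZERO = 0x80


def set_xfs_debug_knobs(error_level=11, panic_mask=None,
                        sysctl_root="/proc/sys/fs/xfs"):
    """Write the XFS debug sysctls and return what the kernel reports back.

    sysctl_root is a parameter only so the helper can be pointed at a
    scratch directory; on a real system it needs root privileges.
    """
    settings = {"error_level": error_level}
    if panic_mask is not None:
        settings["panic_mask"] = panic_mask

    readback = {}
    for name, value in settings.items():
        knob = Path(sysctl_root) / name
        # An out-of-range value is expected to fail here with OSError.
        knob.write_text(f"{value}\n")
        # Read the value back so a silently ignored write is noticed.
        readback[name] = knob.read_text().strip()
    return readback
```

Calling set_xfs_debug_knobs() with no arguments only raises the error
level; passing panic_mask=XFS_PTAG_FSBLOCK_ZERO is the second step, for
when the higher error level alone does not produce a stack trace.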

> I do understand that there
> could be many places which can cause the corruption.
> Unfortunately, three different systems have given three different
> places of corruption as stated.

Yes, but all the same pattern of corruption, so it is likely
that it is one problem.

> I now sleep and reschedule in the access to block zero path so
> that it won't stall the system and I can monitor the state of the
> filesystem. As the error occurs only about once every 2.5 days
> under extreme stress, if you could point me to the probable place
> to look at, I can narrow down the debugging path.

Like I said - it's a corrupt bmap btree. It could be a bug in the
bmap btree code, the alloc btree code, the inode data fork
manipulation code, it could be a block device bug returning bad data
to XFS on a cancelled btree readahead, etc. IOWs, there are so many
possible causes of a corrupted btree that a bug report by itself is
mostly useless.

All I can suggest is working out a reproducible test case in your
development environment, attaching a debugger, and digging around
in memory when the problem is hit to find out exactly what is
corrupted. If you can't reproduce it or work out what is occurring
to trigger the problem, then we're not going to be able to find the
cause...
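
When digging through extent records dumped from memory, a small decoder
can help spot the bad entry. The sketch below assumes the standard
128-bit bmap btree record layout (1 bit unwritten flag, 54 bits file
offset, 52 bits start block, 21 bits block count) and that the record
has already been read as two host-order 64-bit words, here called l0
and l1. The names Extent, decode_bmbt_rec and maps_block_zero are made
up for illustration, and the layout should be confirmed against
xfs_bmap_btree.h for the kernel in question.

```python
from typing import NamedTuple

MASK64 = (1 << 64) - 1


class Extent(NamedTuple):
    unwritten: bool   # extent flag (bit 63 of l0)
    startoff: int     # file offset, in filesystem blocks
    startblock: int   # starting filesystem block number
    blockcount: int   # length, in filesystem blocks


def decode_bmbt_rec(l0: int, l1: int) -> Extent:
    """Unpack one bmap btree record given as two 64-bit words."""
    l0 &= MASK64
    l1 &= MASK64
    unwritten = bool(l0 >> 63)
    startoff = (l0 >> 9) & ((1 << 54) - 1)
    # The start block straddles both words: low 9 bits of l0 are its
    # high bits, the top 43 bits of l1 are its low bits.
    startblock = ((l0 & 0x1FF) << 43) | (l1 >> 21)
    blockcount = l1 & ((1 << 21) - 1)
    return Extent(unwritten, startoff, startblock, blockcount)


def maps_block_zero(ext: Extent) -> bool:
    """True if the extent maps filesystem block 0, the condition reported."""
    return ext.startblock == 0
```

Running every record of the affected inode's extent list through
decode_bmbt_rec() and flagging those where maps_block_zero() is true
shows which record is damaged, though not what damaged it.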

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
