public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Paul Anderson <pha@umich.edu>
Cc: xfs-oss <xfs@oss.sgi.com>
Subject: Re: XFS file loss - 2.6.38.5, FC RAID
Date: Wed, 29 Jun 2011 11:24:16 +1000	[thread overview]
Message-ID: <20110629012416.GS32466@dastard> (raw)
In-Reply-To: <BANLkTimsEAyCF52hzx0ay4S+6TkscpKXoQ@mail.gmail.com>

On Tue, Jun 28, 2011 at 11:03:02AM -0400, Paul Anderson wrote:
> I'm sending this error report as an informational point - I'm not sure
> much can be done about it at the present time.
> 
> We had a machine crash Sunday night (June 26) around 8PM - the
> hardware failed due to a Sun J4400 chassis fault.  The XFS file loss
> noted in this report was not on this chassis.
> 
> On power cycle and subsequent reboot, one of our home directory
> volumes, a pair of 40TiByte Promise RAID6 Fibre Channel SAN arrays
> joined in a single LVM volume, lost many files.
> 
> File loss is characterized by numerous files now with length of zero.
> I lost files that I know were last changed on Friday (June 24), more
> than 2 days before the crash.

Which means either they weren't written back, or the inode was never
written to disk or logged after the data was written. Can you tell
us about the files that were zero length? e.g. their lifecycle,
how they are modified, expected size, etc?
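As an aside, if the applications that owned those files need the size
to survive a crash, the usual answer is an explicit fsync so both the
data and the inode reach stable storage. A minimal sketch with standard
tools (the temporary path here is hypothetical, not from the report);
dd's conv=fsync calls fsync(2) on the output file after the last write:

```shell
# Hypothetical example: write a file and force its data and size to
# stable storage before relying on them surviving a crash.
tmpfile=$(mktemp)
dd if=/dev/zero of="$tmpfile" bs=4k count=1 conv=fsync 2>/dev/null
stat -c '%s' "$tmpfile"    # 4096: the size is now backed by on-disk data
rm -f "$tmpfile"
```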

> Kernel is 2.6.38.5, userland is Ubuntu 10.04, server hardware is a 24
> core Dell R900 w/128GiBytes RAM, an LSIFC949E fiber channel card, a
> bunch of Dell PERC 6 RAID cards, and a lot of direct attach SAS JBOD
> cabinets (mostly J4400, but a few Dell MD1000's).  The boot drive is a
> pair of matched 1TiByte drives in a HW RAID-1 config.

hmmmm - lots of RAM. I wonder if something in writeback land got
stuck and the system never hit dirty memory thresholds or some other
writeback trigger. Anything in the log about hung tasks?
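For the hung-task question, the kernel's watchdog logs a recognisable
message when a task sits in uninterruptible sleep too long, so a quick
check is possible (assuming the kernel log is readable and
CONFIG_DETECT_HUNG_TASK is enabled):

```shell
# Look for the hung-task watchdog's signature line; the watchdog
# fires after hung_task_timeout_secs (120s by default).
dmesg 2>/dev/null | grep -i 'blocked for more than' || true
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null || true
```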

> The Promise RAID6 SAN unit where the files were lost is battery
> backed, and reports no errors.  The filesystem showed no signs of
> distress prior to this.  The filesystem was less than 4 weeks old.
> 
> Here's the fstab mount options:
> 
> /dev/wonderlandhomet/homet    /homet    xfs
> inode64,logbufs=8,noatime        0       0

Ok, so not using delayed logging.
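(For reference: delayed logging could be enabled on 2.6.38 with a
mount option; it only became the default later, in 2.6.39. A purely
illustrative fstab line, not a recommendation for this workload:)

```shell
# Hypothetical /etc/fstab entry adding delaylog to the existing options:
# /dev/wonderlandhomet/homet  /homet  xfs  inode64,logbufs=8,noatime,delaylog  0  0
```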

> xfs_info shows:
> 
> root@wonderland:~# xfs_info /homet
> meta-data=/dev/mapper/wonderlandhomet-homet isize=256    agcount=81, agsize=268435328 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=21484355584, imaxpct=1
>          =                       sunit=128    swidth=2816 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> The dmesg log shows no signs of hardware or kernel software problems
> up to the point where the directly attached SAS card reported faults
> for the cabinet.
> 
> The vm tuning parameters are defaults (yes, I know this is bad):
> 
> root@louie:/proc/sys/vm# cat dirty_background_bytes
> 0
> root@louie:/proc/sys/vm# cat dirty_background_ratio
> 10
> root@louie:/proc/sys/vm# cat dirty_bytes
> 0
> root@louie:/proc/sys/vm# cat dirty_expire_centisecs
> 3000
> root@louie:/proc/sys/vm# cat dirty_ratio
> 40
> root@louie:/proc/sys/vm# cat dirty_writeback_centisecs
> 500

Those look like the defaults to me, so background writeback
won't start until ~13GB of memory is dirty. Hence it should only be
doing kupdate writeback at that point based on inodes aging more than
30s...
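The arithmetic behind that figure, for anyone checking (this assumes
the ratio applies to the full 128GiB; the kernel actually uses
dirtyable memory, so the real threshold is somewhat lower):

```shell
# dirty_background_ratio=10 on a 128GiB machine:
ram_bytes=$((128 * 1024 * 1024 * 1024))
echo $((ram_bytes * 10 / 100))    # 13743895347 bytes, i.e. ~13GB
```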

> My main question is: what specific action can I take to minimize the
> likelihood of this happening again?  As far as I know, the dirty pages
> should expire and be flushed to the FC array (2 days?  should be
> enough), and the FC array itself is stable.

No idea at this point.

There is the possibility that the AIL did not get pushed because
there wasn't sufficient transaction activity, so the inode never got
written back (you've got a ~2GB log, so entirely possible), but I
would have expected log replay to handle that case just fine. 2.6.39
pushes the AIL every 30s, so it avoids this problem, but the fix is
not a simple backport to 2.6.38 because it is part of a conversion
of the xfssyncd/xfsaild to use workqueues.
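(The 30s interval there corresponds to the xfssyncd wakeup period,
which is tunable via sysctl; shown only to name the knob, not as a
suggestion to change it:)

```shell
# sysctl.conf fragment naming the xfssyncd wakeup period;
# 3000 centiseconds = 30s is the documented default:
# fs.xfs.xfssyncd_centisecs = 3000
```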

Other than that, I can't really say. A reproducible test case is
usually needed to find such problems, and I don't think you have one
of those... :/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview:
2011-06-28 15:03 XFS file loss - 2.6.38.5, FC RAID Paul Anderson
2011-06-29  1:24 ` Dave Chinner [this message]
