public inbox for linux-xfs@vger.kernel.org
From: Stan Hoeppner <stan@hardwarefreak.com>
To: Mike Dacre <mike.dacre@gmail.com>, xfs@oss.sgi.com
Subject: Re: Sudden File System Corruption
Date: Thu, 05 Dec 2013 02:10:59 -0600
Message-ID: <52A03513.6030408@hardwarefreak.com>
In-Reply-To: <CAPd9ww_qT9J_Rt04g7+OApoBeggNOyWNwD+57DiDTuUvz-O-0g@mail.gmail.com>

On 12/4/2013 8:55 PM, Mike Dacre wrote:
...
> I have a 16 2TB drive RAID6 array powered by an LSI 9240-4i.  It has an XFS.

It's a 9260-4i, not a 9240, which is a huge difference.  I went digging
through your dmesg output because I knew the 9240 doesn't support RAID6.
A few questions.  What is the LSI RAID configuration?

1.  Level -- confirm RAID6
2.  Strip size?  (eg 512KB)
3.  Stripe size? (eg 7168KB, 14*256)
4.  BBU module?
5.  Is write cache enabled?

What is the XFS geometry?

6.  xfs_info /dev/sda

A combination of these being wrong could very well be part of your
problems.
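For illustration, if the geometry turned out to be a 256KB strip across
16 RAID6 drives, the XFS alignment follows directly from it.  A rough
sketch -- the strip size and drive count here are assumed values, to be
confirmed against your actual MegaRAID configuration:

```shell
# Assumed values: confirm against the real controller config before use.
DRIVES=16; PARITY=2; STRIP_KB=256
SW=$((DRIVES - PARITY))          # RAID6: data spindles = drives - 2
STRIPE_KB=$((STRIP_KB * SW))     # full stripe width in KB
echo "mkfs.xfs -d su=${STRIP_KB}k,sw=${SW}"
echo "full stripe: ${STRIPE_KB}KB"
```

If xfs_info reports sunit/swidth that don't match this, your writes
aren't stripe-aligned and the array does extra read-modify-write work.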

...
> IO errors when any requests were made.  This happened while it was being

I didn't see any IO errors in your dmesg output.  None.

> accessed by  5 different users, one was doing a very large rm operation (rm
> *sh on thousands on files in a directory).  Also, about 30 minutes before
> we had connected the globus connect endpoint to allow easy file transfers
> to SDSC.

With delaylog enabled, which I believe is the default in RHEL/CentOS 6,
a single big rm shouldn't kill the disks.  But combined with the other
workloads, it seems you may have been seeking the disks to death.

...
> In the end, I successfully repaired the filesystem with `xfs_repair -L
> /dev/sda1`.  However, I am nervous that some files may have been corrupted.

I'm sure your users will let you know.  I'd definitely have a look in
the directory that was targeted by the big rm operation, which apparently
didn't finish when XFS shut down.
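After a forced log zero (xfs_repair -L), files whose data was still in
flight often show up as zero-length.  A hedged sketch for spotting them
-- the path here is a placeholder, not taken from your setup:

```shell
# DIR is a placeholder; point it at the directory the big rm was working on.
DIR=${DIR:-/export/data}
find "$DIR" -type f -size 0 -print
```

Zero-length files that should have content are candidates for restore
from backup.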

> Do any of you have any idea what could have caused this problem?

Yes.  A few things.  The first is this, and it's a big one:

Dec  4 18:15:28 fruster kernel: io scheduler noop registered
Dec  4 18:15:28 fruster kernel: io scheduler anticipatory registered
Dec  4 18:15:28 fruster kernel: io scheduler deadline registered
Dec  4 18:15:28 fruster kernel: io scheduler cfq registered (default)

http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E

"As of kernel 3.2.12, the default i/o scheduler, CFQ, will defeat much
of the parallelization in XFS."

*Never* use the CFQ elevator with XFS, and never with a high-performance
storage system.  In fact, IMHO, never use CFQ, period.  It was horrible
even before 3.2.12.  It is certain that CFQ is playing a big part in
your 120s timeouts, though it may not be solely responsible for your IO
bottleneck.  Switch to deadline or noop immediately: deadline if the LSI
write cache is disabled, noop if it is enabled.  Execute this manually
now, then add it to a startup script and verify it is being set at boot,
as the setting is not persistent:

echo deadline > /sys/block/sda/queue/scheduler

This one simple command line may help pretty dramatically, immediately,
assuming your hardware array parameters aren't horribly wrong for your
workloads, and your XFS alignment correctly matches the hardware geometry.
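To verify the setting took, note that the scheduler file brackets the
active elevator.  A minimal sketch of checking it -- sample file content
is used here in place of reading the live /sys path:

```shell
# The active elevator is the bracketed entry in
# /sys/block/sda/queue/scheduler; sample content stands in for the file.
line='noop anticipatory deadline [cfq]'
active=$(printf '%s\n' "$line" | sed 's/.*\[\([^]]*\)\].*/\1/')
echo "active scheduler: $active"
```

Run the same check against the real file in your startup script so a
kernel update silently reverting the default doesn't go unnoticed.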

-- 
Stan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
