public inbox for linux-xfs@vger.kernel.org
From: Steve Costaras <stevecs@chaven.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS data corruption with high I/O even on hardware raid
Date: Wed, 13 Jan 2010 20:33:39 -0600	[thread overview]
Message-ID: <4B4E8283.90001@chaven.com> (raw)
In-Reply-To: <20100114022409.GW17483@discord.disaster>



On 01/13/2010 20:24, Dave Chinner wrote:
> On Wed, Jan 13, 2010 at 07:11:27PM -0600, Steve Costaras wrote:
>    
>> Ok, I've been seeing a problem here since I had to move over to XFS
>> from JFS due to file system size issues.  I am seeing XFS data
>> corruption under "heavy IO".  Basically, what happens is that under
>> heavy load (i.e. if I'm doing, say, an xfs_fsr on a volume, which
>> nearly always triggers the freeze issue) the system hovers around 90%
>> utilization on the dm device for a while (sometimes an hour+,
>> sometimes minutes), then the subsystem goes to 100% utilization and
>> freezes solid, forcing me to do a hard reboot of the box.
>>      
> xfs_fsr can cause a *large* amount of IO to be done, so it is no
> surprise that it can trigger high-load bugs in hardware and
> software. XFS can trigger high-load problems on hardware more
> readily than other filesystems because, using direct IO (like xfs_fsr
> does), it can push far, far higher throughput to the storage
> subsystem than any other Linux filesystem can.
>
> The fact that the IO subsystem is freezing at 100% elevator queue
> utilisation points to an IO never completing. This immediately makes
> me point a finger at either the RAID hardware or the driver - a bug
> in XFS is highly unlikely to cause this symptom as those stats are
> generated at layers lower than XFS.
>
> Next time you get a freeze, the output of:
>
> # echo w > /proc/sysrq-trigger
>
> will tell us what the system is waiting on (i.e. why it is stuck).
>
> ...
>    

Thanks, I will try that; sometimes I do have enough time to issue a
couple of commands before the kernel hard-locks and no user input is
accepted.
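
For what it's worth, a generic sketch of capturing that dump before the
box dies (serial console or netconsole setup is assumed, not shown; the
sysrq mask value only gates the keyboard hotkeys, so the /proc trigger
should work for root regardless):

```shell
# Read the current sysrq mask for reference; it governs which keyboard
# sysrq combinations are allowed, while /proc/sysrq-trigger accepts any
# command from root.
SYSRQ=$(cat /proc/sys/kernel/sysrq)
echo "sysrq mask: $SYSRQ"
# At the first sign of the stall, dump blocked (D-state) tasks
# (commented out here; needs root and affects the running kernel):
# echo w > /proc/sysrq-trigger
# Then pull the traces from the ring buffer before the hard lock; a
# serial console or netconsole makes them survive the freeze:
# dmesg | tail -n 100
```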


>> Since I'm using hardware RAID with a BBU, when I reboot and it comes
>> back up the RAID controller writes out to the drives any outstanding
>> data in its cache, and from the hardware point of view (as well as
>> LVM's point of view) the array is OK.  The file system, however,
>> generally can't be mounted (about 4 out of 5 times; sometimes it does
>> get auto-mounted, but when I then run an xfs_repair -n -v in those
>> cases there are pages of errors: badly aligned inode rec, bad
>> starting inode #'s, and dubious inode btree block headers, among
>> others).  When I let a repair actually run, in one case out of
>> 4,500,000 files it linked about 2,000,000 or so, but there was no way
>> to identify and verify file integrity.  The others were just lost.
>>
>> This is not limited to large volume sizes; I have seen similar
>> behaviour on small ~2 TiB file systems as well.  Also, in a couple of
>> cases when it happened, while one file system was taking the I/O
>> (say, xfs_fsr -v /home), another XFS filesystem on the same system
>> which was NOT taking much if any I/O got badly corrupted (say,
>> /var/test).  Both would be using the same Areca controllers and the
>> same physical discs (same PVs and same VGs, but different LVs).
>>      
> These symptoms really point to a problem outside XFS - the only time
> I've seen this sort of behaviour is on buggy hardware. The
> cross-volume corruption is the smoking gun, but proving it is damn
> near impossible without expensive lab equipment and a lot of time.
>    

That's what I figured, given both the high I/O (JFS did not produce as
much I/O as I see under XFS) and the utilization reaching 100% on a
particular card.

Would enabling write buffers have any positive effect here to at least 
minimize data loss issues?



>> Any suggestions on how to isolate or eliminate this would be greatly
>> appreciated.
>>      
> I'd start by not running xfs_fsr as a short term workaround to keep
> the load below the problem threshold.
>
> Looking at the iostat output - the volumes sd[f-i] all lock up at
> 100% utilisation at the same time. Then looking at this:
>    

Already planning on it.  The "sole" benefit of this corruption is that
at least the full-volume restore has much less fragmentation (kind of
a killer way to defragment, but it does work).
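
As an aside, since the box hard-locks, it may help to keep a persistent
record of device utilisation leading up to the freeze.  A minimal
sketch, assuming sysstat's iostat (device names, interval, and log path
are examples only):

```shell
# Build the sampling command; %util is the last column of 'iostat -x',
# and 100% sustained with no completions means requests are stuck.
DEVICES="sdf sdg sdh sdi"       # example device names
INTERVAL=5                      # seconds between samples
CMD="iostat -dxk $INTERVAL $DEVICES"
echo "$CMD"
# Run it in the background, appending to a log that survives the
# reboot (commented out; needs the devices to exist):
# nohup $CMD >> /var/log/iostat-fsr.log 2>&1 &
```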


Steve

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 9+ messages
2010-01-14  1:11 XFS data corruption with high I/O even on hardware raid Steve Costaras
2010-01-14  2:24 ` Dave Chinner
2010-01-14  2:33   ` Steve Costaras [this message]
2010-01-15  0:52   ` XFS data corruption with high I/O even on Areca " Steve Costaras
2010-01-15  1:35     ` Dave Chinner
2010-01-15  2:15       ` Steve Costaras
2010-01-14  9:08 ` XFS data corruption with high I/O even on " Andi Kleen
2010-01-14 11:19   ` Steve Costaras
2010-01-14 11:36     ` Andi Kleen
