From: Steve Costaras <stevecs@chaven.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS data corruption with high I/O even on hardware raid
Date: Wed, 13 Jan 2010 20:33:39 -0600
Message-ID: <4B4E8283.90001@chaven.com>
In-Reply-To: <20100114022409.GW17483@discord.disaster>



On 01/13/2010 20:24, Dave Chinner wrote:
> On Wed, Jan 13, 2010 at 07:11:27PM -0600, Steve Costaras wrote:
>    
>> Ok, I've been seeing a problem here since I had to move over to XFS from
>> JFS due to file system size issues.   I am seeing XFS data corruption
>> under "heavy io".   Basically, what happens is that under heavy load
>> (e.g. if I'm running xfs_fsr on a volume, which nearly always triggers
>> the freeze issue) the system hovers around 90% utilization for the dm
>> device for a while (sometimes an hour+, sometimes minutes), then the
>> subsystem goes to 100% utilization and freezes solid, forcing me to do
>> a hard reboot of the box.
>>      
> xfs_fsr can cause a *large* amount of IO to be done, so it is no
> surprise that it can trigger high load bugs in hardware and
> software. XFS can trigger high load problems on hardware more
> readily than other filesystems because using direct IO (like xfs_fsr
> does) it can push far, far higher throughput to the storage subsystem
> than any other linux filesystem can.
>
> The fact that the IO subsystem is freezing at 100% elevator queue
> utilisation points to an IO never completing. This immediately makes
> me point a finger at either the RAID hardware or the driver - a bug
> in XFS is highly unlikely to cause this symptom as those stats are
> generated at layers lower than XFS.
>
> Next time you get a freeze, the output of:
>
> # echo w > /proc/sysrq-trigger
>
> will tell us what the system is waiting on (i.e. why it is stuck).
>
> ...
>    

Thanks, will try that. Sometimes I do have enough time to issue a couple
of commands before the kernel hard locks and no user input is accepted.
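For reference, the `echo w` request quoted above can also be issued from a script. The minimal Python sketch below assumes root access; the function names are hypothetical. The bitmask helper follows the kernel's documented `kernel.sysrq` semantics (0 = off, 1 = all, otherwise a bitmask where 0x8 allows process dumps). As documented, that setting gates only the Alt+SysRq keyboard route, which may be the only option once the box stops accepting typed commands. Root writes to `/proc/sysrq-trigger` are accepted regardless.

```python
"""Sketch: request the SysRq 'w' dump (tasks blocked in uninterruptible
sleep) and check whether the keyboard SysRq path would also allow it.

Writing to /proc/sysrq-trigger needs root; the dump is written to the
kernel log (dmesg), not to stdout. Function names here are hypothetical.
"""

SYSRQ_ENABLE_DUMP = 0x8  # kernel.sysrq bit: "debugging dumps of processes"


def keyboard_sysrq_allows_dumps(mask: int) -> bool:
    """Interpret /proc/sys/kernel/sysrq: 0 = off, 1 = all, else a bitmask.

    This only governs the Alt+SysRq keyboard path; root writes to
    /proc/sysrq-trigger are accepted regardless of this value.
    """
    if mask == 1:
        return True
    return bool(mask & SYSRQ_ENABLE_DUMP)


def dump_blocked_tasks(trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Script equivalent of `echo w > /proc/sysrq-trigger` (run as root)."""
    with open(trigger_path, "w") as f:
        f.write("w")
```

After calling `dump_blocked_tasks()`, the blocked-task stacks should appear in `dmesg` output.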


>> Since I'm using hardware raid w/ BBU, when I reboot and it comes back up
>> the raid controller writes out to the drives any outstanding data in
>> its cache, and from the hardware point of view (as well as lvm's point
>> of view) the array is ok.    The file system, however, generally can't
>> be mounted (about 4 out of 5 times).  Sometimes it does get auto-mounted,
>> but when I then run an xfs_repair -n -v in those cases there are pages
>> of errors (badly aligned inode rec, bad starting inode #'s, dubious inode
>> btree block headers among others).    When I let a repair actually run
>> in one case, out of 4,500,000 files it linked about 2,000,000 or so, but
>> there was no way to identify and verify file integrity.  The others were
>> just lost.
>>
>> This is not limited to large volume sizes; I have seen similar on small
>> ~2TiB file systems as well.  Also, in a couple of cases when it happened
>> to the file system taking the I/O (say xfs_fsr -v /home), another XFS
>> filesystem on the same system which was NOT taking much if any I/O got
>> badly corrupted (say /var/test).   Both would be using the same areca
>> controllers and same physical discs (same PV's and same VG's but
>> different LV's).
>>      
> These symptoms really point to a problem outside XFS - the only time
> I've seen this sort of behaviour is on buggy hardware. The
> cross-volume corruption is the smoking gun, but proving it is damn
> near impossible without expensive lab equipment and a lot of time.
>    

That's what I figured, given both the high I/O (JFS did not produce as
much I/O as I see under XFS) and the utilization reaching 100% on a
particular card.

Would enabling write buffers have any positive effect here to at least 
minimize data loss issues?



>> Any suggestions on how to isolate or eliminate this would be greatly
>> appreciated.
>>      
> I'd start by not running xfs_fsr as a short term workaround to keep
> the load below the problem threshold.
>
> Looking at the iostat output - the volumes sd[f-i] all lock up at
> 100% utilisation at the same time. Then looking at this:
>    

Already planning on it. The "sole" benefit of this corruption is that at
least the full volume restore has much less fragmentation (kind of a
killer way to defragment, but it does work).
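Both the freeze description and the iostat readings hinge on %util, which is computed from block-layer counters below XFS. As a rough illustration, the Python sketch below derives iostat-style %util from two `/proc/diskstats` snapshots. It also flags a device that was busy for the whole interval with I/O outstanding but none completed, which matches the "IO never completing" pattern. Field positions follow the kernel's iostats documentation. The function names, the 99% threshold, and the stall heuristic are assumptions for illustration, not iostat's own logic.

```python
"""Sketch: iostat-style %util from two /proc/diskstats snapshots, plus a
hypothetical heuristic for spotting a device whose I/O never completes.

Extra fields appended by newer kernels (discard/flush) are ignored.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DiskSample:
    reads_completed: int   # field 1
    writes_completed: int  # field 5
    in_flight: int         # field 9: I/Os currently in progress
    io_ticks_ms: int       # field 10: milliseconds spent doing I/Os


def parse_diskstats(text: str) -> dict[str, DiskSample]:
    """Parse /proc/diskstats content into per-device samples."""
    samples = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or short lines
        samples[parts[2]] = DiskSample(
            reads_completed=int(parts[3]),
            writes_completed=int(parts[7]),
            in_flight=int(parts[11]),
            io_ticks_ms=int(parts[12]),
        )
    return samples


def utilisation(before: DiskSample, after: DiskSample, interval_ms: int) -> float:
    """%util: share of the interval during which the device was busy."""
    busy_ms = after.io_ticks_ms - before.io_ticks_ms
    return min(100.0, 100.0 * busy_ms / interval_ms)


def looks_stalled(before: DiskSample, after: DiskSample, interval_ms: int) -> bool:
    """Busy (almost) the whole interval, I/O outstanding, nothing completed."""
    completed = (after.reads_completed - before.reads_completed) + (
        after.writes_completed - before.writes_completed
    )
    return (
        utilisation(before, after, interval_ms) >= 99.0
        and after.in_flight > 0
        and completed == 0
    )
```

In practice one would read `/proc/diskstats` twice, a few seconds apart, and pass the elapsed milliseconds as `interval_ms`. A device can legitimately show 100% %util while completing I/O, so the "no completions" check is what separates a hang from plain saturation.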


Steve

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 9+ messages
2010-01-14  1:11 XFS data corruption with high I/O even on hardware raid Steve Costaras
2010-01-14  2:24 ` Dave Chinner
2010-01-14  2:33   ` Steve Costaras [this message]
2010-01-15  0:52   ` XFS data corruption with high I/O even on Areca " Steve Costaras
2010-01-15  1:35     ` Dave Chinner
2010-01-15  2:15       ` Steve Costaras
2010-01-14  9:08 ` XFS data corruption with high I/O even on " Andi Kleen
2010-01-14 11:19   ` Steve Costaras
2010-01-14 11:36     ` Andi Kleen