From: Steve Costaras
Date: Wed, 13 Jan 2010 20:33:39 -0600
Subject: Re: XFS data corruption with high I/O even on hardware raid
To: Dave Chinner
Cc: xfs@oss.sgi.com
Message-ID: <4B4E8283.90001@chaven.com>
In-Reply-To: <20100114022409.GW17483@discord.disaster>

On 01/13/2010 20:24, Dave Chinner wrote:
> On Wed, Jan 13, 2010 at 07:11:27PM -0600, Steve Costaras wrote:
>
>> Ok, I've been seeing a problem here since I had to move over to XFS
>> from JFS due to file system size issues. I am seeing XFS data
>> corruption under "heavy I/O". Basically, what happens is that under
>> heavy load (i.e. if I'm running, say, an xfs_fsr on a volume, which
>> nearly always triggers the freeze issue), the system hovers around
>> 90% utilization on the dm device for a while (sometimes an hour or
>> more, sometimes minutes), then the subsystem goes to 100%
>> utilization and freezes solid, forcing me to do a hard reboot of
>> the box.
>>
> xfs_fsr can cause a *large* amount of IO to be done, so it is no
> surprise that it can trigger high-load bugs in hardware and
> software. XFS can trigger high-load problems on hardware more
> readily than other filesystems because, using direct IO (like
> xfs_fsr does), it can push far, far higher throughput to the
> storage subsystem than any other Linux filesystem can.
>
> The fact that the IO subsystem is freezing at 100% elevator queue
> utilisation points to an IO never completing. This immediately
> makes me point a finger at either the RAID hardware or the driver -
> a bug in XFS is highly unlikely to cause this symptom, as those
> stats are generated at layers lower than XFS.
>
> Next time you get a freeze, the output of:
>
> # echo w > /proc/sysrq-trigger
>
> will tell us what the system is waiting on (i.e. why it is stuck).
>
> ...

Thanks, I will try that. Sometimes I do have enough time to issue a
couple of commands before the kernel hard-locks and no user input is
accepted.
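A minimal sketch of capturing that blocked-task dump, assuming a
standard procfs layout and a kernel built with SysRq support (the
paths and values below are the stock Linux ones, not anything
XFS-specific):

  # enable all SysRq functions; many distros ship a restrictive default
  echo 1 > /proc/sys/kernel/sysrq

  # 'w' dumps all tasks in uninterruptible (blocked) state to the kernel log
  echo w > /proc/sysrq-trigger

  # read the dump back out of the kernel ring buffer
  dmesg | tail -n 200

If the box usually wedges before anything can be typed, setting up a
serial console or netconsole ahead of time gives the dump a way off
the machine even when the local terminal is dead.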
>> Since I'm using hardware RAID with a BBU, when I reboot and the box
>> comes back up, the RAID controller writes out to the drives any
>> outstanding data in its cache, and from the hardware point of view
>> (as well as LVM's point of view) the array is OK. The file system,
>> however, generally can't be mounted (about 4 out of 5 times;
>> sometimes it does get auto-mounted, but when I then run
>> xfs_repair -n -v in those cases there are pages of errors: badly
>> aligned inode rec, bad starting inode numbers, and dubious inode
>> btree block headers, among others). When I let a repair actually
>> run, in one case out of 4,500,000 files it linked about 2,000,000
>> or so, but there was no way to identify and verify file integrity.
>> The others were just lost.
>>
>> This is not limited to large volume sizes; I have seen similar
>> corruption on small ~2TiB file systems as well. Also, in a couple
>> of cases when it happened, the file system taking the I/O (say,
>> xfs_fsr -v /home) was not the only casualty: another XFS file
>> system on the same system which was NOT taking much if any I/O got
>> badly corrupted as well (say, /var/test). Both would be using the
>> same Areca controllers and the same physical discs (same PVs and
>> same VGs, but different LVs).
>>
> These symptoms really point to a problem outside XFS - the only
> time I've seen this sort of behaviour is on buggy hardware. The
> cross-volume corruption is the smoking gun, but proving it is damn
> near impossible without expensive lab equipment and a lot of time.
>
That's what I figured, given both the high I/O (JFS did not produce
as much I/O as I see under XFS) and the utilization reaching 100% on
one particular card. Would enabling write buffers have any positive
effect here, at least to minimize the data loss?

>> Any suggestions on how to isolate or eliminate this would be
>> greatly appreciated.
>>
> I'd start by not running xfs_fsr, as a short-term workaround to
> keep the load below the problem threshold.
>
> Looking at the iostat output - the volumes sd[f-i] all lock up at
> 100% utilisation at the same time. Then looking at this:
>
Already planning on it. The "sole" benefit of this corruption is that
at least the full volume restore has much less fragmentation (kind of
a killer way to defragment, but it does work).

Steve
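For reference, the assessment steps mentioned in the thread, condensed
into a short sketch; /dev/vg00/home is a hypothetical device path
standing in for the affected logical volume, and the file system must
be unmounted before xfs_repair touches it:

  # watch per-device utilisation for the 100% lock-up described above
  iostat -x 5

  # read-only check: report damage without modifying the file system
  umount /home
  xfs_repair -n -v /dev/vg00/home

  # gauge fragmentation without an xfs_fsr run (xfs_db in read-only mode)
  xfs_db -r -c frag /dev/vg00/home

Note that xfs_repair -n only reports; dropping the -n actually
rewrites metadata, so the read-only pass is worth doing first.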