From: Eric Sandeen
Date: Tue, 12 Jan 2010 14:26:40 -0600
Subject: Re: help investigating some xfs errors
To: Alexandru Coman
Cc: xfs@oss.sgi.com
Message-ID: <4B4CDB00.1080103@sandeen.net>
In-Reply-To: <4B4C95F1.20106@gmx.net>

Alexandru Coman wrote:
> Hello,
>
> I'm having some problems with an XFS filesystem, and I'm wondering if
> anyone can point me in the right direction; it would be greatly appreciated.
>
> I have several XFS filesystems on top of LVM in a RAID-1 (mdadm) created
> on a pair of 1TB SATA drives, running on Linux (Debian, amd64). One of
> the XFS filesystems is 600GB in size (65% used), storing ~19 million files
> under 100KB (jpeg), usually under high load (read+write). There are also
> a few other smaller XFS partitions on the same drives. It has been
> running like this for 11 months, until a few days ago when I started to
> get a lot of errors.
>
> On Jan 10, I got a few lines with "ata3: hard resetting link", after

hardware problem...

> which the partition could not be accessed, and I couldn't umount/mount it.
> All other partitions were fine.
> I rebooted the server, but that
> filesystem still wouldn't mount (it said "Structure needs cleaning"). I
> then ran xfs_repair on it, which reported that I needed to use the "-L"
> option to destroy the log. I then ran "xfs_repair -L", which appeared to
> fix a lot of errors, and then I was able to mount the filesystem again.
> Everything appeared to be ok at that point.
>
> Jan 10 night: a lot of xfs call traces start to appear in the log
>
> Jan 11: xfs call traces along with
> - xfs_force_shutdown(dm-4,0x8) called from line 1164 of file
> fs/xfs/xfs_trans.c. Return address = 0xffffffffa01999ff
> - xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-4.
> Returning error.

Error 5 is EIO: your storage had an I/O error, and xfs reacted.

> - lots of "Filesystem "dm-4": xfs_log_force: error 5 returned."
> The filesystem disappeared, but I could unmount and mount it again with
> no errors. At this point I also decided to update the kernel, and
> switched from 2.6.26 to 2.6.30. Then I ran xfs_repair, which again found a
> few errors.

After those I/O errors, the fs may well be in bad shape, which xfs_repair
will do its best to fix. You'll need to get your hardware problems sorted
out, it seems.

-Eric

> Jan 12: xfs call traces along with:
> - Filesystem "dm-4": corrupt dinode 1293803384, extent total = 1,
> nblocks = 0. Unmount and run xfs_repair.
> - Filesystem "dm-4": corrupt dinode 665458404, extent total = 1, nblocks
> = 0. Unmount and run xfs_repair.
> - Filesystem "dm-4": corrupt dinode 225720890, extent total = 1, nblocks
> = 0. Unmount and run xfs_repair.
> I then unmounted the fs and ran xfs_repair again. This time the output
> was massive compared to the previous runs, and it put around 100,000
> files in lost+found.
>
> Besides 3 lines on Jan 10 with "ata3: hard resetting link", there has
> been no sign of possible hardware problems. The RAID and the HDDs
> appear to be fine, no errors.
> What's curious is that I'm experiencing
> problems only with the large XFS filesystem, and there hasn't been
> even a single error in the logs about the other xfs partitions.
>
> So, if anyone has any idea what I can research next, to help me find
> out more information about what's happening here...
>
> I've uploaded some detailed logs at http://ghost3k.net/xfs1/
>
>
> Thanks,
> Alexandru Coman
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
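For anyone decoding similar log lines: the number in messages such as "xfs_trans_read_buf()returned an error 5" and "xfs_log_force: error 5 returned" is a plain errno value. Below is a minimal Python sketch that maps it to its symbolic name and message, assuming a Linux host (where errno 5 is EIO). The helper name `describe_errno` is hypothetical and not part of xfsprogs or any XFS tool.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic errno name and the C library message for a numeric code."""
    # errno.errorcode maps numeric values to names such as "EIO"
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's text for the code
    return f"error {code} = {name}: {os.strerror(code)}"


# The kernel messages quoted in this thread report "error 5"
print(describe_errno(5))
```

This only translates the number; the point made in the reply still stands: an EIO means the block layer reported a failed I/O, so the drives, cabling, and controller (the "ata3: hard resetting link" lines) are the place to look.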