* help investigating some xfs errors
@ 2010-01-12 15:32 Alexandru Coman
2010-01-12 20:26 ` Eric Sandeen
0 siblings, 1 reply; 2+ messages in thread
From: Alexandru Coman @ 2010-01-12 15:32 UTC
To: xfs
Hello,
I'm having some problems with an XFS filesystem, and I would greatly
appreciate it if anyone could point me in the right direction.
I have several XFS filesystems on top of LVM in a RAID-1 (mdadm) created
on a pair of 1TB SATA drives. Running on Linux (Debian, amd64). One of
the XFS filesystems is 600GB in size (65% used), storing ~19 million files
under 100KB (JPEG), usually under high load (read+write). There are also
a few other smaller XFS partitions on the same drives. It has been
running like this for 11 months, until a few days ago when I started to
get a lot of errors.
On Jan 10, I got a few lines with "ata3: hard resetting link", after
which the partition could not be accessed; I couldn't umount or mount it.
All other partitions were fine. I rebooted the server, but that
filesystem still wouldn't mount (it said "Structure needs cleaning"). I
then ran xfs_repair on it, which reported that I needed to use the "-L"
option to destroy the log. I then ran "xfs_repair -L", which appeared to
fix a lot of errors, and then I was able to mount the filesystem again.
Everything appeared to be ok at that point.
Jan 10 night: a lot of xfs call traces start to appear in the log
Jan 11: xfs call traces along with
- xfs_force_shutdown(dm-4,0x8) called from line 1164 of file
fs/xfs/xfs_trans.c. Return address = 0xffffffffa01999ff
- xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-4.
Returning error.
- lots of "Filesystem "dm-4": xfs_log_force: error 5 returned."
The filesystem disappeared, but I could unmount and mount it again with
no errors. At this point I also decided to update the kernel, and
switched from 2.6.26 to 2.6.30. I then ran xfs_repair, which again found
a few errors.
Jan 12: xfs call traces along with:
- Filesystem "dm-4": corrupt dinode 1293803384, extent total = 1,
nblocks = 0. Unmount and run xfs_repair.
- Filesystem "dm-4": corrupt dinode 665458404, extent total = 1, nblocks
= 0. Unmount and run xfs_repair.
- Filesystem "dm-4": corrupt dinode 225720890, extent total = 1, nblocks
= 0. Unmount and run xfs_repair.
I then unmounted the fs and ran xfs_repair again. This time the output
was massive compared to the previous runs, and it put roughly 100,000
files in lost+found.
Besides the 3 lines on Jan 10 with "ata3: hard resetting link", there has
been no sign of possible hardware problems. The RAID and the HDDs
appear to be fine, with no errors. What's curious is that I'm experiencing
problems only with the large XFS filesystem, and there hasn't been
even a single error in the logs about the other XFS partitions.
So, if anyone has any idea what I can research next, to help me find
out more information about what's happening here...
I've uploaded some detailed logs at http://ghost3k.net/xfs1/
Thanks,
Alexandru Coman
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: help investigating some xfs errors
2010-01-12 15:32 help investigating some xfs errors Alexandru Coman
@ 2010-01-12 20:26 ` Eric Sandeen
0 siblings, 0 replies; 2+ messages in thread
From: Eric Sandeen @ 2010-01-12 20:26 UTC
To: Alexandru Coman; +Cc: xfs
Alexandru Coman wrote:
> Hello,
>
> I'm having some problems with an XFS filesystem, and I would greatly
> appreciate it if anyone could point me in the right direction.
>
> I have several XFS filesystems on top of LVM in a RAID-1 (mdadm) created
> on a pair of 1TB SATA drives. Running on Linux (Debian, amd64). One of
> the XFS filesystems is 600GB in size (65% used), storing ~19 million files
> under 100KB (JPEG), usually under high load (read+write). There are also
> a few other smaller XFS partitions on the same drives. It has been
> running like this for 11 months, until a few days ago when I started to
> get a lot of errors.
>
> On Jan 10, I got a few lines with "ata3: hard resetting link", after
This points to a hardware problem...
> which the partition could not be accessed; I couldn't umount or mount it.
> All other partitions were fine. I rebooted the server, but that
> filesystem still wouldn't mount (it said "Structure needs cleaning"). I
> then ran xfs_repair on it, which reported that I needed to use the "-L"
> option to destroy the log. I then ran "xfs_repair -L", which appeared to
> fix a lot of errors, and then I was able to mount the filesystem again.
> Everything appeared to be ok at that point.
>
> Jan 10 night: a lot of xfs call traces start to appear in the log
>
> Jan 11: xfs call traces along with
> - xfs_force_shutdown(dm-4,0x8) called from line 1164 of file
> fs/xfs/xfs_trans.c. Return address = 0xffffffffa01999ff
> - xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-4.
> Returning error.
Error 5 is EIO: your storage returned an I/O error, and XFS reacted to it.
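As a quick way to check the mapping between the numeric code and its name, here is a minimal sketch using Python's standard errno table. The helper name describe_errno is made up for this example, and the message text comes from the local C library, so it may differ between systems. On Linux, the "Structure needs cleaning" message from the failed mount is the text for EUCLEAN, which the sketch looks up only if the platform defines it.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a positive errno value.

    describe_errno is a hypothetical helper for this example; it only
    wraps the standard errno.errorcode table and os.strerror().
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# The "error 5" seen in the xfs_imap_to_bp and xfs_log_force messages.
print(describe_errno(5))

# EUCLEAN is Linux-specific, so guard the lookup on other platforms.
euclean = getattr(errno, "EUCLEAN", None)
if euclean is not None:
    print(describe_errno(euclean))
```

The same lookup works for any other numeric error that shows up in the kernel log.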
> - lots of "Filesystem "dm-4": xfs_log_force: error 5 returned."
> The filesystem disappeared, but I could unmount and mount it again with
> no errors. At this point I also decided to update the kernel, and
> switched from 2.6.26 to 2.6.30. I then ran xfs_repair, which again found
> a few errors.
After those I/O errors, the filesystem may well be in bad shape, which
xfs_repair will do its best to fix. It seems you'll need to get your
hardware problems sorted out.
-Eric
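To line up storage-level events with the XFS errors when chasing the hardware problem, one option is to bucket kernel log lines by type. The following is only a sketch: the patterns are built from the messages quoted in this thread, while the category names and the functions classify, summarize and corrupt_inodes are assumptions made up for the example. Real logs will contain other message formats that these patterns do not cover.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this thread.
# The category names are hypothetical labels for this example.
PATTERNS = {
    "ata_link_reset": re.compile(r"ata\d+: hard resetting link"),
    "xfs_io_error": re.compile(r"returned an error 5\b|error 5 returned"),
    "xfs_shutdown": re.compile(
        r"xfs_force_shutdown\((?P<dev>[\w-]+),(?P<flags>0x[0-9a-fA-F]+)\)"
    ),
    "xfs_corrupt_inode": re.compile(r"corrupt dinode (?P<ino>\d+)"),
}


def classify(line):
    """Return the first matching category name, or None if nothing matches."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how many lines fall into each category."""
    return Counter(c for c in map(classify, lines) if c is not None)


def corrupt_inodes(lines):
    """Collect the inode numbers reported as corrupt, in order of appearance."""
    found = []
    for line in lines:
        match = PATTERNS["xfs_corrupt_inode"].search(line)
        if match:
            found.append(int(match.group("ino")))
    return found


# Sample input: the messages from this thread, with wrapped lines rejoined.
SAMPLE_LINES = [
    "ata3: hard resetting link",
    "xfs_force_shutdown(dm-4,0x8) called from line 1164 of file fs/xfs/xfs_trans.c.",
    "xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-4.",
    'Filesystem "dm-4": xfs_log_force: error 5 returned.',
    'Filesystem "dm-4": corrupt dinode 1293803384, extent total = 1, nblocks = 0.',
    'Filesystem "dm-4": corrupt dinode 665458404, extent total = 1, nblocks = 0.',
    'Filesystem "dm-4": corrupt dinode 225720890, extent total = 1, nblocks = 0.',
]

print(summarize(SAMPLE_LINES))
print(corrupt_inodes(SAMPLE_LINES))
```

Fed with a full kernel log, the per-category counts give a rough view of whether link resets and I/O errors come before the corruption reports.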
> Jan 12: xfs call traces along with:
> - Filesystem "dm-4": corrupt dinode 1293803384, extent total = 1,
> nblocks = 0. Unmount and run xfs_repair.
> - Filesystem "dm-4": corrupt dinode 665458404, extent total = 1, nblocks
> = 0. Unmount and run xfs_repair.
> - Filesystem "dm-4": corrupt dinode 225720890, extent total = 1, nblocks
> = 0. Unmount and run xfs_repair.
> I then unmounted the fs and ran xfs_repair again. This time the output
> was massive compared to the previous runs, and it put roughly 100,000
> files in lost+found.
>
> Besides the 3 lines on Jan 10 with "ata3: hard resetting link", there has
> been no sign of possible hardware problems. The RAID and the HDDs
> appear to be fine, with no errors. What's curious is that I'm experiencing
> problems only with the large XFS filesystem, and there hasn't been
> even a single error in the logs about the other XFS partitions.
>
> So, if anyone has any idea what I can research next, to help me find
> out more information about what's happening here...
>
> I've uploaded some detailed logs at http://ghost3k.net/xfs1/
>
>
> Thanks,
> Alexandru Coman