From: Eric Sandeen <sandeen@sandeen.net>
To: David Engel <david@istwok.net>
Cc: xfs@oss.sgi.com
Subject: Re: XFS/driver bug or bad drive?
Date: Thu, 01 Oct 2009 19:39:54 -0500
Message-ID: <4AC54BDA.20806@sandeen.net>
In-Reply-To: <20091001232759.GA12832@opus.istwok.net>
David Engel wrote:
> Hi,
>
> I've been trying to diagnose a suspected disk drive problem for about
> a week. I now think the problem might be a known (and fixed) xfs or
> driver bug, but I'm not 100% sure. I'm hoping someone here can
> confirm the problem is or isn't an xfs bug.
>
> The drive in question is a Samsung HD753LJ. I have two of these
> drives and have had to do three replacements for various reasons in
> <10 months of use. In short, I don't have a lot of confidence in the
> drive, even though recent evidence seems to point elsewhere.
>
> The problem occurs when I copy several hundred gigabytes of large
> files (MythTV recordings, to be specific) to the troublesome drive
> from another drive. When using a stock 2.6.30.8 kernel and xfs, the
> copy eventually fails because the drive quits responding (and won't
> respond again until it is power cycled). The failure doesn't always
> occur at the same point in the copy, but it does always occur. Here
> is a log sample of one of the failures.
>
> Sep 29 17:59:34 tux kernel: XFS mounting filesystem sdb1
> Sep 29 17:59:34 tux kernel: Ending clean XFS mount for filesystem: sdb1
> Sep 29 18:32:07 tux kernel: ata2.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x6 frozen
> Sep 29 18:32:07 tux kernel: ata2.00: cmd 61/00:00:af:02:eb/04:00:17:00:00/40 tag 0 ncq 524288 out
> Sep 29 18:32:07 tux kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Sep 29 18:32:07 tux kernel: ata2.00: status: { DRDY }
...
> Sep 29 18:32:07 tux kernel: ata2: hard resetting link
> Sep 29 18:32:17 tux kernel: ata2: softreset failed (device not ready)
...
> Sep 29 18:33:07 tux kernel: ata2.00: disabled
> Sep 29 18:33:07 tux kernel: ata2.00: device reported invalid CHS sector 0
> Sep 29 18:33:07 tux last message repeated 15 times
> Sep 29 18:33:07 tux kernel: ata2: EH complete
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Unhandled error code
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Sep 29 18:33:07 tux kernel: end_request: I/O error, dev sdb, sector 401276591
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Unhandled error code
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Sep 29 18:33:07 tux kernel: end_request: I/O error, dev sdb, sector 401275567
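For readers who want to see what the "cmd" line in the log above encodes, here is a minimal Python sketch that decodes it. It assumes the taskfile layout printed by the libata error handler (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count travels in the feature registers. The function name decode_ncq_cmd and the 512-byte sector size are illustrative assumptions, not anything taken from the kernel source.

```python
import re

# NCQ opcodes from the ATA command set.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_ncq_cmd(line, sector_size=512):
    """Decode a libata 'cmd' log line for an NCQ command (hypothetical helper)."""
    match = re.search(r"cmd (\S+) tag (\d+)", line)
    if match is None:
        raise ValueError("not a libata cmd line")

    # Twelve hex bytes, separated by '/' and ':' in the log output.
    fields = match.group(1).replace("/", ":").split(":")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = (int(f, 16) for f in fields)

    if command not in NCQ_OPCODES:
        raise ValueError("not an NCQ command: 0x%02x" % command)

    # 48-bit LBA: the "hob" (high order byte) registers carry the upper half.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)

    # For NCQ the sector count lives in the feature registers;
    # a count of 0 means 65536 sectors.
    sectors = ((hob_feature << 8) | feature) or 65536

    return {
        "op": NCQ_OPCODES[command],
        "tag": int(match.group(2)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


if __name__ == "__main__":
    sample = ("ata2.00: cmd 61/00:00:af:02:eb/04:00:17:00:00/40 "
              "tag 0 ncq 524288 out")
    print(decode_ncq_cmd(sample))
```

Run against the line from the log, this gives a WRITE FPDMA QUEUED of 1024 sectors (524288 bytes, matching the "ncq 524288 out" suffix) at LBA 401277615. The two end_request sectors reported afterwards, 401276591 and 401275567, sit exactly 1024 and 2048 sectors below that LBA, which is what a sequential stream of 512 KiB writes would look like.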
These are all storage errors, not xfs errors. I suppose differing IO
patterns from one filesystem or another could trip it up, but nothing
above points to an xfs bug; any xfs problems are a response to the IO
errors above. The cause could be a hardware problem or a driver
problem, I'm not sure, but I think a hardware issue is most likely.
You might point smartctl at the drive and see what it says.
-Eric
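
As a rough sketch of what "pointing smartctl at the drive" could look like when scripted, the Python below builds a few common smartctl invocations and runs one if the tool is installed. It assumes smartmontools is present and that the drive is still /dev/sdb as in the log; the helper names smartctl_argv and run_smartctl are made up for illustration. The flags used (-H, -a, -t long, -l selftest) are standard smartctl options, and the tool normally needs root.

```python
import shutil
import subprocess

# Common smartctl actions mapped to their command-line flags.
ACTIONS = {
    "health": ["-H"],                   # overall health self-assessment
    "all": ["-a"],                      # all SMART information
    "long-test": ["-t", "long"],        # start an extended self-test
    "selftest-log": ["-l", "selftest"], # show self-test results
}


def smartctl_argv(device, action="all"):
    """Build the argument list for one smartctl call (hypothetical helper)."""
    return ["smartctl", *ACTIONS[action], device]


def run_smartctl(device, action="all"):
    """Run smartctl and return its text output, or None if it is not installed."""
    if shutil.which("smartctl") is None:
        return None
    # smartctl uses a bitmask exit status, so a non-zero code is not
    # treated as a failure here; the caller should read the output.
    done = subprocess.run(smartctl_argv(device, action),
                          capture_output=True, text=True, check=False)
    return done.stdout


# Example use (device name is an assumption taken from the log):
#   print(run_smartctl("/dev/sdb", "health"))
#   print(run_smartctl("/dev/sdb", "long-test"))
#   ... wait for the test to finish, then:
#   print(run_smartctl("/dev/sdb", "selftest-log"))
```

In the full report, attributes such as reallocated or pending sector counts and the drive's own error log are the usual places to look first.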
> I finally decided to give some other filesystems a try to see if
> anything changed. Lo and behold, it did. Still using a stock
> 2.6.30.8 kernel, but with ext3, ext4 and jfs filesystems, the large
> copy succeeded every time! I then decided to try a stock 2.6.31.1
> kernel with xfs. It worked fine, too!
>
> My question, now, is -- is this problem a known xfs bug that was fixed
> in 2.6.31.x? I glanced through the code changes and git log and
> didn't see any smoking gun. If it's not an xfs bug, does anyone know
> if it might be a block driver bug (ata/ahci, in this case) that was
> only tickled by xfs?
>
> David
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs