From: Eric Sandeen <sandeen@sandeen.net>
To: David Engel <david@istwok.net>
Cc: xfs@oss.sgi.com
Subject: Re: XFS/driver bug or bad drive?
Date: Thu, 01 Oct 2009 19:39:54 -0500
Message-ID: <4AC54BDA.20806@sandeen.net>
In-Reply-To: <20091001232759.GA12832@opus.istwok.net>
David Engel wrote:
> Hi,
>
> I've been trying to diagnose a suspected disk drive problem for about
> a week. I now think the problem might be a known (and fixed) xfs or
> driver bug, but I'm not 100% sure. I'm hoping someone here can
> confirm the problem is or isn't an xfs bug.
>
> The drive in question is a Samsung HD753LJ. I have two of these
> drives and have had to do three replacements for various reasons in
> <10 months of use. In short, I don't have a lot of confidence in the
> drive, even though recent evidence seems to point elsewhere.
>
> The problem occurs when I copy several hundred gigabytes of large
> files (MythTV recordings, to be specific) to the troublesome drive
> from another drive. When using a stock 2.6.30.8 kernel and xfs, the
> copy eventually fails because the drive quits responding (and won't
> respond again until it is power cycled). The failure doesn't always
> occur at the same point in the copy, but it does always occur. Here
> is a log sample of one of the failures.
>
> Sep 29 17:59:34 tux kernel: XFS mounting filesystem sdb1
> Sep 29 17:59:34 tux kernel: Ending clean XFS mount for filesystem: sdb1
> Sep 29 18:32:07 tux kernel: ata2.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x6 frozen
> Sep 29 18:32:07 tux kernel: ata2.00: cmd 61/00:00:af:02:eb/04:00:17:00:00/40 tag 0 ncq 524288 out
> Sep 29 18:32:07 tux kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Sep 29 18:32:07 tux kernel: ata2.00: status: { DRDY }
...
> Sep 29 18:32:07 tux kernel: ata2: hard resetting link
> Sep 29 18:32:17 tux kernel: ata2: softreset failed (device not ready)
...
> Sep 29 18:33:07 tux kernel: ata2.00: disabled
> Sep 29 18:33:07 tux kernel: ata2.00: device reported invalid CHS sector 0
> Sep 29 18:33:07 tux last message repeated 15 times
> Sep 29 18:33:07 tux kernel: ata2: EH complete
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Unhandled error code
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Sep 29 18:33:07 tux kernel: end_request: I/O error, dev sdb, sector 401276591
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Unhandled error code
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Sep 29 18:33:07 tux kernel: end_request: I/O error, dev sdb, sector 401275567
These are all storage errors, not xfs errors. Differing IO patterns
from one filesystem to another might be what trips it up, but nothing
above points to an xfs bug; any xfs problems would be a response to the
IO errors above. The cause could be hardware or a driver, I'm not sure,
but a hardware issue seems most likely to me. You might point smartctl
at the drive and see what it says.
-Eric
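
[Archive note] The point that these errors sit below the filesystem can be checked against the log itself. The sketch below decodes the taskfile on the "cmd" line and compares it with the failing sectors reported by end_request. It assumes the field order libata's error handler normally prints (cmd/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte logical sectors; the names CMD_RE and decode_ncq_cmd are made up for this illustration.

```python
import re

# Assumed field order of libata's "cmd" line:
# cmd/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/"
    r"(?P<feature>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?P<hob_feature>[0-9a-f]{2}):(?P<hob_nsect>[0-9a-f]{2}):"
    r"(?P<hob_lbal>[0-9a-f]{2}):(?P<hob_lbam>[0-9a-f]{2}):(?P<hob_lbah>[0-9a-f]{2})/"
    r"(?P<device>[0-9a-f]{2})"
)

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def decode_ncq_cmd(line):
    """Decode opcode, starting LBA and length from a libata 'cmd' log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd taskfile found in line")
    f = {name: int(value, 16) for name, value in m.groupdict().items()}

    # Assemble the 48-bit LBA, most significant byte first.
    lba = 0
    for name in ("hob_lbah", "hob_lbam", "hob_lbal", "lbah", "lbam", "lbal"):
        lba = (lba << 8) | f[name]

    # For NCQ commands (0x60 read, 0x61 write) the sector count travels in
    # the feature register pair; a count of 0 means 65536 sectors.
    sectors = (f["hob_feature"] << 8) | f["feature"]
    if sectors == 0:
        sectors = 65536

    return {
        "opcode": f["cmd"],
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_BYTES,
    }


SAMPLE = "ata2.00: cmd 61/00:00:af:02:eb/04:00:17:00:00/40 tag 0 ncq 524288 out"

if __name__ == "__main__":
    info = decode_ncq_cmd(SAMPLE)
    print(hex(info["opcode"]), info["lba"], info["sectors"], info["bytes"])
```

For the quoted line this gives opcode 0x61 (an NCQ write), starting LBA 401277615, and 1024 sectors, i.e. 524288 bytes, which matches the "ncq 524288 out" annotation. The two end_request sectors in the log, 401276591 and 401275567, lie exactly 1024 and 2048 sectors below that LBA, which is consistent with a run of queued 512 KiB writes timing out at the drive rather than anything xfs-specific.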
> I finally decided to give some other filesystems a try to see if
> anything changed. Lo and behold, it did. Still using a stock
> 2.6.30.8 kernel, but with ext3, ext4 and jfs filesystems, the large
> copy succeeded every time! I then decided to try a stock 2.6.31.1
> kernel with xfs. It worked fine, too!
>
> My question, now, is -- is this problem a known xfs bug that was fixed
> in 2.6.31.x? I glanced through the code changes and git log and
> didn't see any smoking gun. If it's not an xfs bug, does anyone know
> if it might be a block driver bug (ata/ahci, in this case) that was
> only tickled by xfs?
>
> David
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs