public inbox for linux-xfs@vger.kernel.org
From: Eric Sandeen <sandeen@sandeen.net>
To: David Engel <david@istwok.net>
Cc: xfs@oss.sgi.com
Subject: Re: XFS/driver bug or bad drive?
Date: Thu, 01 Oct 2009 19:39:54 -0500
Message-ID: <4AC54BDA.20806@sandeen.net>
In-Reply-To: <20091001232759.GA12832@opus.istwok.net>

David Engel wrote:
> Hi,
> 
> I've been trying to diagnose a suspected disk drive problem for about
> a week.  I now think the problem might be a known (and fixed) xfs or
> driver bug, but I'm not 100% sure.  I'm hoping someone here can
> confirm the problem is or isn't an xfs bug.
> 
> The drive in question is a Samsung HD753LJ.  I have two of these
> drives and have had to do three replacements for various reasons in
> <10 months of use.  In short, I don't have a lot of confidence in the
> drive, even though recent evidence seems to point elsewhere.
> 
> The problem occurs when I copy several hundred gigabytes of large
> files (MythTV recordings, to be specific) to the troublesome drive
> from another drive.  When using a stock 2.6.30.8 kernel and xfs, the
> copy eventually fails because the drive quits responding (and won't
> respond again until it is power cycled).  The failure doesn't always
> occur at the same point in the copy, but it does always occur.  Here
> is a log sample of one of the failures.
> 
> Sep 29 17:59:34 tux kernel: XFS mounting filesystem sdb1
> Sep 29 17:59:34 tux kernel: Ending clean XFS mount for filesystem: sdb1
> Sep 29 18:32:07 tux kernel: ata2.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x6 frozen
> Sep 29 18:32:07 tux kernel: ata2.00: cmd 61/00:00:af:02:eb/04:00:17:00:00/40 tag 0 ncq 524288 out
> Sep 29 18:32:07 tux kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Sep 29 18:32:07 tux kernel: ata2.00: status: { DRDY }
...
> Sep 29 18:32:07 tux kernel: ata2: hard resetting link
> Sep 29 18:32:17 tux kernel: ata2: softreset failed (device not ready)
...

> Sep 29 18:33:07 tux kernel: ata2.00: disabled
> Sep 29 18:33:07 tux kernel: ata2.00: device reported invalid CHS sector 0
> Sep 29 18:33:07 tux last message repeated 15 times
> Sep 29 18:33:07 tux kernel: ata2: EH complete
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Unhandled error code
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Sep 29 18:33:07 tux kernel: end_request: I/O error, dev sdb, sector 401276591
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Unhandled error code
> Sep 29 18:33:07 tux kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Sep 29 18:33:07 tux kernel: end_request: I/O error, dev sdb, sector 401275567
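
To make the timeout report above more concrete, here is a minimal Python sketch that decodes the `cmd` taskfile from the tag 0 line. It assumes the field order libata uses when printing a taskfile (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and that, for NCQ (FPDMA) commands, the sector count is carried in the feature registers. `decode_ncq_cmd` is a hypothetical helper for illustration, not a kernel or libata interface.

```python
import re

# Assumed libata taskfile layout in "cmd" lines:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
_HEX = r"([0-9a-f]{2})"
TF_RE = re.compile(
    r"cmd " + _HEX + "/"
    + ":".join([_HEX] * 5) + "/"
    + ":".join([_HEX] * 5) + "/"
    + _HEX
)


def decode_ncq_cmd(line: str) -> dict:
    """Hypothetical helper: pull command, LBA and length out of a libata 'cmd' line."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no ATA taskfile found in line")
    (cmd, feat, _nsect, lbal, lbam, lbah,
     hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _dev) = (
        int(g, 16) for g in m.groups())
    # 48-bit LBA: the "hob" (high order byte) registers hold bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    # For NCQ commands the sector count is in the feature registers, not nsect.
    sectors = hob_feat << 8 | feat
    return {"command": cmd, "lba": lba, "sectors": sectors,
            "bytes": sectors * 512}


if __name__ == "__main__":
    tag0 = "ata2.00: cmd 61/00:00:af:02:eb/04:00:17:00:00/40 tag 0 ncq 524288 out"
    print(decode_ncq_cmd(tag0))
```

Applied to the tag 0 line, this should give command 0x61 (WRITE FPDMA QUEUED), LBA 401277615 and 1024 sectors, i.e. 524288 bytes, matching the `ncq 524288` in the log. That LBA sits exactly 1024 sectors past the first `end_request` sector (401276591), which appears consistent with a queue of 512 KiB writes timing out at the device, below the filesystem.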

These are all storage errors, not xfs errors.  I suppose differing IO
patterns from one fs or another could trip it up, but nothing above
points to an xfs bug; any xfs problems would be a reaction to the IO
errors above.  The root cause could be hardware or the driver; I'm not
sure, but I think hardware is most likely.  You might point smartctl at
the drive and see what it says.
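
The argument that these are storage errors rather than xfs errors comes down to which layer emits each message. As a rough sketch, assuming the message prefixes seen in this log (`XFS`, `ataN.NN:`, `sd H:C:T:L:`, `end_request:`) identify the reporting layer, one could tally the error-looking lines by layer. The patterns, keyword list and function names are illustrative assumptions, not an exhaustive or official classification of kernel messages.

```python
import re

# Illustrative prefix rules (assumed from this log) mapping a message to a layer.
LAYER_PATTERNS = [
    ("xfs", re.compile(r"\bXFS\b")),
    ("libata", re.compile(r"\bata\d+(\.\d+)?:")),
    ("scsi", re.compile(r"\bsd \d+:\d+:\d+:\d+:")),
    ("block", re.compile(r"\bend_request:")),
]

# Illustrative keywords that mark a line as reporting a problem.
ERROR_WORDS = re.compile(r"error|failed|timeout|exception|disabled", re.IGNORECASE)


def classify(line: str) -> str:
    """Return the first layer whose pattern matches, else 'other'."""
    for layer, pattern in LAYER_PATTERNS:
        if pattern.search(line):
            return layer
    return "other"


def error_layers(lines) -> dict:
    """Count error-looking lines per layer."""
    counts = {}
    for line in lines:
        if ERROR_WORDS.search(line):
            layer = classify(line)
            counts[layer] = counts.get(layer, 0) + 1
    return counts


if __name__ == "__main__":
    excerpt = [
        "kernel: XFS mounting filesystem sdb1",
        "kernel: Ending clean XFS mount for filesystem: sdb1",
        "kernel: ata2.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x6 frozen",
        "kernel: ata2: softreset failed (device not ready)",
        "kernel: sd 1:0:0:0: [sdb] Unhandled error code",
        "kernel: end_request: I/O error, dev sdb, sector 401276591",
    ]
    print(error_layers(excerpt))
```

On lines taken from the excerpt, only libata, SCSI and block-layer messages count as errors; the xfs lines are just the mount messages.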

-Eric

> I finally decided to give some other filesystems a try to see if
> anything changed.  Lo and behold it did.  Still using a stock
> 2.6.30.8 kernel, but with ext3, ext4 and jfs filesystems, the large
> copy succeeded every time!  I then decided to try a stock 2.6.31.1
> kernel with xfs.  It worked fine, too!
> 
> My question, now, is -- is this problem a known xfs bug that was fixed
> in 2.6.31.x?  I glanced through the code changes and git log and
> didn't see any smoking gun.  If it's not an xfs bug, does anyone know
> if it might be a block driver bug (ata/ahci, in this case) that was
> only tickled by xfs?
> 
> David

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 8+ messages
2009-10-01 23:27 XFS/driver bug or bad drive? David Engel
2009-10-02  0:39 ` Eric Sandeen [this message]
2009-10-02 16:57   ` David Engel
2009-10-07 11:29     ` Michael-John Turner
2009-10-07 13:24       ` Eric Sandeen
2009-10-07 14:04         ` Michael-John Turner
2009-10-07 15:20           ` David Engel
2009-10-02  8:05 ` Michael Monnerie