From: Robert Hancock <hancockrwd@gmail.com>
To: Marcin Niskiewicz <mniskiewicz@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: hdd errors with libata drivers
Date: Mon, 29 Jun 2009 18:38:10 -0600
Message-ID: <4A495E72.8020800@gmail.com>
In-Reply-To: <b6373cd60906290545w2f9a660ci7c7c51794ecf5f56@mail.gmail.com>
On 06/29/2009 06:45 AM, Marcin Niskiewicz wrote:
> Hello!
> I have 2 identical machines - both with 3 disks (WDC WD3000HLFS) -
> root filesystem is under raid1, data partitions are in raid5 (using
> mdadm)
> gentoo, kernel version - 2.6.25-hardened-r8, ahci driver for disks...
> reiserfs as filesystem...
> 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH)
> 6 port SATA AHCI Controller (rev 02)
> Intel(R) Xeon(R) CPU X3360
>
> About 4 months ago both machines died in the same way - due to problem
> with disks - both raid5-s were down, data filesystem was
> unreachable... (the root filesystem survived)
>
> I thought that it was sth linked with power supply or sth similar - so
> I made some changes to avoid the problem ...
>
> But few days ago it happened again - at the SAME time - BOTH machines
> had problems with disks! (again root filesystem survived, data
> partition was corrupted and raid5 was unreachable)
>
> In dmesg I noticed something like this:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata1.00: irq_stat 0x40000001
> ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> res 51/04:00:34:cf:f3/00:00:00:f3:40/a3 Emask 0x1 (device error)
Here the drive is returning command aborted (ABRT, error register 0x04)
to a cache flush request (command 0xea, FLUSH CACHE EXT), which suggests
it is having problems writing to the media.
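For reference, the first two bytes of the "res" line are the ATA status
and error registers. Below is a rough Python sketch of how they can be
decoded. The helper names (decode_res, bit_names) are made up for
illustration, and the tables list only the common register bits, so
treat it as a reading aid rather than what libata itself does.

```python
import re

# ATA status register bits (only the commonly reported ones).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF",
               0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}
# ATA error register bits (only the commonly reported ones).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def bit_names(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table.items() if value & mask]


def decode_res(line):
    """Decode the status/error bytes of a libata 'res' line (hypothetical helper)."""
    m = re.search(r"res ([0-9a-f]{2})/([0-9a-f]{2}):", line)
    if m is None:
        raise ValueError("not a libata 'res' line")
    status = int(m.group(1), 16)
    error = int(m.group(2), 16)
    return bit_names(status, STATUS_BITS), bit_names(error, ERROR_BITS)


print(decode_res("res 51/04:00:34:cf:f3/00:00:00:f3:40/a3 Emask 0x1 (device error)"))
# prints (['DRDY', 'DSC', 'ERR'], ['ABRT'])
```

So status 0x51 has ERR set and error 0x04 is ABRT, i.e. the drive itself
rejected the flush.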
> ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
> ata1.00: irq_stat 0x40000008
> ata1.00: cmd 60/08:08:f7:23:8a/00:00:0b:00:00/40 tag 1 ncq 4096 in
> res 41/40:00:f7:23:8a/21:00:0b:00:00/4b Emask 0x409 (media error)<F>
> ata1.00: status: { DRDY ERR }
> ata1.00: error: { UNC }
> ata1.00: configured for UDMA/133
> ata1: EH complete
And here it's returning an uncorrectable media error to an NCQ read.
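The Emask value on the "res" line is a bitmask of libata's AC_ERR_*
completion flags. A small Python sketch of the decoding follows. The bit
values are as I recall them from include/linux/libata.h, so check them
against your own kernel tree, and decode_emask is just an illustrative
name.

```python
# libata completion error flags (AC_ERR_*), assumed to match
# enum ata_completion_errors in include/linux/libata.h.
AC_ERR_FLAGS = [
    (1 << 0, "DEV"),         # device reported error
    (1 << 1, "HSM"),         # host state machine violation
    (1 << 2, "TIMEOUT"),
    (1 << 3, "MEDIA"),       # media error
    (1 << 4, "ATA_BUS"),
    (1 << 5, "HOST_BUS"),
    (1 << 6, "SYSTEM"),
    (1 << 7, "INVALID"),
    (1 << 8, "OTHER"),
    (1 << 9, "NODEV_HINT"),
    (1 << 10, "NCQ"),        # set on the command that failed under NCQ
]


def decode_emask(emask):
    """Return the AC_ERR_* flag names set in an Emask value (hypothetical helper)."""
    return [name for mask, name in AC_ERR_FLAGS if emask & mask]


print(decode_emask(0x409))  # prints ['DEV', 'MEDIA', 'NCQ']
print(decode_emask(0x1))    # prints ['DEV']
```

Emask 0x409 therefore means a device-reported media error on a queued
command, which matches the "(media error)" text and the "<F>" marker in
the log above.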
>
> On both machines dmesg errors were about ata1.00 ...
>
> Due to http://ata.wiki.kernel.org/index.php/Libata_error_messages it
> looks like hardware problem - but 6 disks in two machines - at the
> same time again?
> I checked all of disks with WD tools before going to production and
> everything was OK... It's really strange ....
>
> I found opinions that it could be kernel bug on ata acpi - and that I
> should add noacpi or noapic option - is it true? wouldn't it have any
> affects (performance etc.) to Intel CPU?
It seems highly unlikely that this is a kernel bug, since the drives
themselves are reporting these errors. My guess would be something
common to both machines, such as a power problem.
>
> I'm thinking about changing kernel version - maybe not hardened ...
>
> Any ideas?
>
> Thanks for any help!
>
> regards
> nichu