public inbox for linux-kernel@vger.kernel.org
From: Robert Hancock <hancockrwd@gmail.com>
To: Marcin Niskiewicz <mniskiewicz@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: hdd errors with libata drivers
Date: Mon, 29 Jun 2009 18:38:10 -0600
Message-ID: <4A495E72.8020800@gmail.com>
In-Reply-To: <b6373cd60906290545w2f9a660ci7c7c51794ecf5f56@mail.gmail.com>

On 06/29/2009 06:45 AM, Marcin Niskiewicz wrote:
> Hello!
> I have 2 identical machines - both with 3 disks (WDC WD3000HLFS) -
> root filesystem is under raid1, data partitions are in raid5 (using
> mdadm)
> gentoo, kernel version - 2.6.25-hardened-r8, ahci driver for disks...
> reiserfs as filesystem...
> 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH)
> 6 port SATA AHCI Controller (rev 02)
> Intel(R) Xeon(R) CPU X3360
>
> About 4 months ago both machines died in the same way - due to problem
> with disks - both raid5-s were down, data filesystem was
> unreachable... (the root filesystem survived)
>
> I thought that it was something linked with the power supply or similar - so
> I made some changes to avoid the problem ...
>
> But a few days ago it happened again - at the SAME time - BOTH machines
> had problems with disks! (again root filesystem survived, data
> partition was corrupted and raid5 was unreachable)
>
> In dmesg I noticed something like this:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata1.00: irq_stat 0x40000001
> ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>           res 51/04:00:34:cf:f3/00:00:00:f3:40/a3 Emask 0x1 (device error)

Here the drive is returning command aborted (ABRT, error register 0x04)
to a cache flush request (command 0xea, FLUSH CACHE EXT), which suggests
it is having problems writing to the media.
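
For anyone who wants to read these lines by hand: the first byte after
"cmd" is the ATA command opcode, and the first two bytes after "res" are
the status and error registers. Below is a minimal Python sketch of that
decoding. The helper names (names, decode) and the tables are made up for
illustration, the opcode table is deliberately incomplete, and the names
of the less common register bits vary between ATA revisions, so treat it
as a reading aid rather than a reference.

```python
import re

# Bit names for the ATA status and error registers. The common ones
# (DRDY, ERR, UNC, ABRT, ...) match what libata prints; the rest are
# illustrative and differ between ATA revisions.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "SENSE", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}

# Deliberately small opcode table: only the commands seen in this thread
# plus their closest relatives.
OPCODES = {0xE7: "FLUSH CACHE", 0xEA: "FLUSH CACHE EXT",
           0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def names(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table.items() if value & mask]


def decode(cmd_line, res_line):
    """Decode the opcode from a 'cmd' line and status/error from a 'res' line."""
    opcode = int(re.search(r"cmd ([0-9a-f]{2})/", cmd_line).group(1), 16)
    match = re.search(r"res ([0-9a-f]{2})/([0-9a-f]{2}):", res_line)
    status, error = (int(x, 16) for x in match.groups())
    return {
        "command": OPCODES.get(opcode, hex(opcode)),
        "status": names(status, STATUS_BITS),
        "error": names(error, ERROR_BITS),
    }


cmd = "ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"
res = "res 51/04:00:34:cf:f3/00:00:00:f3:40/a3 Emask 0x1 (device error)"
print(decode(cmd, res))
```

For the two lines quoted above this reports FLUSH CACHE EXT with status
DRDY, DSC, ERR and error ABRT.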

> ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
> ata1.00: irq_stat 0x40000008
> ata1.00: cmd 60/08:08:f7:23:8a/00:00:0b:00:00/40 tag 1 ncq 4096 in
>           res 41/40:00:f7:23:8a/21:00:0b:00:00/4b Emask 0x409 (media error)<F>
> ata1.00: status: { DRDY ERR }
> ata1.00: error: { UNC }
> ata1.00: configured for UDMA/133
> ata1: EH complete

And here it's returning an uncorrectable media error (UNC) to an NCQ read
(command 0x60, READ FPDMA QUEUED).
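
The failing sector can also be read out of these lines. The sketch below
assumes the twelve bytes are printed in the order libata's error handler
uses (command or status, feature or error, nsect, lbal, lbam, lbah,
hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) and
reassembles the 48-bit LBA from them. The function names are made up for
illustration.

```python
import re

# Twelve hex bytes separated by '/' or ':' following "cmd" or "res".
TASKFILE = re.compile(r"(?:cmd|res) ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def taskfile_bytes(line):
    """Return the twelve taskfile bytes from a libata 'cmd' or 'res' line."""
    match = TASKFILE.search(line)
    if match is None:
        raise ValueError("no taskfile dump found in line")
    return [int(b, 16) for b in re.split(r"[/:]", match.group(1))]


def lba48(line):
    """Reassemble the 48-bit LBA (assumed field order, see above)."""
    tf = taskfile_bytes(line)
    lbal, lbam, lbah = tf[3], tf[4], tf[5]
    hob_lbal, hob_lbam, hob_lbah = tf[8], tf[9], tf[10]
    return ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal)


cmd = "ata1.00: cmd 60/08:08:f7:23:8a/00:00:0b:00:00/40 tag 1 ncq 4096 in"
res = "res 41/40:00:f7:23:8a/21:00:0b:00:00/4b Emask 0x409 (media error)"
print(lba48(cmd), lba48(res))
```

Both lines give LBA 193602551, i.e. the drive reported the error at the
first sector of the request. That number is a sector on the whole disk,
so mapping it to a file would still need the partition and md offsets.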

>
> On both machines dmesg errors were about ata1.00 ...
>
> According to http://ata.wiki.kernel.org/index.php/Libata_error_messages it
> looks like a hardware problem - but 6 disks in two machines - at the
> same time again?
> I checked all of disks with WD tools before going to production and
> everything was OK... It's really strange ....
>
> I found opinions that it could be a kernel bug in ata acpi - and that I
> should add the noacpi or noapic option - is that true? Wouldn't it have any
> effects (performance etc.) on the Intel CPU?

It seems highly unlikely that this is a kernel bug. My guess would be
something common to both machines, such as a power problem.

>
> I'm thinking about changing kernel version - maybe not hardened ...
>
> Any ideas?
>
> Thanks for any help!
>
> regards
> nichu

