Re: Scary Intel SATA problem: "frozen"

From: "Andrew Lyon" <andrew.lyon@gmail.com>
To: jonas@local.se
Cc: Tejun Heo <htejun@gmail.com>, linux-ide@vger.kernel.org
Subject: Re: Scary Intel SATA problem: "frozen"
Date: Wed, 6 Dec 2006 18:45:10 +0000
Message-ID: <f4527be0612061045sb90c25m1e39ad3e8f15c099@mail.gmail.com>
In-Reply-To: <457704BA.7090001@local.se>

On 12/6/06, Jonas Lundgren <jonas@local.se> wrote:
> Tejun Heo wrote:
> [--snip--]
>
> >> If the system does recover, I start getting
> >> the extremely low disk write speeds that I reported above, and only a
> >> reboot will get the performance back to normal.
> >
> > Please post the full dmesg after your computer gets really slow.  I
> > suspect libata decided to switch to PIO mode.
> Here's the relevant part, if you want the whole dmesg look at:
> http://pastebin.ca/269581
>
> [--snip--]
> [82048.255126] can't create port
> [85055.578172] reiser4[unrar(30787)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
> [85055.578174] NOTICE: md5 does not support write barriers, using synchronous write instead.
> [87825.501998] can't create port
> [89520.019538] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> [89520.019545] ata2.00: cmd c8/00:08:fe:68:df/00:00:00:00:00/e1 tag 0 data 4096 in
> [89520.019547]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [89520.322292] ata2: soft resetting port
> [89527.515891] ata2: port is slow to respond, please be patient (Status 0xd0)
> [89550.457913] ata2: port failed to respond (30 secs, Status 0xd0)
> [89550.457917] ata2: softreset failed (device not ready)
> [89550.457921] ata2: softreset failed, retrying in 5 secs
> [89555.454103] ata2: hard resetting port
> [89562.799693] ata2: port is slow to respond, please be patient (Status 0x80)
> [89585.740239] ata2: port failed to respond (30 secs, Status 0x80)
> [89585.740242] ata2: COMRESET failed (device not ready)
> [89585.740245] ata2: hardreset failed, retrying in 5 secs
> [89590.736978] ata2: hard resetting port
> [89598.081854] ata2: port is slow to respond, please be patient (Status 0x80)
> [89617.604742] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [89617.611034] ata2.00: configured for UDMA/100
> [89617.611042] ata2: EH complete
> [89617.623426] SCSI device sdb: 145226112 512-byte hdwr sectors (74356 MB)
> [89617.633551] sdb: Write Protect is off
> [89617.633553] sdb: Mode Sense: 00 3a 00 00
> [89617.637765] SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
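
For anyone reading the reset messages in the log: the "Status 0x.." values
are the ATA status register. Below is a minimal Python sketch that splits
such a byte into its classic bit names. The helper name decode_status is
made up for illustration, and the names for bits 0x10, 0x04 and 0x02 are the
older ATA ones (later specs reuse or obsolete them), so treat it as a
reading aid rather than anything libata itself does.

```python
# Classic ATA status register bits (names for 0x10/0x04/0x02 follow
# older ATA revisions; newer specs redefine or obsolete them).
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error
}


def decode_status(status):
    """Return the names of the bits set in an ATA status byte (hypothetical helper)."""
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    # Values taken from the dmesg excerpt quoted in this thread.
    for value in (0xD0, 0x80, 0x40):
        print(hex(value), decode_status(value))
```

Both 0xd0 and 0x80 have BSY set, which fits the "device not ready" messages:
the drive stayed busy through the soft reset and the first hard reset.
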
> >
> >> I don't know what causes it, but most of the time when I've gotten it
> >> my system has been under heavy load (compiling, downloading torrents at
> >> 11mb/sec etc). Please let me know if you want any additional info, want
> >> me to try something out, or whatever. My recent hardware upgrade for
> >> around $1200 (to a core2duo system, i965 mobo) is just going to waste
> >> because of this problem. :/
> >
> > Heh, nice machine you got there.  When you look at the dmesg, do the
> > error messages occur only on one of the two drives?  Or are both
> > affected?  If only one is affected,
> >
> > 1. swap the two.  you'll probably have to dance a little bit with the
> > boot loader but md should handle that fine once the kernel is loaded.  do
> > the errors persist?  on which device do they occur?  do they follow the
> > drive or stay on the mobo port?
> It follows the drive. (Hardware problem?)
>
> >
> > 2. try different cable / port.  if you change port, again, you need to
> > dance w/ boot loader.  who's carrying the error messages with it?
> Read above.
>
> >
> > 3. try different power plug from different power lane.
> I've got a really good power supply, which can handle max 560W on the +12
> / -12 V rail alone.
>
> >
> >> I just got so glad when I saw the post of this on linux-ide, I've been
> >> searching like crazy to find another person having the same problem (and
> >> possibly a solution) for the past 2-3 weeks or so.
> >
> > My first guess is frequent transmission errors.  Please report the test
> > results.  Thanks.
> >
>
> I guess it could only be a hardware problem since the error follows the
> drive, and both the drives are identical, so it can't be a firmware
> problem. Correct me if I'm wrong.
>
> I just checked the SMART status, and the drive passes, but it seems like
> it's going downhill; on the other hand, I might be misreading the results.
>
> smartctl -d ata -A /dev/sdb
> smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   113   111   021    Pre-fail  Always       -       4875
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       237
>   5 Reallocated_Sector_Ct   0x0033   153   153   140    Pre-fail  Always       -       747
>   7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       18117
>  10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       228
> 194 Temperature_Celsius     0x0022   117   108   000    Old_age   Always       -       33
> 196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       639
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0
>
>
> The "Reallocated_Sector_Ct" and "Reallocated_Event_Count" worry me..
> Should I be worried?

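Yes. A raw Reallocated_Sector_Ct of 747 (with 639 reallocation events) means
the drive has already remapped hundreds of bad sectors, and its normalized
value of 153 is not far above the failure threshold of 140. That is a sign
that the drive is wearing out!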
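
If you want to keep an eye on those counters, here is a minimal Python
sketch that parses rows in the "smartctl -A" layout and reports which of a
few sector-health attributes have a non-zero raw value. The sample rows are
copied from your output; the names WATCHED, parse_attributes, worrying and
margin are made up for this example, and the choice of which attributes to
watch is my assumption, not something smartmontools defines.

```python
import re

# A few rows copied from the smartctl -A output quoted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   153   153   140    Pre-fail  Always       -       747
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       639
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
"""

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<failed>\S+)\s+(?P<raw>\d+)"
)

# Assumption: attributes whose non-zero raw count suggests bad sectors.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_attributes(text):
    """Map attribute name to its normalized value, threshold and raw count."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m["name"]] = {
                "value": int(m["value"]),
                "thresh": int(m["thresh"]),
                "raw": int(m["raw"]),
            }
    return attrs


def worrying(attrs):
    """Return the watched attributes that have a non-zero raw value, sorted."""
    return sorted(n for n in WATCHED if n in attrs and attrs[n]["raw"] > 0)


def margin(attrs, name):
    """How far the normalized value still is above its failure threshold."""
    return attrs[name]["value"] - attrs[name]["thresh"]


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for name in worrying(attrs):
        print(name, "raw =", attrs[name]["raw"], "margin =", margin(attrs, name))
```

To run it against a live drive you could feed it the text that
"smartctl -d ata -A /dev/sdb" prints instead of SAMPLE.
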

Andy

> --
> -Jonas
>
> Name:   Jonas Lundgren
> ICQ#:   52064961
> Mail:   jonas@local.se
> IRC:    neon / neonman @ EFnet, Undernet, Quakenet, freenode
