Re: Scary Intel SATA problem: "frozen"

From: Jonas Lundgren <jonas@local.se>
To: Tejun Heo <htejun@gmail.com>, linux-ide@vger.kernel.org
Subject: Re: Scary Intel SATA problem: "frozen"
Date: Wed, 06 Dec 2006 18:58:18 +0100
Message-ID: <457704BA.7090001@local.se>
In-Reply-To: <456CDB06.40806@gmail.com>

Tejun Heo wrote:
[--snip--]

>> IF the system does recover, I start getting
>> the extremely low disk write speeds that I reported above, and only a
>> reboot will get the performance back to normal.
> 
> Please post the full dmesg after your computer gets really slow.  I
> suspect libata decided to switch to PIO mode.
Here's the relevant part. If you want the whole dmesg, look at:
http://pastebin.ca/269581

[--snip--]
[82048.255126] can't create port
[85055.578172] reiser4[unrar(30787)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
[85055.578174] NOTICE: md5 does not support write barriers, using synchronous write instead.
[87825.501998] can't create port
[89520.019538] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[89520.019545] ata2.00: cmd c8/00:08:fe:68:df/00:00:00:00:00/e1 tag 0 data 4096 in
[89520.019547]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[89520.322292] ata2: soft resetting port
[89527.515891] ata2: port is slow to respond, please be patient (Status 0xd0)
[89550.457913] ata2: port failed to respond (30 secs, Status 0xd0)
[89550.457917] ata2: softreset failed (device not ready)
[89550.457921] ata2: softreset failed, retrying in 5 secs
[89555.454103] ata2: hard resetting port
[89562.799693] ata2: port is slow to respond, please be patient (Status 0x80)
[89585.740239] ata2: port failed to respond (30 secs, Status 0x80)
[89585.740242] ata2: COMRESET failed (device not ready)
[89585.740245] ata2: hardreset failed, retrying in 5 secs
[89590.736978] ata2: hard resetting port
[89598.081854] ata2: port is slow to respond, please be patient (Status 0x80)
[89617.604742] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[89617.611034] ata2.00: configured for UDMA/100
[89617.611042] ata2: EH complete
[89617.623426] SCSI device sdb: 145226112 512-byte hdwr sectors (74356 MB)
[89617.633551] sdb: Write Protect is off
[89617.633553] sdb: Mode Sense: 00 3a 00 00
[89617.637765] SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
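
To check the PIO suspicion over a longer log, the libata error-handling (EH) episodes can be summarized mechanically. The Python below is only a sketch: summarize_eh and its regexes are hypothetical helpers, not part of any kernel tool, and the patterns are written against the message wording in this excerpt, so other kernel versions may need different ones. Each episode runs from an "exception" line to "EH complete" and records failed reset attempts and the transfer mode the port was configured for. On this excerpt it reports one episode on ata2 with two failed resets and a final mode of UDMA/100, i.e. here the port came back in DMA mode rather than PIO.

```python
import re

# Patterns for the libata EH messages in the excerpt above (assumed wording;
# other kernel versions may phrase these differently).
TS = r"\[\s*([\d.]+)\] "  # leading "[seconds.micros] " timestamp
EXCEPTION_RE = re.compile(TS + r"(ata\d+)\.\d+: exception ")
RESET_FAIL_RE = re.compile(TS + r"(ata\d+): (?:soft|hard)reset failed, retrying")
CONFIGURED_RE = re.compile(TS + r"(ata\d+)\.\d+: configured for (\S+)")
EH_DONE_RE = re.compile(TS + r"(ata\d+): EH complete")

# Key lines copied from the log above, for demonstration.
SAMPLE_LOG = [
    "[89520.019538] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen",
    "[89550.457917] ata2: softreset failed (device not ready)",
    "[89550.457921] ata2: softreset failed, retrying in 5 secs",
    "[89585.740242] ata2: COMRESET failed (device not ready)",
    "[89585.740245] ata2: hardreset failed, retrying in 5 secs",
    "[89617.604742] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)",
    "[89617.611034] ata2.00: configured for UDMA/100",
    "[89617.611042] ata2: EH complete",
]


def summarize_eh(lines):
    """Return one dict per EH episode, in log order."""
    episodes = []  # every episode seen so far
    open_ep = {}   # port -> episode not yet closed by "EH complete"
    for line in lines:
        m = EXCEPTION_RE.search(line)
        if m:
            port = m.group(2)
            if port not in open_ep:  # further exceptions fold into the open episode
                ep = {"port": port, "start": float(m.group(1)),
                      "reset_failures": 0, "mode": None, "duration": None}
                open_ep[port] = ep
                episodes.append(ep)
            continue
        # Count only the "..., retrying" line so each failed attempt counts once.
        m = RESET_FAIL_RE.search(line)
        if m and m.group(2) in open_ep:
            open_ep[m.group(2)]["reset_failures"] += 1
            continue
        m = CONFIGURED_RE.search(line)
        if m and m.group(2) in open_ep:
            open_ep[m.group(2)]["mode"] = m.group(3)  # e.g. "UDMA/100" or a PIO mode
            continue
        m = EH_DONE_RE.search(line)
        if m and m.group(2) in open_ep:
            ep = open_ep.pop(m.group(2))
            ep["duration"] = round(float(m.group(1)) - ep["start"], 1)  # seconds
    return episodes


if __name__ == "__main__":
    for ep in summarize_eh(SAMPLE_LOG):
        print(ep)
```

Because iterating a file yields its lines, a saved log (hypothetically dmesg.txt) can be passed as summarize_eh(open("dmesg.txt")) to see whether any later episode ends in a PIO mode.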

> 
>> I don't know what causes it, but most of the times when I've gotten it
>> my system has been under heavy load (compiling, downloading torrents at
>> 11mb/sec, etc.). Please let me know if you want any additional info, want
>> me to try something out, or whatever. My recent hardware upgrade for
>> around $1200 (to a core2duo system, i965 mobo) is just going to waste
>> because of this problem. :/
> 
> Heh, nice machine you got there.  When you look at the dmesg, do the
> error messages occur only on one of the two drives?  Or are both
> affected?  If only one is affected,
> 
> 1. swap the two.  you'll probably have to dance a little bit with the boot
> loader but md should handle that fine once the kernel is loaded.  do
> the errors persist?  on which device do they occur?  do they follow the
> drive or stay on the mobo port?
It follows the drive. (Hardware problem?)

> 
> 2. try a different cable / port.  if you change port, again, you need to
> dance w/ the boot loader.  which one carries the error messages with it?
See above.

> 
> 3. try different power plug from different power lane.
I've got a really good power supply, which can handle a maximum of 560 W
on the +12 / -12 V rail alone.

> 
>> I just got so glad when I saw the post of this on linux-ide, I've been
>> searching like crazy to find another person having the same problem (and
>> possibly a solution) for the past 2-3 weeks or so.
> 
> My first guess is frequent transmission errors.  Please report the test
> results.  Thanks.
>

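Since the error follows the drive, I guess it has to be a hardware
problem. Both drives are identical, so they should be running the same
firmware, and a firmware bug would then show up on both; that rules out
a firmware problem as far as I can tell. Correct me if I'm wrong.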
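
The inference from the swap test in step 1 can be written down as a small decision function. This is a hypothetical Python sketch (locate_fault and the drive labels are made up for illustration), and it assumes only the drives were moved between ports while the SATA cables stayed on their ports. If the power plug moved together with the drive, a "drive" result still leaves the power connection as a suspect, which is what step 3 is meant to rule out.

```python
def locate_fault(before, after):
    """Interpret a drive-swap test.

    `before` and `after` are (port, drive) pairs naming where the errors
    appeared before and after the two drives were swapped between ports.
    Assumption: SATA cables stayed with their ports; only drives moved.
    """
    port_before, drive_before = before
    port_after, drive_after = after
    if drive_before == drive_after and port_before != port_after:
        # Errors moved with the drive: suspect the drive or whatever moved with it.
        return "drive"
    if port_before == port_after and drive_before != drive_after:
        # Errors stayed put: suspect the port, controller or port-side cable.
        return "port"
    return "inconclusive"


if __name__ == "__main__":
    # Errors seen on ata2 with drive A, then on ata1 once drive A moved there.
    print(locate_fault(("ata2", "drive A"), ("ata1", "drive A")))
```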

I just checked the SMART status. The drive passes, but it looks like it
may be going downhill; then again, I might be misreading the results.

smartctl -d ata -A /dev/sdb
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   113   111   021    Pre-fail  Always       -       4875
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       237
  5 Reallocated_Sector_Ct   0x0033   153   153   140    Pre-fail  Always       -       747
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       18117
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       228
194 Temperature_Celsius     0x0022   117   108   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       639
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0

The "Reallocated_Sector_Ct" and "Reallocated_Event_Count" values worry
me. Is this something to be concerned about?
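
One way to read the table: smartctl compares each attribute's normalized VALUE (higher is better) against its THRESH, and an attribute counts as failed once VALUE drops to THRESH or below, which is why the drive still "passes". The Python sketch below computes the headroom VALUE - THRESH for attributes with a nonzero threshold; parse_attributes and margins are hypothetical helper names, and the regex assumes the column layout shown above with an integer RAW_VALUE. On this table the smallest headroom is Reallocated_Sector_Ct at 13 (153 vs. 140), with a raw value of 747. How raw counts map to normalized values is vendor-specific, so the headroom is a rough indicator, not a prediction of when the drive will fail.

```python
import re

# One attribute row of `smartctl -A` output, in the column order shown above:
# ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
# Assumption: RAW_VALUE is a plain integer, as it is in this table.
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(Pre-fail|Old_age)\s+(Always|Offline)\s+(\S+)\s+(\d+)"
)

# A few rows copied from the output above, for demonstration.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   153   153   140    Pre-fail  Always       -       747
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       18117
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       639
"""


def parse_attributes(text):
    """Map attribute name -> normalized VALUE/WORST/THRESH, type and raw value."""
    attrs = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)  # header and non-table lines simply don't match
        if m:
            attrs[m.group(2)] = {
                "id": int(m.group(1)),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "type": m.group(6),
                "raw": int(m.group(9)),
            }
    return attrs


def margins(attrs):
    """Headroom VALUE - THRESH for attributes with a nonzero threshold.

    An attribute counts as failed once VALUE <= THRESH (headroom <= 0).
    Attributes with THRESH 0 cannot fail this way and are skipped.
    """
    return {name: a["value"] - a["thresh"]
            for name, a in attrs.items() if a["thresh"] > 0}


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    # Print attributes from least to most headroom.
    for name, headroom in sorted(margins(attrs).items(), key=lambda kv: kv[1]):
        print(f"{name:25} headroom {headroom:4}  raw {attrs[name]['raw']}")
```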

-- 
-Jonas

Name:   Jonas Lundgren
ICQ#:   52064961
Mail:   jonas@local.se
IRC:    neon / neonman @ EFnet, Undernet, Quakenet, freenode

Thread overview: 26+ messages
2006-11-28 22:24 Scary Intel SATA problem: "frozen" Jonas Lundgren
2006-11-28 22:59 ` Linus Torvalds
2006-11-28 23:22   ` Jeff Garzik
2006-11-28 23:43     ` Linus Torvalds
2006-11-29  0:38       ` Jeff Garzik
2006-11-29  0:51         ` Linus Torvalds
2006-11-29  2:51       ` Mark Lord
2006-11-29  0:57 ` Tejun Heo
2006-11-29  7:14   ` Jonas Lundgren
2006-11-29  7:29     ` Tejun Heo
2006-11-29 14:11       ` Mark Lord
2006-11-29 16:19       ` Linus Torvalds
2006-12-06 17:58   ` Jonas Lundgren [this message]
2006-12-06 18:45     ` Andrew Lyon
2006-12-07  1:25     ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2006-11-14 15:04 [git patches] libata fixes Jeff Garzik
2006-11-28 17:31 ` Scary Intel SATA problem: "frozen" Linus Torvalds
2006-11-28 17:37   ` Mark Lord
2006-11-28 17:55     ` Sergei Shtylyov
2006-11-28 20:12       ` Eric D. Mudama
2006-11-28 20:36         ` Sergei Shtylyov
2006-11-29  1:12     ` Tejun Heo
2006-11-28 18:05   ` Alan
2006-11-28 18:33     ` Linus Torvalds
2006-11-28 21:03   ` Jeff Garzik
2006-11-28 21:45     ` Linus Torvalds
2006-11-28 22:18   ` Jeff Garzik
