linux-ide.vger.kernel.org archive mirror
From: "Martin Ammermüller" <tenco@gmx.de>
To: Tejun Heo <htejun@gmail.com>
Cc: linux-ide@vger.kernel.org, jgarzik@pobox.com
Subject: Re: [sata_sil] kernel 2.6.17(-mm2) test - timeout issue
Date: Sat, 05 Aug 2006 15:36:45 +0200
Message-ID: <1154785005.9220.1.camel@localhost>
In-Reply-To: <44CD1512.1060802@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 4338 bytes --]

On Monday, 31.07.2006, at 05:22 +0900, Tejun Heo wrote:
> Martin Ammermüller wrote:
> > With high disk I/O and a 2.6.18-rc1 kernel I get these errors (depending
> > upon the work I do, up to several times a day):
> > 
> > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2 frozen
> > ata1.00: (BMDMA stat 0x20)
> > ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0x58 err 0x0 (HSM violation)
> 
> Hmm... Interesting.  It gets HSM violation first.
> 
> > ata1: soft resetting port
> > ata1: port is slow to respond, please be patient
> > ata1: port failed to respond (30 secs)
> > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > ATA: abnormal status 0xD8 on port 0xDCA18087
> > ATA: abnormal status 0xD8 on port 0xDCA18087
> > ATA: abnormal status 0xD8 on port 0xDCA18087
> > ATA: abnormal status 0xD8 on port 0xDCA18087
> > ATA: abnormal status 0xD8 on port 0xDCA18087
> > ata1.00: qc timeout (cmd 0xec)
> > ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > ata1.00: revalidation failed (errno=-5)
> > ata1: failed to recover some devices, retrying in 5 secs
> > ata1: hard resetting port
> > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > ata1.00: configured for UDMA/100
> > ata1: EH complete
> 
> Then two timeouts while recovering.
> 
> > SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
> > sda: Write Protect is off
> > sda: Mode Sense: 00 3a 00 00
> > SCSI device sda: drive cache: write back
> > 
> >> Anyways, if your harddisk is doing this regularly, 
> >> your hardware is faulty.  Maybe the connection between the controller 
> >> and the disk is the problem or the disk itself.
> > 
> > I did not get those errors with Windows XP and I am not the only one who
> > has problems running this particular laptop model with a Linux kernel.
> > OK, to be honest, there's actually only one person I know of who was
> > bothered enough by exactly the same errors to send me an e-mail (he
> > discovered at least one of my messages to this list). But in my
> > experience there are almost always others who get the same error but
> > remain silent.
> 
> It might be that the drive is quirky and sometimes raises interrupts 
> prematurely.  Depending on how the driver performs recovery, the effect 
> can be hidden from the user.  Can you try the attached patch and see how 
> the kernel acts?

I tried the patch, but I couldn't see any change in the kernel output. I
also noticed that there are actually two slightly different error
messages.

#1 (shorter one, without HSM violation):
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x21)
ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port is slow to respond, please be patient
ata1: port failed to respond (30 secs)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/100
ata1: EH complete
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
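
For reference, the SStatus and SControl values in the link-up line can be split into their fields. This is only a minimal sketch, assuming the standard SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that libata prints both registers in hex without a 0x prefix; decode_scr is a name made up for illustration, not a kernel function.

```python
# Field layout assumed from the SATA SStatus/SControl register definitions.
# decode_scr is a hypothetical helper for illustration only.

def decode_scr(value):
    """Split a SATA status/control register into DET, SPD and IPM nibbles."""
    return {
        "DET": value & 0xF,         # device detection
        "SPD": (value >> 4) & 0xF,  # negotiated / allowed speed
        "IPM": (value >> 8) & 0xF,  # interface power management
    }

# The log shows "SStatus 113 SControl 300"; both are hex digits.
sstatus = decode_scr(int("113", 16))
scontrol = decode_scr(int("300", 16))

print(sstatus)   # {'DET': 3, 'SPD': 1, 'IPM': 1}
print(scontrol)  # {'DET': 0, 'SPD': 0, 'IPM': 3}
```

Read this way, SStatus 113 would mean a device is present with PHY communication established (DET=3), at Gen1 speed (SPD=1, matching the reported 1.5 Gbps), with the interface active (IPM=1). SControl 300 would mean no speed restriction and transitions to the Partial and Slumber power states disabled (IPM=3).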

#2 (longer, with HSM violation):
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2 frozen
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0x58 err 0x0 (HSM violation)
ata1: soft resetting port
ata1: port is slow to respond, please be patient
ata1: port failed to respond (30 secs)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0xD8 on port 0xDCA44087
ATA: abnormal status 0xD8 on port 0xDCA44087
ATA: abnormal status 0xD8 on port 0xDCA44087
ATA: abnormal status 0xD8 on port 0xDCA44087
ATA: abnormal status 0xD8 on port 0xDCA44087
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/100
ata1: EH complete
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
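
To make the difference between the two variants easier to see, the register values in the exception lines can be broken into bit names. The sketch below assumes the classic ATA status register layout, the SFF-8038i bus-master status layout for "BMDMA stat", and bit 22 of SError as the handshake error; the table and function names are made up for this example, and only the SError bit that appears in these traces is listed.

```python
# Bit names assumed from the ATA status register, the SFF-8038i
# bus-master IDE status register and the SATA SError register.
# All names below are hypothetical helpers, not kernel symbols.

ATA_STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",   # "seek complete" in older ATA revisions
    0x08: "DRQ",
    0x04: "CORR",
    0x02: "IDX",
    0x01: "ERR",
}

BMDMA_STATUS_BITS = {
    0x80: "SIMPLEX",
    0x40: "DRV1_DMA_CAPABLE",
    0x20: "DRV0_DMA_CAPABLE",
    0x04: "INTR",
    0x02: "ERROR",
    0x01: "ACTIVE",
}

SERR_BITS = {1 << 22: "HANDSHAKE"}  # only the bit seen in these traces


def decode_bits(value, names):
    """Return the names of all set bits, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


# Variant #1 (timeout)
stat_1 = decode_bits(0x40, ATA_STATUS_BITS)     # ['DRDY']
bmdma_1 = decode_bits(0x21, BMDMA_STATUS_BITS)  # ['DRV0_DMA_CAPABLE', 'ACTIVE']
serr_1 = decode_bits(0x0, SERR_BITS)            # []

# Variant #2 (HSM violation)
stat_2 = decode_bits(0x58, ATA_STATUS_BITS)     # ['DRDY', 'DSC', 'DRQ']
bmdma_2 = decode_bits(0x20, BMDMA_STATUS_BITS)  # ['DRV0_DMA_CAPABLE']
serr_2 = decode_bits(0x400000, SERR_BITS)       # ['HANDSHAKE']

# Status polled during the failed soft reset in variant #2
stat_reset = decode_bits(0xD8, ATA_STATUS_BITS)  # ['BSY', 'DRDY', 'DSC', 'DRQ']
```

If these assumptions hold, variant #1 times out with the DMA engine still marked active and the drive merely ready, while variant #2 shows the DMA engine already stopped but DRQ still set after the READ DMA (0xc8), which would be consistent with the "HSM violation" label. The 0xD8 seen after the soft reset has BSY set, so the drive apparently never leaves the busy state before the IDENTIFY (0xec) times out.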

Additionally, I have attached the output of "smartctl --all -d ata /dev/sda".

Regards,
Martin Ammermüller

[-- Attachment #1.2: smart_all --]
[-- Type: text/plain, Size: 5055 bytes --]

smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MK8032GSX
Serial Number:    26GI5560S
Firmware Version: AS111G
User Capacity:    80.026.361.856 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Aug  5 15:33:19 2006 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 ( 331) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  65) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       2104
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       482
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       1729
 10 Spin_Retry_Count        0x0033   109   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       394
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       97
193 Load_Cycle_Count        0x0032   089   089   000    Old_age   Always       -       111507
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       37 (Lifetime Min/Max 15/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       122
222 Loaded_Hours            0x0032   098   098   000    Old_age   Always       -       1168
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       324
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1729         -
# 2  Short offline       Completed without error       00%      1728         -
# 3  Short offline       Completed without error       00%      1128         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


Thread overview: 7+ messages
2006-06-24 20:50 [sata_sil] kernel 2.6.17(-mm2) test - timeout issue Martin Ammermüller
2006-06-25  3:06 ` Tejun Heo
2006-07-24  8:21   ` Martin Ammermüller
2006-07-30 20:22     ` Tejun Heo
2006-08-05 13:36       ` Martin Ammermüller [this message]
2006-08-06 15:51         ` Tejun Heo
2006-08-14 10:12           ` Martin Ammermüller
