* SATA exceptions
@ 2007-07-05 21:46 S.Çağlar Onur
2007-07-06 1:52 ` Tejun Heo
0 siblings, 1 reply; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-05 21:46 UTC (permalink / raw)
To: LKML; +Cc: Tejun Heo, Jeff Garzik
Hi;
I've been seeing the following messages in dmesg for a while (according to
kern.log they start with 2.6.22-rc4) on an HP Pavilion dv2385ea:
...
[ 4260.278408] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 4260.278417] ata1.00: (irq_stat 0x40000001)
[ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb 0x0 data 4096 out
[ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9 (media error)
[ 4260.911247] ata1.00: configured for UDMA/100
[ 4260.911263] ata1: EH complete
[ 4260.911809] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
[ 4260.912127] sd 0:0:0:0: [sda] Write Protect is off
[ 4260.912135] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 4260.912672] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
...
zangetsu log # grep "ata1.00: exception" kern.log
Jun 10 23:23:33 localhost kernel: [ 3472.867317] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 12 17:09:56 localhost kernel: [ 2470.530793] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 12 17:11:19 localhost kernel: [ 2553.874662] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 13 12:08:46 localhost kernel: [ 2235.664683] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 17 16:59:23 localhost kernel: [ 9208.673909] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 22 13:35:56 localhost kernel: [ 1719.191725] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 24 14:13:46 localhost kernel: [ 5822.239007] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:24:11 localhost kernel: [ 1315.455726] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:36:46 localhost kernel: [ 2069.003291] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:37:01 localhost kernel: [ 2082.955499] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:37:21 localhost kernel: [ 2103.400411] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:37:38 localhost kernel: [ 2120.251088] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 28 10:23:55 localhost kernel: [ 383.355017] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 5 22:05:14 localhost kernel: [ 4260.278408] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 5 22:05:52 localhost kernel: [ 4297.784773] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Neither fsck nor badblocks reported anything wrong, and as far as I know
these haven't caused any real problem so far, but I'm not sure whether they
are harmless or indicate a hardware error, so the
dmesg, smartctl -a, /proc/interrupts and lspci -vv outputs can be found at [1].
If anything else is needed, please just tell me.
[1] http://cekirdek.pardus.org.tr/~caglar/SATA/
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: SATA exceptions
2007-07-05 21:46 SATA exceptions S.Çağlar Onur
@ 2007-07-06 1:52 ` Tejun Heo
2007-07-06 11:43 ` S.Çağlar Onur
0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2007-07-06 1:52 UTC (permalink / raw)
To: caglar; +Cc: LKML, Jeff Garzik
Hello,
S.Çağlar Onur wrote:
> [ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb 0x0
> data 4096 out
> [ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9
> (media error)
That's a media error on sector 247236823 during a WRITE. Media errors on
write are a bad sign - it usually means the drive failed even to remap the
sector because the spare area ran out. I'm not sure that is the case here,
though - the SMART log is clean. Please run the SMART short and long
self-tests and see what they say.
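
For readers who want to check the sector number themselves, here is a
minimal sketch of the arithmetic. It assumes the register dump after
"cmd" / "res" is laid out as first/second:nsect:lbal:lbam:lbah, then five
"hob" bytes, then the device byte, and that the command is an LBA28 one
(0xca is WRITE DMA). The helper names are made up for illustration and are
not part of libata or any tool.

```python
import re

# Assumed layout of the dump printed after "cmd" / "res":
#   first/second:nsect:lbal:lbam:lbah/<five hob bytes>/device
TF_RE = re.compile(
    r"([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def parse_taskfile(line):
    """Split one cmd/res dump into named register values (hypothetical helper)."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no taskfile dump found in: %r" % line)
    low = [int(b, 16) for b in m.group(2).split(":")]
    return {
        "first": int(m.group(1), 16),   # command (cmd line) or status (res line)
        "second": low[0],               # feature (cmd line) or error (res line)
        "nsect": low[1],
        "lbal": low[2],
        "lbam": low[3],
        "lbah": low[4],
        "device": int(m.group(4), 16),
    }


def lba28(tf):
    """Assemble a 28-bit LBA; the low nibble of device holds LBA bits 27:24."""
    return (
        ((tf["device"] & 0x0F) << 24)
        | (tf["lbah"] << 16)
        | (tf["lbam"] << 8)
        | tf["lbal"]
    )


cmd = parse_taskfile("cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb 0x0")
res = parse_taskfile("res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9")

print(lba28(cmd), cmd["nsect"])  # first LBA and sector count of the write
print(lba28(res))                # LBA reported back by the drive
```

With the two lines from the log, the write starts at LBA 247236816 and
covers 8 sectors (4096 bytes), and the reported LBA is 247236823, which is
the last sector of that request.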
--
tejun
* Re: SATA exceptions
2007-07-06 1:52 ` Tejun Heo
@ 2007-07-06 11:43 ` S.Çağlar Onur
0 siblings, 0 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-06 11:43 UTC (permalink / raw)
To: Tejun Heo; +Cc: LKML, Jeff Garzik
Hi;
On Fri, 06 Jul 2007, Tejun Heo wrote:
> S.Çağlar Onur wrote:
> > [ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb
> > 0x0 data 4096 out
> > [ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9
> > (media error)
>
> That's media error on sector 247236823 on WRITE. Media errors on write
> are bad signs - it usually means the drive even failed to remap the
> sector because extra space ran out.
Hmm, more than 50GB is empty on disk :)
> I'm not sure this is the case here
> tho - the smart log is clear. Please run smart short/long tests and see
> what they say.
Both completed without a problem;
zangetsu ~ # smartctl -l selftest /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%              357  -
# 2  Short offline       Completed without error        00%              355  -
If you want me to try something else, please just say so :)
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: SATA exceptions
[not found] ` <fa.NhBdOmH9+RkW+QO72+TjPRwEKj0@ifi.uio.no>
@ 2007-07-07 18:05 ` Robert Hancock
2007-07-07 21:35 ` S.Çağlar Onur
0 siblings, 1 reply; 13+ messages in thread
From: Robert Hancock @ 2007-07-07 18:05 UTC (permalink / raw)
To: caglar; +Cc: Tejun Heo, LKML, Jeff Garzik
S.Çağlar Onur wrote:
> On Fri, 06 Jul 2007, Tejun Heo wrote:
>> S.Çağlar Onur wrote:
>>> [ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb
>>> 0x0 data 4096 out
>>> [ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9
>>> (media error)
>> That's media error on sector 247236823 on WRITE. Media errors on write
>> are bad signs - it usually means the drive even failed to remap the
>> sector because extra space ran out.
>
> Hmm, more than 50GB is empty on disk :)
It's not the free space on the drive that matters, it's the number of
free sectors in the spare sector pool on the drive, which is invisible
to software.
Your SMART log shows 309 reallocated sectors. That seems somewhat high..
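
The reallocation count meant here is the RAW_VALUE column of attributes 5
and 196 in the smartctl attribute table. A minimal sketch of pulling it out
of saved smartctl -a output follows; it assumes each attribute sits on one
unwrapped line with the ten columns shown in the table header, and the
function name is made up for illustration. The sample rows are the ones
posted later in this thread.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   067   067   010    Pre-fail  Always       -       314
196 Reallocated_Event_Count 0x0032   067   067   000    Old_age   Always       -       314
"""


def parse_attributes(text):
    """Map attribute ID -> fields, skipping anything that is not a table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A table row has at least ten columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),    # normalized value
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            # Raw values can carry trailing text, so keep them as a string.
            "raw": " ".join(fields[9:]),
        }
    return attrs


attrs = parse_attributes(SAMPLE)
for attr_id in (5, 196):
    print(attr_id, attrs[attr_id]["name"], attrs[attr_id]["raw"])
```

The raw value is kept as text because some rows, such as the temperature
attribute, append extra information after the number.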
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: SATA exceptions
2007-07-07 18:05 ` Robert Hancock
@ 2007-07-07 21:35 ` S.Çağlar Onur
2007-07-09 18:37 ` Tejun Heo
2007-07-11 20:31 ` Mark Lord
0 siblings, 2 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-07 21:35 UTC (permalink / raw)
To: Robert Hancock; +Cc: Tejun Heo, LKML, Jeff Garzik
Hi;
On Sat, 07 Jul 2007, Robert Hancock wrote:
> It's not the free space on the drive that matters, it's the number of
> free sectors in the spare sector pool on the drive, which is invisible
> to software.
>
> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
Ah, sorry for misinterpreting that :) It's quite a new piece of hardware (at
most ~1.5 months old) and "Reallocated_Event_Count" constantly increases
(it is currently at 313). Although I'm not 100 percent sure, these errors
seem to have occurred only with kernels > 2.6.18 (or 2.6.18 simply didn't
report them, because according to kern.log they are only visible with 2.6.22+).
We bought 3 HP Pavilion dv2385ea laptops; one of them only runs 2.6.18, and
its smartctl output follows as a reference:
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HM160JI
Serial Number: S0W6J10P331479
Firmware Version: AD100-16
User Capacity: 160.041.885.696 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Sun Jul 8 00:22:21 2007 EEST
==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for
details.
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (5391) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 89) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   253   253   025    Pre-fail  Always       -       2880
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2648
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   253   253   000    Old_age   Always       -       236
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       1
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       2
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
187 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   047   040   040    Old_age   Always   In_the_past 1008009269
191 G-Sense_Error_Rate      0x0012   100   100   000    Old_age   Always       -       5396
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       40
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2575
194 Temperature_Celsius     0x0022   047   040   000    Old_age   Always       -       53 (Lifetime Min/Max 0/15381)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       98037
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x0012   253   253   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       2
225 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2575
255 Unknown_Attribute       0x000a   253   100   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure
revision number = 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
What do you suggest in that case? Should I try to get the hardware replaced,
assuming it's the hardware's fault, or could this be a regression introduced
by kernels newer than 2.6.18?
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: SATA exceptions
2007-07-07 21:35 ` S.Çağlar Onur
@ 2007-07-09 18:37 ` Tejun Heo
2007-07-09 19:06 ` S.Çağlar Onur
` (2 more replies)
2007-07-11 20:31 ` Mark Lord
1 sibling, 3 replies; 13+ messages in thread
From: Tejun Heo @ 2007-07-09 18:37 UTC (permalink / raw)
To: caglar; +Cc: Robert Hancock, LKML, Jeff Garzik
Hello,
S.Çağlar Onur wrote:
> On Sat, 07 Jul 2007, Robert Hancock wrote:
>> It's not the free space on the drive that matters, it's the number of
>> free sectors in the spare sector pool on the drive, which is invisible
>> to software.
>>
>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>
> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
> (currently its increased to 313) and although i'm not 100 percent sure these
> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
> cause according to kern.log these only visible with 2.6.22+)
The OS and driver can't really do much about reallocation events. Some
number of reallocations is okay, but if you see it going up constantly, you
probably have a dying disk.
> We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18 and its
> smartctl output follows as a reference;
>
> 5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail
> 196 Reallocated_Event_Count 0x0032 253 253 000 Old_age
Hmm... This is pretty high too. Do the counts increase on this machine too?
--
tejun
* Re: SATA exceptions
2007-07-09 18:37 ` Tejun Heo
@ 2007-07-09 19:06 ` S.Çağlar Onur
2007-07-11 17:20 ` Bill Davidsen
2007-07-12 19:52 ` Pavel Machek
2 siblings, 0 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-09 19:06 UTC (permalink / raw)
To: Tejun Heo
Cc: Robert Hancock, LKML, Jeff Garzik, Onur Küçük,
İsmail Dönmez
Hi;
On Mon, 09 Jul 2007, Tejun Heo wrote:
> > On Sat, 07 Jul 2007, Robert Hancock wrote:
> >> It's not the free space on the drive that matters, it's the number of
> >> free sectors in the spare sector pool on the drive, which is invisible
> >> to software.
> >>
> >> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
> >
> > Ah sorry to misinterpret the content:), its a quiet new piece of hardware
> > (at most ~1.5 month old) and "Reallocated_Event_Count" constantly
> > increases (currently its increased to 313) and although i'm not 100
> > percent sure these errors only occured with kernels > 2.6.18 (or 2.6.18
> > didn't report these cause according to kern.log these only visible with
> > 2.6.22+)
>
> OS and driver can't really do much about the reallocation event. Some
> number of reallocations is okay but if you it going up constantly, you
> probably have a dying disk.
Hmm, that's really interesting. Then it means three ~1.5-month-old laptops
are dying of the same disease :) or they were already defective somehow (or
we are damaging them, but mine has sat happily on my table all that time :P)
> > We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18
> > and its smartctl output follows as a reference;
> >
> > 5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail
> > 196 Reallocated_Event_Count 0x0032 253 253 000 Old_age
>
> Hmm... This is pretty high too. Do the counts increase on this machine
> too?
Yes, it seems so (I'm adding Onur and İsmail to CC as the owners of the other
machines), and here are the SMART logs for the three separate machines.
Interestingly, İsmail and I run 2.6.22 (over 300 reallocations have occurred
for both of us) and Onur uses 2.6.18 (0 reallocations for him):
[1] http://cekirdek.pardus.org.tr/~caglar/SATA/smart.caglar
[2] http://cekirdek.pardus.org.tr/~caglar/SATA/smart.ismail
[3] http://cekirdek.pardus.org.tr/~caglar/SATA/smart.onur
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: SATA exceptions
2007-07-09 18:37 ` Tejun Heo
2007-07-09 19:06 ` S.Çağlar Onur
@ 2007-07-11 17:20 ` Bill Davidsen
2007-07-12 19:52 ` Pavel Machek
2 siblings, 0 replies; 13+ messages in thread
From: Bill Davidsen @ 2007-07-11 17:20 UTC (permalink / raw)
To: linux-kernel; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik
Tejun Heo wrote:
> Hello,
>
> S.Çağlar Onur wrote:
>> On Sat, 07 Jul 2007, Robert Hancock wrote:
>>> It's not the free space on the drive that matters, it's the number of
>>> free sectors in the spare sector pool on the drive, which is invisible
>>> to software.
>>>
>>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
>> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
>> (currently its increased to 313) and although i'm not 100 percent sure these
>> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
>> cause according to kern.log these only visible with 2.6.22+)
>
> OS and driver can't really do much about the reallocation event. Some
> number of reallocations is okay but if you it going up constantly, you
> probably have a dying disk.
>
Or, as I learned the hard way, if you have the problem on all drives
sharing a power supply, a power issue.
>> We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18 and its
>> smartctl output follows as a reference;
>>
>> 5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail
>> 196 Reallocated_Event_Count 0x0032 253 253 000 Old_age
>
> Hmm... This is pretty high too. Do the counts increase on this machine too?
>
--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
* Re: SATA exceptions
2007-07-07 21:35 ` S.Çağlar Onur
2007-07-09 18:37 ` Tejun Heo
@ 2007-07-11 20:31 ` Mark Lord
2007-07-12 3:13 ` Tejun Heo
1 sibling, 1 reply; 13+ messages in thread
From: Mark Lord @ 2007-07-11 20:31 UTC (permalink / raw)
To: caglar; +Cc: Robert Hancock, Tejun Heo, LKML, Jeff Garzik
S.Çağlar Onur wrote:
> Hi;
>
> On Sat, 07 Jul 2007, Robert Hancock wrote:
>> It's not the free space on the drive that matters, it's the number of
>> free sectors in the spare sector pool on the drive, which is invisible
>> to software.
>>
>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>
> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
> (currently its increased to 313) and although i'm not 100 percent sure these
> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
> cause according to kern.log these only visible with 2.6.22+)
>
> We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18 and its
> smartctl output follows as a reference;
>
> smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model: SAMSUNG HM160JI
> Serial Number: S0W6J10P331479
> Firmware Version: AD100-16
> User Capacity: 160.041.885.696 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 7
> ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
> Local Time is: Sun Jul 8 00:22:21 2007 EEST
>
> ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for
> details.
>
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> See vendor-specific Attribute list for marginal Attributes.
>
> General SMART Values:
> Offline data collection status: (0x00) Offline data collection activity
> was never started.
> Auto Offline Data Collection: Disabled.
> Self-test execution status: ( 0) The previous self-test routine
> completed
> without error or no self-test has ever
> been run.
> Total time to complete Offline
> data collection: (5391) seconds.
> Offline data collection
> capabilities: (0x51) SMART execute Offline immediate.
> No Auto Offline data collection support.
> Suspend Offline collection upon new
> command.
> No Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 2) minutes.
> Extended self-test routine
> recommended polling time: ( 89) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
> WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail
> Always - 0
> 3 Spin_Up_Time 0x0007 253 253 025 Pre-fail
> Always - 2880
> 4 Start_Stop_Count 0x0032 098 098 000 Old_age
> Always - 2648
> 5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail
> Always - 0
> 7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail
> Always - 0
> 8 Seek_Time_Performance 0x0025 253 253 015 Pre-fail
> Offline - 0
> 9 Power_On_Hours 0x0032 253 253 000 Old_age
> Always - 236
> 10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail
> Always - 1
> 11 Calibration_Retry_Count 0x0012 100 100 000 Old_age
> Always - 2
> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age
> Always - 57
> 187 Unknown_Attribute 0x0032 253 253 000 Old_age
> Always - 0
> 188 Unknown_Attribute 0x0032 253 253 000 Old_age
> Always - 0
> 190 Temperature_Celsius 0x0022 047 040 040 Old_age Always
> In_the_past 1008009269
> 191 G-Sense_Error_Rate 0x0012 100 100 000 Old_age
> Always - 5396
> 192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age
> Always - 40
> 193 Load_Cycle_Count 0x0012 100 100 000 Old_age
> Always - 2575
> 194 Temperature_Celsius 0x0022 047 040 000 Old_age
> Always - 53 (Lifetime Min/Max 0/15381)
> 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age
> Always - 98037
> 196 Reallocated_Event_Count 0x0032 253 253 000 Old_age
> Always - 0
> 197 Current_Pending_Sector 0x0012 253 253 000 Old_age
> Always - 0
> 198 Offline_Uncorrectable 0x0030 253 253 000 Old_age
> Offline - 0
> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age
> Always - 0
> 200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age
> Always - 0
> 201 Soft_Read_Error_Rate 0x0012 253 253 000 Old_age
> Always - 0
> 223 Load_Retry_Count 0x0012 100 100 000 Old_age
> Always - 2
> 225 Load_Cycle_Count 0x0012 100 100 000 Old_age
> Always - 2575
> 255 Unknown_Attribute 0x000a 253 100 000 Old_age
> Always - 0
..
I'm not even sure how to interpret those numbers.
It seems rather odd that nearly all fields are either "100" or "253",
so those are probably pre-programmed numbers rather than actual counts.
The raw value at the end of the line (for the various "Reallocated*" fields)
is probably the real value here.
Tejun ??
* Re: SATA exceptions
2007-07-11 20:31 ` Mark Lord
@ 2007-07-12 3:13 ` Tejun Heo
0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2007-07-12 3:13 UTC (permalink / raw)
To: Mark Lord; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik
Mark Lord wrote:
> I'm not even sure how to interpret those numbers.
> It seems rather odd that nearly all fields are either "100" or "253",
> so those are probably pre-programmed numbers rather than actual counts.
> The raw value at the end of the line (for the various "Reallocated*"
> fields)
> is probably the real value here.
I don't know exactly either. Different vendors seem to use different metrics
anyway, but an increasing raw number on the reallocation counter is pretty
easy to interpret.
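
To make the distinction concrete, here is a small sketch that looks at both
views of the same attribute. The raw counts 309, 313 and 314 and the
normalized value 067 against threshold 010 are the figures reported in this
thread; the rule that an attribute counts as failed once its normalized
value is at or below a non-zero threshold is the usual SMART convention and
is stated here as an assumption. Both function names are hypothetical.

```python
def raw_deltas(readings):
    """Differences between consecutive raw readings, oldest first."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


def threshold_tripped(value, thresh):
    """Assumed convention: failed when normalized value <= non-zero threshold."""
    return thresh > 0 and value <= thresh


raw_counts = [309, 313, 314]          # reallocation counts reported in the thread
deltas = raw_deltas(raw_counts)

print("raw deltas:", deltas)
print("still growing:", any(d > 0 for d in deltas))
print("threshold tripped:", threshold_tripped(67, 10))
```

On these figures the normalized value is still well above the threshold, so
the drive's own health check passes, while the raw counter keeps growing.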
--
tejun
* Re: SATA exceptions
2007-07-09 18:37 ` Tejun Heo
2007-07-09 19:06 ` S.Çağlar Onur
2007-07-11 17:20 ` Bill Davidsen
@ 2007-07-12 19:52 ` Pavel Machek
2007-07-13 3:12 ` Tejun Heo
2 siblings, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2007-07-12 19:52 UTC (permalink / raw)
To: Tejun Heo; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik
Hi!
> >> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
> >
> > Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
> > most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
> > (currently its increased to 313) and although i'm not 100 percent sure these
> > errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
> > cause according to kern.log these only visible with 2.6.22+)
>
> OS and driver can't really do much about the reallocation event. Some
> number of reallocations is okay but if you it going up constantly, you
> probably have a dying disk.
Hmm... cutting the power while writing is doable from the OS and might force
reallocations?
You might want to check whether the number of reallocated sectors increases
with shutdowns/reboots.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: SATA exceptions
2007-07-12 19:52 ` Pavel Machek
@ 2007-07-13 3:12 ` Tejun Heo
2007-07-13 7:44 ` S.Çağlar Onur
0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2007-07-13 3:12 UTC (permalink / raw)
To: Pavel Machek; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik
Pavel Machek wrote:
>>>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>>> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
>>> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
>>> (currently its increased to 313) and although i'm not 100 percent sure these
>>> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
>>> cause according to kern.log these only visible with 2.6.22+)
>> OS and driver can't really do much about the reallocation event. Some
>> number of reallocations is okay but if you it going up constantly, you
>> probably have a dying disk.
>
> Hmm... cut the power while writing is doable from OS and might force
> reallocations?
Hmmm... We don't have any pending writes when the power goes out, and I don't
think an emergency unload can directly increase the reallocation count. It
can shorten the lifespan of the head, though.
> You might want to check if number of reallocated sectors increases
> with shutdowns/reboots.
I'm curious too.
--
tejun
* Re: SATA exceptions
2007-07-13 3:12 ` Tejun Heo
@ 2007-07-13 7:44 ` S.Çağlar Onur
0 siblings, 0 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-13 7:44 UTC (permalink / raw)
To: Tejun Heo; +Cc: Pavel Machek, Robert Hancock, LKML, Jeff Garzik
On Fri, 13 Jul 2007, Tejun Heo wrote:
> >> OS and driver can't really do much about the reallocation event. Some
> >> number of reallocations is okay but if you it going up constantly, you
> >> probably have a dying disk.
> >
> > Hmm... cut the power while writing is doable from OS and might force
> > reallocations?
>
> Hmmm... We don't have any pending write when power goes out and I don't
> emergency unload can directly increase reallocation count. It can
> shorten lifespan of the head tho.
>
> > You might want to check if number of reallocated sectors increases
> > with shutdowns/reboots.
>
> I'm curious too.
It seems reboot/shutdown has no effect on the reallocated sectors. After 5
reboots and 5 shutdowns the count didn't change at all.
zangetsu ~ # smartctl -a /dev/sda | grep Reall
  5 Reallocated_Sector_Ct   0x0033   067   067   010    Pre-fail  Always       -       314
196 Reallocated_Event_Count 0x0032   067   067   000    Old_age   Always       -       314
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!