* SATA exceptions
@ 2007-07-05 21:46 S.Çağlar Onur
  2007-07-06  1:52 ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-05 21:46 UTC
  To: LKML; +Cc: Tejun Heo, Jeff Garzik

[-- Attachment #1: Type: text/plain, Size: 3107 bytes --]

Hi;

I've been seeing the following messages in dmesg for a while (according to kern.log they started with 2.6.22-rc4) on an HP Pavilion dv2385ea:

...
[ 4260.278408] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 4260.278417] ata1.00: (irq_stat 0x40000001)
[ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb 0x0 data 4096 out
[ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9 (media error)
[ 4260.911247] ata1.00: configured for UDMA/100
[ 4260.911263] ata1: EH complete
[ 4260.911809] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
[ 4260.912127] sd 0:0:0:0: [sda] Write Protect is off
[ 4260.912135] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 4260.912672] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
...
zangetsu log # grep "ata1.00: exception" kern.log
Jun 10 23:23:33 localhost kernel: [ 3472.867317] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 12 17:09:56 localhost kernel: [ 2470.530793] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 12 17:11:19 localhost kernel: [ 2553.874662] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 13 12:08:46 localhost kernel: [ 2235.664683] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 17 16:59:23 localhost kernel: [ 9208.673909] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 22 13:35:56 localhost kernel: [ 1719.191725] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 24 14:13:46 localhost kernel: [ 5822.239007] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:24:11 localhost kernel: [ 1315.455726] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:36:46 localhost kernel: [ 2069.003291] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:37:01 localhost kernel: [ 2082.955499] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:37:21 localhost kernel: [ 2103.400411] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 26 15:37:38 localhost kernel: [ 2120.251088] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 28 10:23:55 localhost kernel: [ 383.355017] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 5 22:05:14 localhost kernel: [ 4260.278408] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 5 22:05:52 localhost kernel: [ 4297.784773] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Neither fsck nor badblocks reported anything wrong, and as far as I know these errors haven't caused any real problems so far, but I'm not sure whether they are harmless or indicate a hardware error. The dmesg, smartctl -a, /proc/interrupts and lspci -vv outputs can be found at [1]; if anything else is needed, please just tell me...
[1] http://cekirdek.pardus.org.tr/~caglar/SATA/

Cheers
-- 
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: SATA exceptions
  2007-07-05 21:46 SATA exceptions S.Çağlar Onur
@ 2007-07-06  1:52 ` Tejun Heo
  2007-07-06 11:43   ` S.Çağlar Onur
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2007-07-06  1:52 UTC
  To: caglar; +Cc: LKML, Jeff Garzik

Hello,

S.Çağlar Onur wrote:
> [ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb 0x0
> data 4096 out
> [ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9
> (media error)

That's a media error on sector 247236823 during a WRITE. Media errors on write are a bad sign: they usually mean the drive failed even to remap the sector because its spare space ran out. I'm not sure that is the case here, though, as the SMART log is clean. Please run the SMART short and long self-tests and see what they report.

-- 
tejun
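Tejun's sector number can be reproduced from the two taskfile dumps in the original report. The sketch below is a hypothetical helper, not part of libata or any tool. It assumes the field order libata printed at the time (command or status / feature or error : nsect : lbal : lbam : lbah / hob_feature : hob_nsect : hob_lbal : hob_lbam : hob_lbah / device), and because `ca` is the 28-bit WRITE DMA command, it takes LBA bits 24-27 from the low nibble of the device register. The Emask names are assumed to match libata's AC_ERR_* constants of that era.

```python
import re

# Hypothetical decoder for the taskfile text libata prints after "cmd" / "res".
# Assumed field order:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# In a "res" dump, "command" holds the status register and "feature" the error register.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device")
HEX = r"([0-9a-fA-F]{2})"
TF_RE = re.compile(HEX + "/" + ":".join([HEX] * 5) + "/" + ":".join([HEX] * 5) + "/" + HEX)

# Standard ATA status and error register bits (subset).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
# libata Emask bits (assumed to match the AC_ERR_* constants of that era).
EMASK_BITS = {0x01: "AC_ERR_DEV", 0x02: "AC_ERR_HSM", 0x04: "AC_ERR_TIMEOUT",
              0x08: "AC_ERR_MEDIA", 0x10: "AC_ERR_ATA_BUS", 0x20: "AC_ERR_HOST_BUS",
              0x40: "AC_ERR_SYSTEM", 0x80: "AC_ERR_INVALID"}


def parse_taskfile(text):
    """Return the 12 taskfile registers found in a libata cmd/res string."""
    m = TF_RE.search(text)
    if m is None:
        raise ValueError("no taskfile dump found in %r" % text)
    return {name: int(byte, 16) for name, byte in zip(FIELDS, m.groups())}


def lba28(tf):
    """28-bit LBA: bits 24-27 live in the low nibble of the device register."""
    return ((tf["device"] & 0x0F) << 24) | (tf["lbah"] << 16) | (tf["lbam"] << 8) | tf["lbal"]


def bit_names(value, table):
    """Names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


cmd = parse_taskfile("cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee")
res = parse_taskfile("res 51/40:01:d7:88:bc/00:00:0e:00:00/ee")

start = lba28(cmd)    # first sector of the 8-sector (4096-byte) write
failed = lba28(res)   # sector the drive reported as failing
print("WRITE DMA of", cmd["nsect"], "sectors starting at", start)
print("failed at", failed, "offset", failed - start)
print(bit_names(res["command"], STATUS_BITS), bit_names(res["feature"], ERROR_BITS))
print(bit_names(0x9, EMASK_BITS))
```

Run against the report, this gives a write starting at sector 247236816 that failed at 247236823 (the eighth of its eight sectors), a status of DRDY/DSC/ERR with the UNC (uncorrectable data) error bit, and an Emask of AC_ERR_MEDIA | AC_ERR_DEV, which is consistent with libata's "(media error)" label.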
* Re: SATA exceptions
  2007-07-06  1:52 ` Tejun Heo
@ 2007-07-06 11:43   ` S.Çağlar Onur
  0 siblings, 0 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-06 11:43 UTC
  To: Tejun Heo; +Cc: LKML, Jeff Garzik

[-- Attachment #1: Type: text/plain, Size: 1461 bytes --]

Hi;

On Friday 06 July 2007, Tejun Heo wrote:
> S.Çağlar Onur wrote:
> > [ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb
> > 0x0 data 4096 out
> > [ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9
> > (media error)
>
> That's media error on sector 247236823 on WRITE. Media errors on write
> are bad signs - it usually means the drive even failed to remap the
> sector because extra space ran out.

Hmm, more than 50GB of the disk is still free :)

> I'm not sure this is the case here
> tho - the smart log is clear. Please run smart short/long tests and see
> what they say.

Both completed without any problems:

zangetsu ~ # smartctl -l selftest /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        357         -
# 2  Short offline       Completed without error       00%        355         -

If you want me to try anything else, please just say so :)

Cheers
* Re: SATA exceptions
  [not found] ` <fa.NhBdOmH9+RkW+QO72+TjPRwEKj0@ifi.uio.no>
@ 2007-07-07 18:05 ` Robert Hancock
  2007-07-07 21:35   ` S.Çağlar Onur
  0 siblings, 1 reply; 13+ messages in thread
From: Robert Hancock @ 2007-07-07 18:05 UTC
  To: caglar; +Cc: Tejun Heo, LKML, Jeff Garzik

S.Çağlar Onur wrote:
> On Friday 06 July 2007, Tejun Heo wrote:
>> S.Çağlar Onur wrote:
>>> [ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb
>>> 0x0 data 4096 out
>>> [ 4260.278430] res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9
>>> (media error)
>> That's media error on sector 247236823 on WRITE. Media errors on write
>> are bad signs - it usually means the drive even failed to remap the
>> sector because extra space ran out.
>
> Hmm, more than 50GB is empty on disk :)

It's not the free space on the filesystem that matters, it's the number of free sectors in the drive's spare sector pool, which is invisible to software.

Your SMART log shows 309 reallocated sectors. That seems somewhat high...

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: SATA exceptions
  2007-07-07 18:05 ` Robert Hancock
@ 2007-07-07 21:35   ` S.Çağlar Onur
  2007-07-09 18:37     ` Tejun Heo
  2007-07-11 20:31     ` Mark Lord
  0 siblings, 2 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-07 21:35 UTC
  To: Robert Hancock; +Cc: Tejun Heo, LKML, Jeff Garzik

[-- Attachment #1: Type: text/plain, Size: 6690 bytes --]

Hi;

On Saturday 07 July 2007, Robert Hancock wrote:
> It's not the free space on the drive that matters, it's the number of
> free sectors in the spare sector pool on the drive, which is invisible
> to software.
>
> Your SMART log shows 309 reallocated sectors. That seems somewhat high..

Ah, sorry for misinterpreting that :) It's quite a new piece of hardware (at most ~1.5 months old), and "Reallocated_Event_Count" keeps increasing (it is currently at 313). Although I'm not 100 percent sure, these errors only occurred with kernels newer than 2.6.18 (or 2.6.18 simply didn't report them, since according to kern.log they are only visible with 2.6.22+).

We bought three HP Pavilion dv2385ea laptops; one of them runs only 2.6.18, and its smartctl output follows for reference:

smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HM160JI
Serial Number:    S0W6J10P331479
Firmware Version: AD100-16
User Capacity:    160.041.885.696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Sun Jul 8 00:22:21 2007 EEST

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (5391) seconds.
Offline data collection
capabilities:                    (0x51) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  89) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   253   253   025    Pre-fail  Always       -       2880
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2648
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   253   253   000    Old_age   Always       -       236
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       1
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       2
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
187 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   047   040   040    Old_age   Always   In_the_past 1008009269
191 G-Sense_Error_Rate      0x0012   100   100   000    Old_age   Always       -       5396
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       40
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2575
194 Temperature_Celsius     0x0022   047   040   000    Old_age   Always       -       53 (Lifetime Min/Max 0/15381)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       98037
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x0012   253   253   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       2
225 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2575
255 Unknown_Attribute       0x000a   253   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

What do you suggest in this case? Should I try replacing the hardware, assuming it's a hardware fault, or could this be a regression introduced by kernels newer than 2.6.18?

Cheers
* Re: SATA exceptions
  2007-07-07 21:35 ` S.Çağlar Onur
@ 2007-07-09 18:37   ` Tejun Heo
  2007-07-09 19:06     ` S.Çağlar Onur
                       ` (2 more replies)
  2007-07-11 20:31   ` Mark Lord
  1 sibling, 3 replies; 13+ messages in thread
From: Tejun Heo @ 2007-07-09 18:37 UTC
  To: caglar; +Cc: Robert Hancock, LKML, Jeff Garzik

Hello,

S.Çağlar Onur wrote:
> On Saturday 07 July 2007, Robert Hancock wrote:
>> It's not the free space on the drive that matters, it's the number of
>> free sectors in the spare sector pool on the drive, which is invisible
>> to software.
>>
>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>
> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
> (currently its increased to 313) and although i'm not 100 percent sure these
> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
> cause according to kern.log these only visible with 2.6.22+)

The OS and driver can't really do much about reallocation events. Some number of reallocations is okay, but if you see the count going up constantly, you probably have a dying disk.

> We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18 and its
> smartctl output follows as a reference;
>
> 5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail
> 196 Reallocated_Event_Count 0x0032 253 253 000 Old_age

Hmm... This is pretty high too. Do the counts increase on this machine too?

-- 
tejun
* Re: SATA exceptions
  2007-07-09 18:37 ` Tejun Heo
@ 2007-07-09 19:06   ` S.Çağlar Onur
  2007-07-11 17:20   ` Bill Davidsen
  2007-07-12 19:52   ` Pavel Machek
  2 siblings, 0 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-09 19:06 UTC
  To: Tejun Heo
  Cc: Robert Hancock, LKML, Jeff Garzik, Onur Küçük, İsmail Dönmez

[-- Attachment #1: Type: text/plain, Size: 2198 bytes --]

Hi;

On Monday 09 July 2007, Tejun Heo wrote:
> > On Saturday 07 July 2007, Robert Hancock wrote:
> >> It's not the free space on the drive that matters, it's the number of
> >> free sectors in the spare sector pool on the drive, which is invisible
> >> to software.
> >>
> >> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
> >
> > Ah sorry to misinterpret the content:), its a quiet new piece of hardware
> > (at most ~1.5 month old) and "Reallocated_Event_Count" constantly
> > increases (currently its increased to 313) and although i'm not 100
> > percent sure these errors only occured with kernels > 2.6.18 (or 2.6.18
> > didn't report these cause according to kern.log these only visible with
> > 2.6.22+)
>
> OS and driver can't really do much about the reallocation event. Some
> number of reallocations is okay but if you it going up constantly, you
> probably have a dying disk.

Hmm, that's really interesting; it would mean that three ~1.5-month-old laptops are all dying of the same disease :), or that they were somehow defective from the start (or that we are damaging them, but mine has been sitting happily on my table all this time :P).

> > We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18
> > and its smartctl output follows as a reference;
> >
> > 5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail
> > 196 Reallocated_Event_Count 0x0032 253 253 000 Old_age
>
> Hmm... This is pretty high too. Do the counts increase on this machine
> too?

Yes, it seems so (I'm adding Onur and İsmail to CC as the owners of the other machines). Here are the SMART logs for the three machines. Interestingly, İsmail and I run 2.6.22 (over 300 reallocations occurred for both of us), while Onur uses 2.6.18 (0 reallocations occurred for him).

[1] http://cekirdek.pardus.org.tr/~caglar/SATA/smart.caglar
[2] http://cekirdek.pardus.org.tr/~caglar/SATA/smart.ismail
[3] http://cekirdek.pardus.org.tr/~caglar/SATA/smart.onur

Cheers
* Re: SATA exceptions
  2007-07-09 18:37 ` Tejun Heo
  2007-07-09 19:06   ` S.Çağlar Onur
@ 2007-07-11 17:20   ` Bill Davidsen
  2007-07-12 19:52   ` Pavel Machek
  2 siblings, 0 replies; 13+ messages in thread
From: Bill Davidsen @ 2007-07-11 17:20 UTC
  To: linux-kernel; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik

Tejun Heo wrote:
> Hello,
>
> S.Çağlar Onur wrote:
>> On Saturday 07 July 2007, Robert Hancock wrote:
>>> It's not the free space on the drive that matters, it's the number of
>>> free sectors in the spare sector pool on the drive, which is invisible
>>> to software.
>>>
>>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
>> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
>> (currently its increased to 313) and although i'm not 100 percent sure these
>> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
>> cause according to kern.log these only visible with 2.6.22+)
>
> OS and driver can't really do much about the reallocation event. Some
> number of reallocations is okay but if you it going up constantly, you
> probably have a dying disk.
>
Or, as I learned the hard way, if the problem shows up on all drives sharing a power supply, it may be a power issue.

>> We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18 and its
>> smartctl output follows as a reference;
>>
>> 5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail
>> 196 Reallocated_Event_Count 0x0032 253 253 000 Old_age
>
> Hmm... This is pretty high too. Do the counts increase on this machine too?
>

-- 
Bill Davidsen <davidsen@tmr.com>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
* Re: SATA exceptions
  2007-07-09 18:37 ` Tejun Heo
  2007-07-09 19:06   ` S.Çağlar Onur
  2007-07-11 17:20   ` Bill Davidsen
@ 2007-07-12 19:52   ` Pavel Machek
  2007-07-13  3:12     ` Tejun Heo
  2 siblings, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2007-07-12 19:52 UTC
  To: Tejun Heo; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik

Hi!

> >> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
> >
> > Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
> > most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
> > (currently its increased to 313) and although i'm not 100 percent sure these
> > errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
> > cause according to kern.log these only visible with 2.6.22+)
>
> OS and driver can't really do much about the reallocation event. Some
> number of reallocations is okay but if you it going up constantly, you
> probably have a dying disk.

Hmm... the OS can cut the power while the drive is writing; might that force reallocations?

You might want to check whether the number of reallocated sectors increases with shutdowns/reboots.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: SATA exceptions
  2007-07-12 19:52 ` Pavel Machek
@ 2007-07-13  3:12   ` Tejun Heo
  2007-07-13  7:44     ` S.Çağlar Onur
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2007-07-13  3:12 UTC
  To: Pavel Machek; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik

Pavel Machek wrote:
>>>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>>> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
>>> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
>>> (currently its increased to 313) and although i'm not 100 percent sure these
>>> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
>>> cause according to kern.log these only visible with 2.6.22+)
>> OS and driver can't really do much about the reallocation event. Some
>> number of reallocations is okay but if you it going up constantly, you
>> probably have a dying disk.
>
> Hmm... cut the power while writing is doable from OS and might force
> reallocations?

Hmmm... We don't have any pending writes when the power goes out, and I don't think an emergency unload can directly increase the reallocation count. It can shorten the lifespan of the head, though.

> You might want to check if number of reallocated sectors increases
> with shutdowns/reboots.

I'm curious too.

-- 
tejun
* Re: SATA exceptions
  2007-07-13  3:12 ` Tejun Heo
@ 2007-07-13  7:44   ` S.Çağlar Onur
  0 siblings, 0 replies; 13+ messages in thread
From: S.Çağlar Onur @ 2007-07-13 7:44 UTC
  To: Tejun Heo; +Cc: Pavel Machek, Robert Hancock, LKML, Jeff Garzik

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

On Friday 13 July 2007, Tejun Heo wrote:
> >> OS and driver can't really do much about the reallocation event. Some
> >> number of reallocations is okay but if you it going up constantly, you
> >> probably have a dying disk.
> >
> > Hmm... cut the power while writing is doable from OS and might force
> > reallocations?
>
> Hmmm... We don't have any pending write when power goes out and I don't
> emergency unload can directly increase reallocation count. It can
> shorten lifespan of the head tho.
>
> > You might want to check if number of reallocated sectors increases
> > with shutdowns/reboots.
>
> I'm curious too.

It seems reboots and shutdowns have no effect on the reallocated sector count: after 5 reboots and 5 shutdowns it didn't change at all.

zangetsu ~ # smartctl -a /dev/sda | grep Reall
  5 Reallocated_Sector_Ct   0x0033   067   067   010    Pre-fail  Always       -       314
196 Reallocated_Event_Count 0x0032   067   067   000    Old_age   Always       -       314

Cheers
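Pavel's reboot test can be repeated with less eyeballing by saving the attribute table before and after a batch of power cycles (for example with `smartctl -A /dev/sda`) and diffing the raw counters. This is a minimal sketch, assuming smartctl's ten-column attribute layout with RAW_VALUE starting in the tenth column; `raw_counts`, `deltas` and the `WATCHED` list are hypothetical names chosen for illustration.

```python
# Hypothetical helpers for diffing SMART raw counters between two snapshots.
WATCHED = ("Reallocated_Sector_Ct", "Reallocated_Event_Count", "Current_Pending_Sector")


def raw_counts(smartctl_text):
    """Map attribute name -> integer raw value for each attribute row.

    Assumes the layout ID NAME FLAG VALUE WORST THRESH TYPE UPDATED
    WHEN_FAILED RAW_VALUE, with RAW_VALUE starting in the tenth column.
    """
    counts = {}
    for line in smartctl_text.splitlines():
        cols = line.split()
        if len(cols) < 10 or not cols[0].isdigit():
            continue  # header, blank or non-attribute line
        if cols[9].isdigit():
            counts[cols[1]] = int(cols[9])
    return counts


def deltas(before_text, after_text, names=WATCHED):
    """Raw-value change of each watched attribute present in both snapshots."""
    before, after = raw_counts(before_text), raw_counts(after_text)
    return {n: after[n] - before[n] for n in names if n in before and n in after}


# The two rows reported in the thread after 5 reboots and 5 shutdowns.
snapshot = """\
  5 Reallocated_Sector_Ct   0x0033   067   067   010    Pre-fail  Always       -       314
196 Reallocated_Event_Count 0x0032   067   067   000    Old_age   Always       -       314
"""
print(raw_counts(snapshot))
```

With real snapshots one would call `deltas(open("before.txt").read(), open("after.txt").read())` on the two saved outputs (file names hypothetical). A delta of zero for the Reallocated_* counters matches what Çağlar observed; a steady increase across power cycles would support Pavel's suspicion.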
* Re: SATA exceptions
  2007-07-07 21:35 ` S.Çağlar Onur
  2007-07-09 18:37   ` Tejun Heo
@ 2007-07-11 20:31   ` Mark Lord
  2007-07-12  3:13     ` Tejun Heo
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Lord @ 2007-07-11 20:31 UTC
  To: caglar; +Cc: Robert Hancock, Tejun Heo, LKML, Jeff Garzik

S.Çağlar Onur wrote:
> Hi;
>
> On Saturday 07 July 2007, Robert Hancock wrote:
>> It's not the free space on the drive that matters, it's the number of
>> free sectors in the spare sector pool on the drive, which is invisible
>> to software.
>>
>> Your SMART log shows 309 reallocated sectors. That seems somewhat high..
>
> Ah sorry to misinterpret the content:), its a quiet new piece of hardware (at
> most ~1.5 month old) and "Reallocated_Event_Count" constantly increases
> (currently its increased to 313) and although i'm not 100 percent sure these
> errors only occured with kernels > 2.6.18 (or 2.6.18 didn't report these
> cause according to kern.log these only visible with 2.6.22+)
>
> We bought 3 HP Pavillon dv2385ea and one of them only runs with 2.6.18 and its
> smartctl output follows as a reference;
>
> smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model:     SAMSUNG HM160JI
> Serial Number:    S0W6J10P331479
> Firmware Version: AD100-16
> User Capacity:    160.041.885.696 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
> Local Time is:    Sun Jul 8 00:22:21 2007 EEST
>
> ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for
> details.
>
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> See vendor-specific Attribute list for marginal Attributes.
>
> [...]
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   253   253   025    Pre-fail  Always       -       2880
>   4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2648
>   5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
>   8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
>   9 Power_On_Hours          0x0032   253   253   000    Old_age   Always       -       236
>  10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       1
>  11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       2
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
> 187 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
> 188 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
> 190 Temperature_Celsius     0x0022   047   040   040    Old_age   Always   In_the_past 1008009269
> 191 G-Sense_Error_Rate      0x0012   100   100   000    Old_age   Always       -       5396
> 192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       40
> 193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2575
> 194 Temperature_Celsius     0x0022   047   040   000    Old_age   Always       -       53 (Lifetime Min/Max 0/15381)
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       98037
> 196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
> 201 Soft_Read_Error_Rate    0x0012   253   253   000    Old_age   Always       -       0
> 223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       2
> 225 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2575
> 255 Unknown_Attribute       0x000a   253   100   000    Old_age   Always       -       0
..

I'm not even sure how to interpret those numbers.
It seems rather odd that nearly all fields are either "100" or "253",
so those are probably pre-programmed numbers rather than actual counts.
The raw value at the end of the line (for the various "Reallocated*" fields)
is probably the real value here.

Tejun ??
* Re: SATA exceptions
  2007-07-11 20:31 ` Mark Lord
@ 2007-07-12  3:13   ` Tejun Heo
  0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2007-07-12  3:13 UTC
  To: Mark Lord; +Cc: caglar, Robert Hancock, LKML, Jeff Garzik

Mark Lord wrote:
> I'm not even sure how to interpret those numbers.
> It seems rather odd that nearly all fields are either "100" or "253",
> so those are probably pre-programmed numbers rather than actual counts.
> The raw value at the end of the line (for the various "Reallocated*"
> fields)
> is probably the real value here.

I dunno exactly either. Different vendors seem to use different metrics anyway, but an increasing raw number on the reallocation counter is pretty easy to interpret.

-- 
tejun
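The distinction Mark and Tejun draw can be made concrete. VALUE, WORST and THRESH are vendor-normalized scores, and smartctl considers an attribute failed when its normalized value is at or below the threshold. RAW_VALUE is the vendor's raw field, which for the Reallocated_* counters is the actual count. The sketch below parses rows quoted in this thread. The class and function names are hypothetical, the column layout is taken from the smartctl output above, and treating a threshold of 0 as "never fails" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int   # normalized, vendor-scaled value (higher is better)
    worst: int   # lowest normalized value recorded so far
    thresh: int  # vendor threshold: value <= thresh means the attribute failed
    raw: str     # vendor-specific raw field, kept as printed


def parse_attribute(line):
    """Parse one row of smartctl's attribute table (assumed 10+ columns)."""
    cols = line.split()
    if len(cols) < 10 or not cols[0].isdigit():
        raise ValueError("not an attribute row: %r" % line)
    return SmartAttribute(int(cols[0]), cols[1], int(cols[3]), int(cols[4]),
                          int(cols[5]), " ".join(cols[9:]))


def state(attr):
    """Rough analogue of the WHEN_FAILED column (threshold 0 assumed to mean 'never fails')."""
    if attr.thresh == 0:
        return "-"
    if attr.value <= attr.thresh:
        return "FAILING_NOW"
    if attr.worst <= attr.thresh:
        return "In_the_past"
    return "-"


def raw_int(attr):
    """First token of the raw field; for Reallocated_* counters this is the count."""
    return int(attr.raw.split()[0])


rows = [
    # 2.6.18 machine: normalized 253, raw count 0
    "  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0",
    # S.Çağlar Onur's machine after the reboot test: normalized 067, raw count 314
    "  5 Reallocated_Sector_Ct   0x0033   067   067   010    Pre-fail  Always       -       314",
    # temperature row from the 2.6.18 machine, flagged In_the_past by smartctl
    "190 Temperature_Celsius     0x0022   047   040   040    Old_age   Always   In_the_past 1008009269",
]
for row in rows:
    a = parse_attribute(row)
    print(a.name, a.value, a.worst, a.thresh, state(a), raw_int(a))
```

On these rows the normalized 253 on the 2.6.18 machine corresponds to a raw count of 0, while the 067 row carries a raw count of 314. Neither reallocation row is below its threshold of 10. The temperature row reproduces smartctl's In_the_past flag, because WORST (040) has touched THRESH (040). Not every raw field is a plain counter: the temperature raw value 1008009269 is printed as-is and is not interpreted here.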
Thread overview: 13+ messages
2007-07-05 21:46 SATA exceptions S.Çağlar Onur
2007-07-06 1:52 ` Tejun Heo
2007-07-06 11:43 ` S.Çağlar Onur
[not found] <fa.hBFih6KVHDFsBf6Qfg6XispiIuY@ifi.uio.no>
[not found] ` <fa.50UJZSgW/ChUl7O1Zks6ydq/1js@ifi.uio.no>
[not found] ` <fa.NhBdOmH9+RkW+QO72+TjPRwEKj0@ifi.uio.no>
2007-07-07 18:05 ` Robert Hancock
2007-07-07 21:35 ` S.Çağlar Onur
2007-07-09 18:37 ` Tejun Heo
2007-07-09 19:06 ` S.Çağlar Onur
2007-07-11 17:20 ` Bill Davidsen
2007-07-12 19:52 ` Pavel Machek
2007-07-13 3:12 ` Tejun Heo
2007-07-13 7:44 ` S.Çağlar Onur
2007-07-11 20:31 ` Mark Lord
2007-07-12 3:13 ` Tejun Heo