From: jerome lacoste <jerome.lacoste@gmail.com>
To: lkml <linux-kernel@vger.kernel.org>
Subject: Huge unreliability - does Linux have something to do with it?
Date: Fri, 4 Feb 2005 10:03:47 +0100 [thread overview]
Message-ID: <5a2cf1f605020401037aa610b9@mail.gmail.com> (raw)
[-- Attachment #1: Type: text/plain, Size: 1311 bytes --]
[Sorry for the sensational title]
I have had this laptop for three years. It ran Linux (Debian unstable)
from the start and its hardware has been very unreliable: I changed
hard disks twice and the motherboard thrice. My DVD drive started
failing some days ago (this one is 'original', 3 years old). But I
don't mind as I am not under warranty anymore... This morning the
machine booted with fsck errors on my hard disk. I am not sure if I
did the right thing, but I said clear the inodes, and I ended up
loosing some programs(*) (du, dircolors, etc..). The day starts well
isn't it? Sounds like I will have to switch disks again...
I halted the machine correctly yesterday night. I never dropped the
box in 3 years. Am I just being unlucky? Or could the fact that I am
using Linux on the box affect the reliability in some ways on that
particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
computers and never had single problems with them.
How can the file system (ext3) be messed up the way it was this
morning after I stopped the machine correctly yesterday?
Could a hardware failure look like bad sectors to fsck?
Attached the output of smartctl -a /dev/hda, whatever that helps.
Jerome
(*) I accept tips on discovering and maybe recovering which files have
been taken out of my system...
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: smartctl.log --]
[-- Type: text/x-log; name="smartctl.log", Size: 11219 bytes --]
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: HITACHI_DK23FB-60
Serial Number: 1ZX822
Firmware Version: 00M0A0C1
Device is: In smartctl database [for details use: -P show]
ATA Version is: 5
ATA Standard is: ATA/ATAPI-5 T13 1321D revision 3
Local Time is: Fri Feb 4 09:53:50 2005 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (2150) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 37) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000d 091 090 050 Pre-fail Offline - 412316862542
2 Throughput_Performance 0x0005 100 092 050 Pre-fail Offline - 3140
3 Spin_Up_Time 0x0007 100 100 050 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 388
5 Reallocated_Sector_Ct 0x0033 095 095 010 Pre-fail Always - 142
7 Seek_Error_Rate 0x000f 100 100 050 Pre-fail Always - 651
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 1125
9 Power_On_Minutes 0x0032 095 095 000 Old_age Always - 2512h+02m
10 Spin_Retry_Count 0x0013 100 100 050 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 361
191 G-Sense_Error_Rate 0x000a 100 068 000 Old_age Always - 17536255
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 25
193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 53398/53372
194 Temperature_Celsius 0x0022 072 036 000 Old_age Always - 54 (Lifetime Min/Max 72/13)
195 Hardware_ECC_Recovered 0x001a 100 001 000 Old_age Always - 4189
196 Reallocated_Event_Count 0x0032 086 086 000 Old_age Always - 142
197 Current_Pending_Sector 0x0032 097 094 000 Old_age Always - 3
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0013 100 100 050 Pre-fail Always - 0
201 Soft_Read_Error_Rate 0x0012 100 100 000 Old_age Always - 1
223 Load_Retry_Count 0x0012 100 100 000 Old_age Always - 0
230 Head_Amplitude 0x0032 096 096 000 Old_age Always - 141669
250 Read_Error_Retry_Rate 0x000a 100 001 000 Old_age Always - 837
SMART Error Log Version: 1
ATA Error Count: 108 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 108 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 07 60 08 fb e1 Error: UNC 7 sectors at LBA = 0x01fb0860 = 33228896
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 5f 08 fb e1 00 00:02:32.020 READ DMA
c8 00 08 57 08 fb e1 00 00:02:32.010 READ DMA
c8 00 08 4f 08 fb e1 00 00:02:32.010 READ DMA
c8 00 08 47 08 fb e1 00 00:02:31.980 READ DMA
c8 00 08 3f 08 fb e1 00 00:02:31.880 READ DMA
Error 107 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 01 3d 08 fb e1 Error: UNC 1 sectors at LBA = 0x01fb083d = 33228861
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 06 38 08 fb e1 00 00:01:33.860 READ DMA
c8 00 07 37 08 fb e1 00 00:01:31.960 READ DMA
c8 00 01 3e 08 fb e1 00 00:01:31.830 READ DMA
c8 00 02 3d 08 fb e1 00 00:01:30.120 READ DMA
c8 00 03 3c 08 fb e1 00 00:01:28.350 READ DMA
Error 106 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 07 37 08 fb e1 Error: UNC 7 sectors at LBA = 0x01fb0837 = 33228855
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 07 37 08 fb e1 00 00:01:31.960 READ DMA
c8 00 01 3e 08 fb e1 00 00:01:31.830 READ DMA
c8 00 02 3d 08 fb e1 00 00:01:30.120 READ DMA
c8 00 03 3c 08 fb e1 00 00:01:28.350 READ DMA
c8 00 04 3b 08 fb e1 00 00:01:26.090 READ DMA
Error 105 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 02 3d 08 fb e1 Error: UNC 2 sectors at LBA = 0x01fb083d = 33228861
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 02 3d 08 fb e1 00 00:01:30.120 READ DMA
c8 00 03 3c 08 fb e1 00 00:01:28.350 READ DMA
c8 00 04 3b 08 fb e1 00 00:01:26.090 READ DMA
c8 00 05 3a 08 fb e1 00 00:01:24.160 READ DMA
c8 00 06 39 08 fb e1 00 00:01:21.870 READ DMA
Error 104 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 02 3d 08 fb e1 Error: UNC 2 sectors at LBA = 0x01fb083d = 33228861
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 03 3c 08 fb e1 00 00:01:28.350 READ DMA
c8 00 04 3b 08 fb e1 00 00:01:26.090 READ DMA
c8 00 05 3a 08 fb e1 00 00:01:24.160 READ DMA
c8 00 06 39 08 fb e1 00 00:01:21.870 READ DMA
c8 00 07 38 08 fb e1 00 00:01:19.790 READ DMA
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2488 -
# 2 Short offline Completed without error 00% 2464 -
# 3 Short offline Completed without error 00% 2441 -
# 4 Extended offline Completed: read failure 20% 2408 92491576
# 5 Short offline Completed without error 00% 2407 -
# 6 Short offline Completed without error 00% 2383 -
# 7 Short offline Completed without error 00% 2296 -
# 8 Extended offline Completed: read failure 20% 2273 92491576
# 9 Short offline Completed without error 00% 2272 -
#10 Short offline Completed without error 00% 2237 -
#11 Short offline Completed without error 00% 2221 -
#12 Short offline Completed without error 00% 2190 -
#13 Short offline Completed without error 00% 2171 -
#14 Short offline Completed without error 00% 2137 -
#15 Short offline Completed without error 00% 2113 -
#16 Short offline Completed without error 00% 2090 -
#17 Extended offline Completed: read failure 20% 2067 92491576
#18 Short offline Completed without error 00% 2066 -
#19 Short offline Completed without error 00% 2044 -
#20 Short offline Completed without error 00% 2020 -
#21 Short offline Completed without error 00% 1996 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
next reply other threads:[~2005-02-04 9:05 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-02-04 9:03 jerome lacoste [this message]
2005-02-04 9:20 ` Huge unreliability - does Linux have something to do with it? Julien Banchet
2005-02-04 10:31 ` Jim Nelson
2005-02-04 10:45 ` Bernd Eckenfels
2005-02-04 11:28 ` jerome lacoste
2005-02-04 17:13 ` Horst von Brand
2005-02-04 11:27 ` Andre Tomt
2005-02-04 11:51 ` DervishD
2005-02-04 12:18 ` Wakko Warner
2005-02-04 14:44 ` Dmitry Torokhov
2005-02-06 15:58 ` Dell Inspiron sensors (was: Re: Huge unreliability - does Linux have something to do with it?) Giuseppe Bilotta
2005-02-06 16:58 ` kernel
2005-02-06 21:58 ` Giuseppe Bilotta
2005-02-05 10:38 ` Huge unreliability - does Linux have something to do with it? jerome lacoste
2005-02-05 12:27 ` Willy Tarreau
2005-02-05 15:11 ` Wakko Warner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5a2cf1f605020401037aa610b9@mail.gmail.com \
--to=jerome.lacoste@gmail.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).