linux-kernel.vger.kernel.org archive mirror
* Huge unreliability - does Linux have something to do with it?
@ 2005-02-04  9:03 jerome lacoste
  2005-02-04  9:20 ` Julien Banchet
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: jerome lacoste @ 2005-02-04  9:03 UTC
  To: lkml

[-- Attachment #1: Type: text/plain, Size: 1311 bytes --]

[Sorry for the sensational title]

I have had this laptop for three years. It has run Linux (Debian unstable)
from the start and its hardware has been very unreliable: I have replaced
the hard disk twice and the motherboard three times. My DVD drive started
failing a few days ago (this one is 'original', 3 years old). But I
don't mind, as I am not under warranty anymore... This morning the
machine booted with fsck errors on my hard disk. I am not sure if I
did the right thing, but I told fsck to clear the inodes, and I ended up
losing some programs(*) (du, dircolors, etc.). Great start to the day,
isn't it? Sounds like I will have to switch disks again...

I halted the machine correctly last night. I have never dropped the
box in 3 years. Am I just being unlucky? Or could the fact that I am
using Linux on the box affect the reliability in some way on that
particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
computers and have never had a single problem with them.

How can the file system (ext3) be messed up the way it was this
morning after I stopped the machine correctly yesterday?
Could a hardware failure look like bad sectors to fsck?

Attached is the output of smartctl -a /dev/hda, in case it helps.

Jerome

(*) I accept tips on discovering and maybe recovering which files have
been taken out of my system...

[-- Attachment #2: smartctl.log --]
[-- Type: text/x-log; name="smartctl.log", Size: 11219 bytes --]

smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     HITACHI_DK23FB-60
Serial Number:    1ZX822
Firmware Version: 00M0A0C1
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 3
Local Time is:    Fri Feb  4 09:53:50 2005 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (2150) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  37) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000d   091   090   050    Pre-fail  Offline      -       412316862542
  2 Throughput_Performance  0x0005   100   092   050    Pre-fail  Offline      -       3140
  3 Spin_Up_Time            0x0007   100   100   050    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       388
  5 Reallocated_Sector_Ct   0x0033   095   095   010    Pre-fail  Always       -       142
  7 Seek_Error_Rate         0x000f   100   100   050    Pre-fail  Always       -       651
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       1125
  9 Power_On_Minutes        0x0032   095   095   000    Old_age   Always       -       2512h+02m
 10 Spin_Retry_Count        0x0013   100   100   050    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       361
191 G-Sense_Error_Rate      0x000a   100   068   000    Old_age   Always       -       17536255
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       53398/53372
194 Temperature_Celsius     0x0022   072   036   000    Old_age   Always       -       54 (Lifetime Min/Max 72/13)
195 Hardware_ECC_Recovered  0x001a   100   001   000    Old_age   Always       -       4189
196 Reallocated_Event_Count 0x0032   086   086   000    Old_age   Always       -       142
197 Current_Pending_Sector  0x0032   097   094   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0013   100   100   050    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x0012   100   100   000    Old_age   Always       -       1
223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       0
230 Head_Amplitude          0x0032   096   096   000    Old_age   Always       -       141669
250 Read_Error_Retry_Rate   0x000a   100   001   000    Old_age   Always       -       837

SMART Error Log Version: 1
ATA Error Count: 108 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 108 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 07 60 08 fb e1  Error: UNC 7 sectors at LBA = 0x01fb0860 = 33228896

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 5f 08 fb e1 00      00:02:32.020  READ DMA
  c8 00 08 57 08 fb e1 00      00:02:32.010  READ DMA
  c8 00 08 4f 08 fb e1 00      00:02:32.010  READ DMA
  c8 00 08 47 08 fb e1 00      00:02:31.980  READ DMA
  c8 00 08 3f 08 fb e1 00      00:02:31.880  READ DMA

Error 107 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 3d 08 fb e1  Error: UNC 1 sectors at LBA = 0x01fb083d = 33228861

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 06 38 08 fb e1 00      00:01:33.860  READ DMA
  c8 00 07 37 08 fb e1 00      00:01:31.960  READ DMA
  c8 00 01 3e 08 fb e1 00      00:01:31.830  READ DMA
  c8 00 02 3d 08 fb e1 00      00:01:30.120  READ DMA
  c8 00 03 3c 08 fb e1 00      00:01:28.350  READ DMA

Error 106 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 07 37 08 fb e1  Error: UNC 7 sectors at LBA = 0x01fb0837 = 33228855

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 07 37 08 fb e1 00      00:01:31.960  READ DMA
  c8 00 01 3e 08 fb e1 00      00:01:31.830  READ DMA
  c8 00 02 3d 08 fb e1 00      00:01:30.120  READ DMA
  c8 00 03 3c 08 fb e1 00      00:01:28.350  READ DMA
  c8 00 04 3b 08 fb e1 00      00:01:26.090  READ DMA

Error 105 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 02 3d 08 fb e1  Error: UNC 2 sectors at LBA = 0x01fb083d = 33228861

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 02 3d 08 fb e1 00      00:01:30.120  READ DMA
  c8 00 03 3c 08 fb e1 00      00:01:28.350  READ DMA
  c8 00 04 3b 08 fb e1 00      00:01:26.090  READ DMA
  c8 00 05 3a 08 fb e1 00      00:01:24.160  READ DMA
  c8 00 06 39 08 fb e1 00      00:01:21.870  READ DMA

Error 104 occurred at disk power-on lifetime: 2511 hours (104 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 02 3d 08 fb e1  Error: UNC 2 sectors at LBA = 0x01fb083d = 33228861

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 03 3c 08 fb e1 00      00:01:28.350  READ DMA
  c8 00 04 3b 08 fb e1 00      00:01:26.090  READ DMA
  c8 00 05 3a 08 fb e1 00      00:01:24.160  READ DMA
  c8 00 06 39 08 fb e1 00      00:01:21.870  READ DMA
  c8 00 07 38 08 fb e1 00      00:01:19.790  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2488         -
# 2  Short offline       Completed without error       00%      2464         -
# 3  Short offline       Completed without error       00%      2441         -
# 4  Extended offline    Completed: read failure       20%      2408         92491576
# 5  Short offline       Completed without error       00%      2407         -
# 6  Short offline       Completed without error       00%      2383         -
# 7  Short offline       Completed without error       00%      2296         -
# 8  Extended offline    Completed: read failure       20%      2273         92491576
# 9  Short offline       Completed without error       00%      2272         -
#10  Short offline       Completed without error       00%      2237         -
#11  Short offline       Completed without error       00%      2221         -
#12  Short offline       Completed without error       00%      2190         -
#13  Short offline       Completed without error       00%      2171         -
#14  Short offline       Completed without error       00%      2137         -
#15  Short offline       Completed without error       00%      2113         -
#16  Short offline       Completed without error       00%      2090         -
#17  Extended offline    Completed: read failure       20%      2067         92491576
#18  Short offline       Completed without error       00%      2066         -
#19  Short offline       Completed without error       00%      2044         -
#20  Short offline       Completed without error       00%      2020         -
#21  Short offline       Completed without error       00%      1996         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
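
In the error log above, each `Error: UNC` line gives an LBA that smartctl assembles from the register dump next to it. A minimal Python sketch of that decoding, assuming the ATA LBA28 layout (bits 24-27 in the low nibble of DH, then CH, CL and SN); the helper names are illustrative and not part of smartmontools:

```python
# ATA error-register bit meaning "uncorrectable data error" (UNC).
ERR_UNC = 0x40
# Device/Head register bit that selects LBA (rather than CHS) addressing.
DH_LBA_MODE = 0x40

REGISTER_NAMES = ["ER", "ST", "SC", "SN", "CL", "CH", "DH"]


def parse_register_row(row: str) -> dict:
    """Parse the 'ER ST SC SN CL CH DH' hex columns of a smartctl error-log row."""
    values = [int(tok, 16) for tok in row.split()[:7]]
    return dict(zip(REGISTER_NAMES, values))


def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA: SN = bits 0-7, CL = bits 8-15,
    CH = bits 16-23, low nibble of DH = bits 24-27."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


row = "40 51 07 60 08 fb e1  Error: UNC 7 sectors at LBA = 0x01fb0860 = 33228896"
regs = parse_register_row(row)
print(bool(regs["ER"] & ERR_UNC))      # True: uncorrectable read
print(bool(regs["DH"] & DH_LBA_MODE))  # True: LBA addressing
lba = lba28_from_registers(regs["SN"], regs["CL"], regs["CH"], regs["DH"])
print(lba, hex(lba))                   # 33228896 0x1fb0860
```

The same decoding gives 33228861 (0x01fb083d) for Errors 107, 105 and 104, so all five logged read errors fall within about 40 sectors of each other, in a different area from the LBA 92491576 reported by the extended self-tests.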



* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04  9:03 Huge unreliability - does Linux have something to do with it? jerome lacoste
@ 2005-02-04  9:20 ` Julien Banchet
  2005-02-04 10:31 ` Jim Nelson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Julien Banchet @ 2005-02-04  9:20 UTC
  To: jerome lacoste; +Cc: lkml

On Friday, 4 February 2005 at 10:03 +0100, jerome lacoste wrote:
> [Sorry for the sensational title]
> 
> I have had this laptop for three years. It ran Linux (Debian unstable)
> from the start and its hardware has been very unreliable: I changed
> hard disks twice and the motherboard thrice. My DVD drive started
> failing some days ago (this one is 'original', 3 years old). But I
> don't mind as I am not under warranty anymore... This morning the
> machine booted with fsck errors on my hard disk. I am not sure if I
> did the right thing, but I said clear the inodes, and I ended up
> loosing some programs(*) (du, dircolors, etc..). The day starts well
> isn't it? Sounds like I will have to switch disks again...
> 
> I halted the machine correctly yesterday night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some ways on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had single problems with them.
> 
> How can the file system (ext3) be messed up the way it was this
> morning after I stopped the machine correctly yesterday?
> Could a hardware failure look like bad sectors to fsck?
> 
> Attached the output of smartctl -a /dev/hda, whatever that helps.
> 
> Jerome
> 
> (*) I accept tips on discovering and maybe recovering which files have
> been taken out of my system...

I honestly believe that you're simply out of luck. Not that 3 years is
a lot for a laptop; it's simply the "shit happens" thing.

Even though the distro you run is tagged "unstable", I'd run a
battery of stress tools on your computer before posting here, because
it's maybe a bit beyond the scope of lkml (I bet you tried more than one
kernel version in 3 years, and problems never remain for long; I
also hope you tried fresh installs).

I don't think that an Inspiron 8100 carries anything exotic, so...
well... Go for memtest86, then try disk stress tools (my memory went
blank right now, ask Google ;-) )


JB,
PS: Um, Jérome... BTS Info Indus at l'Isle sur la Sorgue?

-- 
Julien Banchet <julien@banchet.net>



* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04  9:03 Huge unreliability - does Linux have something to do with it? jerome lacoste
  2005-02-04  9:20 ` Julien Banchet
@ 2005-02-04 10:31 ` Jim Nelson
  2005-02-04 10:45 ` Bernd Eckenfels
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Jim Nelson @ 2005-02-04 10:31 UTC
  To: jerome lacoste; +Cc: lkml

jerome lacoste wrote:
> [Sorry for the sensational title]
> 
> I have had this laptop for three years. It ran Linux (Debian unstable)
> from the start and its hardware has been very unreliable: I changed
> hard disks twice and the motherboard thrice. My DVD drive started
> failing some days ago (this one is 'original', 3 years old). But I
> don't mind as I am not under warranty anymore... This morning the
> machine booted with fsck errors on my hard disk. I am not sure if I
> did the right thing, but I said clear the inodes, and I ended up
> loosing some programs(*) (du, dircolors, etc..). The day starts well
> isn't it? Sounds like I will have to switch disks again...
> 
> I halted the machine correctly yesterday night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some ways on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had single problems with them.
> 
> How can the file system (ext3) be messed up the way it was this
> morning after I stopped the machine correctly yesterday?
> Could a hardware failure look like bad sectors to fsck?
> 

It can.  I had a drive crash on my server a couple of months ago, and I had ext3
errors show up before the syslog filled up with the IDE errors.  The hard disk was
only 1 1/2 years old.

If the bad sectors happen where directory inodes are written, your directory
structure will be turned into Swiss cheese.  That will *definitely* cause ext3
errors, and dump you (in Red Hat systems, at least) to a shell on reboot.
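
Finding out which file or directory a bad sector actually hit means converting the drive's LBA into a filesystem block number. A minimal Python sketch, assuming 512-byte sectors; the partition start sector (63) and ext3 block size (4096) below are placeholders that would have to come from the real partition table and superblock, and `lba_to_fs_block` is a hypothetical helper:

```python
SECTOR_SIZE = 512  # bytes per LBA; assumed for this ATA drive


def lba_to_fs_block(lba: int, partition_start: int, fs_block_size: int):
    """Return the filesystem block containing `lba`, or None if the LBA
    lies before the partition. Checking the partition's end is left out."""
    if lba < partition_start:
        return None
    byte_offset = (lba - partition_start) * SECTOR_SIZE
    return byte_offset // fs_block_size


# Placeholder layout: partition starting at sector 63, 4 KiB ext3 blocks.
print(lba_to_fs_block(92491576, partition_start=63, fs_block_size=4096))  # 11561439
```

With a block number in hand, the `icheck` command of debugfs maps the block to an inode, and `ncheck` maps that inode back to a path name, which tells you whether a directory or a regular file was damaged.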

> Attached the output of smartctl -a /dev/hda, whatever that helps.
> 
> Jerome
> 
> (*) I accept tips on discovering and maybe recovering which files have
> been taken out of my system...
> 

You might not have any luck.  After running fsck -f, I thought I had saved the drive;
I copied everything that was left onto another machine and found that most of the
larger files had holes in them: MP3s had skips, JPEGs were completely corrupted,
etc.

That's what made me get a backup FireWire drive... :)


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04  9:03 Huge unreliability - does Linux have something to do with it? jerome lacoste
  2005-02-04  9:20 ` Julien Banchet
  2005-02-04 10:31 ` Jim Nelson
@ 2005-02-04 10:45 ` Bernd Eckenfels
  2005-02-04 11:28   ` jerome lacoste
  2005-02-04 11:27 ` Andre Tomt
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Bernd Eckenfels @ 2005-02-04 10:45 UTC
  To: linux-kernel

In article <5a2cf1f605020401037aa610b9@mail.gmail.com> you wrote:
> I halted the machine correctly yesterday night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some ways on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had single problems with them.

There are a lot of possible problems with your particular hardware, like
interrupt handling, power control, DMA, ... Those are rare but possible.
Notebooks tend to require some special handling.

> Could a hardware failure look like bad sectors to fsck?

A bus failure or an earlier sporadic error can leave a defective fs, but
normally fsck then reports a read error, not a structural error.

Are you using hdparm? Is the system perhaps overheating or overclocked?

Greetings
Bernd


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04  9:03 Huge unreliability - does Linux have something to do with it? jerome lacoste
                   ` (2 preceding siblings ...)
  2005-02-04 10:45 ` Bernd Eckenfels
@ 2005-02-04 11:27 ` Andre Tomt
  2005-02-04 11:51 ` DervishD
  2005-02-04 12:18 ` Wakko Warner
  5 siblings, 0 replies; 16+ messages in thread
From: Andre Tomt @ 2005-02-04 11:27 UTC
  To: jerome lacoste; +Cc: lkml

jerome lacoste wrote:
> Attached the output of smartctl -a /dev/hda, whatever that helps.

Judging from the SMART output, this drive seems hosed. All three
firmware-controlled extended offline self-tests in the log have failed at
LBA 92491576, and it has a worrying number of reallocated sectors (142).

New laptop hard drives shouldn't be too hard to get hold of.
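
The two symptoms pointed out above can be pulled out of a saved smartctl -a report mechanically. A minimal Python sketch, assuming the column layout of the smartctl 5.32 output attached to the first message; the regular expressions and helper names are illustrative, not part of smartmontools:

```python
import re

# Attribute-table row: ID, name, FLAG, VALUE, WORST, THRESH,
# TYPE, UPDATED, WHEN_FAILED, then RAW_VALUE (rest of the line).
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(.+?)\s*$"
)
# Self-test log row: number, description, status, remaining %, hours, LBA.
SELFTEST_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def raw_counts(text, wanted):
    """Return {attribute name: leading integer of RAW_VALUE} for `wanted` names."""
    counts = {}
    for line in text.splitlines():
        m = ATTR_ROW.match(line)
        if m and m.group(2) in wanted:
            lead = re.match(r"\d+", m.group(6))
            if lead:
                counts[m.group(2)] = int(lead.group())
    return counts


def failing_selftest_lbas(text):
    """Return LBA_of_first_error for every self-test whose status mentions a failure."""
    lbas = []
    for line in text.splitlines():
        m = SELFTEST_ROW.match(line)
        if m and "failure" in m.group(3) and m.group(6) != "-":
            lbas.append(int(m.group(6)))
    return lbas


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   095   095   010    Pre-fail  Always       -       142
197 Current_Pending_Sector  0x0032   097   094   000    Old_age   Always       -       3
# 1  Short offline       Completed without error       00%      2488         -
# 4  Extended offline    Completed: read failure       20%      2408         92491576
# 8  Extended offline    Completed: read failure       20%      2273         92491576
"""
print(raw_counts(SAMPLE, {"Reallocated_Sector_Ct", "Current_Pending_Sector"}))
# {'Reallocated_Sector_Ct': 142, 'Current_Pending_Sector': 3}
print(failing_selftest_lbas(SAMPLE))  # [92491576, 92491576]
```

A non-zero Current_Pending_Sector count generally means the drive holds sectors it could not read and has not yet remapped, which fits the repeated self-test failure at the same LBA.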


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04 10:45 ` Bernd Eckenfels
@ 2005-02-04 11:28   ` jerome lacoste
  2005-02-04 17:13     ` Horst von Brand
  0 siblings, 1 reply; 16+ messages in thread
From: jerome lacoste @ 2005-02-04 11:28 UTC
  To: Bernd Eckenfels; +Cc: linux-kernel

>> Could a hardware failure look like bad sectors to fsck?
> 
> A failure of the bus or a former sporadic error can cause defective fs, but
> normally you have a read error in fsck no structure error.
> 
> Are you using hdparm? is the system perhaps overheating or overclocked?

No overclocking.
hdparm is used, but I cannot tell you exactly what the config is (the
machine has been running memtest for 1.5 hours now). I don't think I use
any special options: probably the defaults in my config file (mult_sect 16,
dma on, write_cache off).

Overheating: perhaps. The machine is hot and runs many hours per
day (usually 12-16). It runs the fans very often, but it has always
been like that. I've tried to control the fan, but then the
temperature goes up very quickly, so I let the fans run.


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04  9:03 Huge unreliability - does Linux have something to do with it? jerome lacoste
                   ` (3 preceding siblings ...)
  2005-02-04 11:27 ` Andre Tomt
@ 2005-02-04 11:51 ` DervishD
  2005-02-04 12:18 ` Wakko Warner
  5 siblings, 0 replies; 16+ messages in thread
From: DervishD @ 2005-02-04 11:51 UTC
  To: jerome lacoste; +Cc: lkml

    Hi Jerome :)

 * jerome lacoste <jerome.lacoste@gmail.com> dixit:
> [Sorry for the sensational title]

    It caught my attention ;)))
 
> I halted the machine correctly yesterday night. I never dropped the
> box in 3 years. Am I just being unlucky? Or could the fact that I am
> using Linux on the box affect the reliability in some ways on that
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> computers and never had single problems with them.

    Well, Linux may stress the hardware more than other operating
systems because it tries to optimize usage and performance. But in
this particular case I think you are just very unlucky O:) I've seen
that before, unfortunately.
 
> Could a hardware failure look like bad sectors to fsck?

    Yes, depending on the hardware failure.

> (*) I accept tips on discovering and maybe recovering which files have
> been taken out of my system...

    You should use 'integrit' (http://integrit.sourceforge.net). I
use it to know whether a file whose contents shouldn't change has
changed, but it has other uses too. Also use memtest86 (there are two
versions out there) to check your RAM, just in case. Bad RAM can
cause 'apparent' hardware failures. A bad RAM chip can cause disk
errors (if you write to disk from *bad* RAM, you'll write *bad* data)
and other failures. Use 'integrit', and read the documentation for
details.
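
The idea behind integrit is to record a baseline of checksums while the system is known to be good and to compare against it later, so that vanished or modified files stand out. A minimal Python sketch of that idea only; it does not reproduce integrit's database format or options, and `snapshot` and `compare` are hypothetical names:

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each regular file under `root` (relative path) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, device nodes, sockets, ...
            h = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        h.update(chunk)
            except OSError:
                continue  # unreadable file (permissions, I/O error)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare(baseline, current):
    """Return (missing, changed, added) relative paths, each sorted."""
    missing = sorted(set(baseline) - set(current))
    added = sorted(set(current) - set(baseline))
    changed = sorted(p for p in set(baseline) & set(current)
                     if baseline[p] != current[p])
    return missing, changed, added


# Demo in a scratch directory: one file disappears, one is altered.
with tempfile.TemporaryDirectory() as root:
    for name in ("du", "dircolors", "ls"):
        with open(os.path.join(root, name), "w") as f:
            f.write(name)
    before = snapshot(root)
    os.remove(os.path.join(root, "du"))      # as if fsck cleared its inode
    with open(os.path.join(root, "ls"), "w") as f:
        f.write("garbage")                   # as if a bad sector hit it
    print(compare(before, snapshot(root)))   # (['du'], ['ls'], [])
```

The baseline has to be taken before the damage and kept on separate media, so this helps with the next failure rather than recovering what fsck already cleared.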

    Good luck, you'll need it with that laptop :(

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.dervishd.net & http://www.pleyades.net/
It's my PC and I'll cry if I want to...


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04  9:03 Huge unreliability - does Linux have something to do with it? jerome lacoste
                   ` (4 preceding siblings ...)
  2005-02-04 11:51 ` DervishD
@ 2005-02-04 12:18 ` Wakko Warner
  2005-02-04 14:44   ` Dmitry Torokhov
                     ` (2 more replies)
  5 siblings, 3 replies; 16+ messages in thread
From: Wakko Warner @ 2005-02-04 12:18 UTC
  To: jerome lacoste; +Cc: lkml

Please keep me CC'd

jerome lacoste wrote:
> particular hardware (Dell Inspiron 8100)? I run Linux on 3 other

I have this exact same laptop.  It works perfectly for me with Linux.
I originally started with a 2.4 kernel and recently went to 2.6.10.  The modem
works well, and the video card works well even with 3D acceleration.  I replaced the
original 30 GB HDD with a 40 GB one (for space reasons).  My only complaint about
this thing is that they used an NVIDIA video chip.  I have seen
more than 4 months of uptime on it (I used to use it as a desktop).

I did have a hardware mouse problem, but replacing the touchpad/palm rest
fixed that.  I'd give it 4 stars (out of five, mainly because of the video
chipset and the keyboard layout).

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04 12:18 ` Wakko Warner
@ 2005-02-04 14:44   ` Dmitry Torokhov
  2005-02-06 15:58     ` Dell Inspiron sensors (was: Re: Huge unreliability - does Linux have something to do with it?) Giuseppe Bilotta
  2005-02-05 10:38   ` Huge unreliability - does Linux have something to do with it? jerome lacoste
  2005-02-05 12:27   ` Willy Tarreau
  2 siblings, 1 reply; 16+ messages in thread
From: Dmitry Torokhov @ 2005-02-04 14:44 UTC
  To: jerome lacoste, lkml

On Fri, 4 Feb 2005 07:18:17 -0500, Wakko Warner <wakko@animx.eu.org> wrote:
> Please keep me CCd
> 
> jerome lacoste wrote:
> > particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> 
> I have this exact same laptop.  It works perfectly for me with linux.
> Originally started with a 2.4 kernel and recently went to 2.6.10.  The modem
> works well, the video card works well even with 3D accel.  I replaced the
> original 30gb hdd with a 40gb (for space reasons).  The only complaint about
> this thing I have is the fact they used an nvidia video chip.  I have seen
> more than 4 months uptime on it (I used to use it as a desktop)
> 
> I did have a hardware mouse problem, but replacing the touchpad/palm rest
> fixed that.  I'd give it a 4 star (out of five, mainly because of the video
> chipset and the keyboard layout)
> 

Hmm, I guess it's hit or miss. I have replaced:

1. Fan assembly (was making grinding sounds after 1.5 years)
2. DVD-CDRW combo (Samsung SN-308B could barely read anything,
but burns pretty well)
3. LCD (backlight burned out and I managed to tear connectors on the
panel trying to see if I could replace the light and what part I should
order). Well, that helped to persuade my better half that I really
need 1600x1200 ;)
4. Original Hitachi hard drive died a horrible death: I returned home
and heard it making grinding sounds and hitting its heads against
something.
5. Replacement IBM drive has developed a few bad sectors, need to
arrange a replacement.

But I guess all of it has something to do with being on 24/7. I am not
complaining, I like the box, especially the touchpad ;)

-- 
Dmitry


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04 11:28   ` jerome lacoste
@ 2005-02-04 17:13     ` Horst von Brand
  0 siblings, 0 replies; 16+ messages in thread
From: Horst von Brand @ 2005-02-04 17:13 UTC
  To: jerome lacoste; +Cc: Bernd Eckenfels, linux-kernel

jerome lacoste <jerome.lacoste@gmail.com> said:
> Bernd Eckenfels <ecki-news2005-01@lina.inka.de> said:
> >> Could a hardware failure look like bad sectors to fsck?

> > A failure of the bus or a former sporadic error can cause defective fs, but
> > normally you have a read error in fsck no structure error.
> > 
> > Are you using hdparm? is the system perhaps overheating or overclocked?

> no overclock
> hdparm is used but I cannot tell you exactly what the config is (now
> machine has been running memtest for 1.5 hour). I don't think I use
> special option: probably the defaults in my config file (mult_sect 16,
> dma on, write_cache off).

There are combinations of IDE + disk that slowly corrupt filesystems with
DMA on; if the default setting is DMA off, _don't touch it_. Not all bad
combinations are caught by the code in the kernel (Intel + some Western
Digital disk is what drove me up the wall until I disabled DMA).

What machine is this, what disk?

> overheating: perhaps. The machine is hot and running many hours per
> day (usually 12-16). It s running the fans very often, but it's always
> been like that. I've tried to control the fan, but then the
> temperature goes high very quickly. So I let the fans run.

Wise decision.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04 12:18 ` Wakko Warner
  2005-02-04 14:44   ` Dmitry Torokhov
@ 2005-02-05 10:38   ` jerome lacoste
  2005-02-05 12:27   ` Willy Tarreau
  2 siblings, 0 replies; 16+ messages in thread
From: jerome lacoste @ 2005-02-05 10:38 UTC
  To: wakko, dmitry.torokhov, linux-kernel

Took

On Fri, 4 Feb 2005 07:18:17 -0500, Wakko Warner <wakko@animx.eu.org> wrote:
> Please keep me CCd
> 
> jerome lacoste wrote:
> > particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> 
> I have this exact same laptop.  It works perfectly for me with linux.
> Originally started with a 2.4 kernel and recently went to 2.6.10.  The modem
> works well, the video card works well even with 3D accel.  I replaced the
> original 30gb hdd with a 40gb (for space reasons).  The only complaint about
> this thing I have is the fact they used an nvidia video chip.  I have seen
> more than 4 months uptime on it (I used to use it as a desktop)

I sometimes use it as a desktop. The thing is, as I never took the time to
make software suspend work, I'd rather have it running all the time
than restart it every now and then.

While looking for a replacement disk, I've seen that some new disks
were "designed for continuous, 24/7 operation".

E.g. http://www6.tomshardware.com/storage/20030813/mini-harddisks-01.html

Not sure how good that is, but I will certainly look into it...

Thanks to all who answered. If you want to continue the discussion, it may be
better to take this off lkml now.

J


* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-04 12:18 ` Wakko Warner
  2005-02-04 14:44   ` Dmitry Torokhov
  2005-02-05 10:38   ` Huge unreliability - does Linux have something to do with it? jerome lacoste
@ 2005-02-05 12:27   ` Willy Tarreau
  2005-02-05 15:11     ` Wakko Warner
  2 siblings, 1 reply; 16+ messages in thread
From: Willy Tarreau @ 2005-02-05 12:27 UTC
  To: jerome lacoste, lkml

On Fri, Feb 04, 2005 at 07:18:17AM -0500, Wakko Warner wrote:
> Please keep me CCd
> 
> jerome lacoste wrote:
> > particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> 
> I have this exact same laptop.  It works perfectly for me with linux. 
> Originally started with a 2.4 kernel and recently went to 2.6.10.  The modem
> works well, the video card works well even with 3D accel.  I replaced the
> original 30gb hdd with a 40gb (for space reasons).  The only complaint about
> this thing I have is the fact they used an nvidia video chip.  I have seen
> more than 4 months uptime on it (I used to use it as a desktop)

I think it does not like being moved. A friend of mine had his repaired
several times because of hard disk failures, a backlight failure and
the machine refusing to boot at all. I've never seen such unreliable hardware!

Willy



* Re: Huge unreliability - does Linux have something to do with it?
  2005-02-05 12:27   ` Willy Tarreau
@ 2005-02-05 15:11     ` Wakko Warner
  0 siblings, 0 replies; 16+ messages in thread
From: Wakko Warner @ 2005-02-05 15:11 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: jerome lacoste, lkml

Please keep me CCd

Willy Tarreau wrote:
> On Fri, Feb 04, 2005 at 07:18:17AM -0500, Wakko Warner wrote:
> > I have this exact same laptop.  It works perfectly for me with linux. 
> > Originally started with a 2.4 kernel and recently went to 2.6.10.  The modem
> > works well, the video card works well even with 3D accel.  I replaced the
> > original 30gb hdd with a 40gb (for space reasons).  The only complaint about
> > this thing I have is the fact they used an nvidia video chip.  I have seen
> > more than 4 months uptime on it (I used to use it as a desktop)
> 
> I think it does not like being moved. A friend of mine had his repaired
> several times because of hard disk failures, a backlight failure and
> the machine refusing to boot at all. I've never seen such unreliable hardware!

Mine didn't have that problem.  At the time it was the fastest machine I
had.  I have since moved on to my nice Xeon box, though =)

I have never heard of a machine that quits working if you move it.
That's bad.  I have heard of a machine quitting because someone looked
at it the wrong way.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Dell Inspiron sensors (was: Re: Huge unreliability - does Linux have something to do with it?)
  2005-02-04 14:44   ` Dmitry Torokhov
@ 2005-02-06 15:58     ` Giuseppe Bilotta
  2005-02-06 16:58       ` kernel
  0 siblings, 1 reply; 16+ messages in thread
From: Giuseppe Bilotta @ 2005-02-06 15:58 UTC (permalink / raw)
  To: linux-kernel

Dmitry Torokhov wrote:
> On Fri, 4 Feb 2005 07:18:17 -0500, Wakko Warner <wakko@animx.eu.org> wrote:
> > > particular hardware (Dell Inspiron 8100)? I run Linux on 3 other
> 
> Hmm, I guess it's a hit and run. I had replaced:
> 
> 4. Original Hitachi hard drive died a horrible death - I returned home
> and heard it making grinding sounds and hitting heads against
> something.

I have a Dell Inspiron 8200, from March 2002. Since the end of
December 2004 I've been having system lockups which at first
I couldn't identify, although they seemed to be related to
overheating. So I started monitoring the temperatures of all the
components in my system (I can monitor CPU, GPU and HD temp;
more on this later), and noticed that the lockups happen when
the HD temp gets to around 40 C. Indeed, 99% of the time they are
preceded by a loud "click" from the HD's whereabouts. I
haven't lost any data yet, but I've started backing up
everything and getting ready to get a replacement HD.

Concerning sensors, though: under Windows I can use the
i8kfangui applet to monitor all the sensors provided in the
computer, but under Linux I only seem able to get the CPU
temperature, using the i8k module, and no other sensor module
seems to be loadable. Does anybody know how to access the other
sensors on the Dell Inspiron? Or should I suggest that Massimo
upgrade the i8k module to support the new sensors (i8kfangui's
source code is GPL, so that shouldn't be a problem) and possibly
interface it all with the Linux sensors framework?
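For reference, the i8k module reports what it knows as a single line
in /proc/i8k. Below is a minimal Python sketch, assuming the ten-field
layout described in the kernel's Documentation/i8k.txt (format
version, BIOS version, service tag, CPU temperature, left/right fan
status, left/right fan speed, AC power status, button status); check
that layout against your own kernel before relying on it. It reads
only what i8k exposes (CPU temperature and fans), so the HD and GPU
temperatures discussed here are not covered. The 80 C alert threshold
is a hypothetical value chosen for illustration.

```python
from dataclasses import dataclass

# Assumed field order of the single line in /proc/i8k, as described in
# Documentation/i8k.txt; verify it against your kernel version.
@dataclass
class I8kStatus:
    format_version: str
    bios_version: str
    service_tag: str
    cpu_temp_c: int        # CPU temperature in degrees Celsius
    left_fan_status: int
    right_fan_status: int
    left_fan_rpm: int
    right_fan_rpm: int
    ac_power: int
    buttons: int

# Hypothetical alert threshold, not taken from the thread.
CPU_ALERT_C = 80


def parse_i8k(line: str) -> I8kStatus:
    """Parse one /proc/i8k line into an I8kStatus (assumed layout)."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    text_fields = fields[:3]
    # Unavailable readings are reported as -1, which int() handles fine.
    numeric_fields = [int(f) for f in fields[3:10]]
    return I8kStatus(*text_fields, *numeric_fields)


def read_i8k(path: str = "/proc/i8k") -> I8kStatus:
    """Read and parse the i8k status file (requires the i8k module)."""
    with open(path) as f:
        return parse_i8k(f.readline())


def too_hot(status: I8kStatus, limit: int = CPU_ALERT_C) -> bool:
    """Return True when the CPU temperature reaches the limit."""
    return status.cpu_temp_c >= limit


if __name__ == "__main__":
    try:
        s = read_i8k()
    except OSError as e:
        print(f"cannot read /proc/i8k (is the i8k module loaded?): {e}")
    else:
        print(f"CPU {s.cpu_temp_c} C, fans "
              f"{s.left_fan_rpm}/{s.right_fan_rpm} RPM")
        if too_hot(s):
            print("warning: CPU temperature above threshold")
```

Run periodically (e.g. from cron), this gives a CPU temperature log to
correlate with lockups; the HD temperature would have to come from the
drive's SMART data instead, e.g. the smartctl output attached earlier
in the thread.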

-- 
Giuseppe "Oblomov" Bilotta

Can't you see
It all makes perfect sense
Expressed in dollars and cents
Pounds shillings and pence
                  (Roger Waters)



* Re: Dell Inspiron sensors (was: Re: Huge unreliability - does Linux have something to do with it?)
  2005-02-06 15:58     ` Dell Inspiron sensors (was: Re: Huge unreliability - does Linux have something to do with it?) Giuseppe Bilotta
@ 2005-02-06 16:58       ` kernel
  2005-02-06 21:58         ` Giuseppe Bilotta
  0 siblings, 1 reply; 16+ messages in thread
From: kernel @ 2005-02-06 16:58 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: linux-kernel

On Sun, 2005-02-06 at 10:58, Giuseppe Bilotta wrote:
> I have a Dell Inspiron 8200, from March 2002. Since the end of
> December 2004 I've been having system lockups which at first
> I couldn't identify, although they seemed to be related to
> overheating. So I started monitoring the temperatures of all the
> components in my system (I can monitor CPU, GPU and HD temp;
> more on this later), and noticed that the lockups happen when
> the HD temp gets to around 40 C. Indeed, 99% of the time they are
> preceded by a loud "click" from the HD's whereabouts. I
> haven't lost any data yet, but I've started backing up
> everything and getting ready to get a replacement HD.


You might want to try this:

Remove the keyboard, then remove the cover beneath it.  Take a can of
compressed air (or equivalent) and *carefully* blow out the inside of the laptop.

-then-

Look at the back side and the right side of the laptop.  You'll see the
air intake and the A/C unit.  Blow the compressed air in so that
the plastic fan spins freely.  Take a snapshot of the dust bunnies
and send it to Dell.


I have an Inspiron 5150.  In less than 1 year this thing started powering
off (hard) on its own, no matter which OS was running (multi-boot).  I dug
around the 'net and found similar issues, all related to OVERHEATING.
Poor design was the culprit, but Dell has not yet admitted this
(look at the Dell Linux forum, or just the Dell laptop forum, and see
the Dell techs' replies); think of the numbers sold and it makes sense.

Anyway, aside from that, I had a dust bunny the size of a U.S. quarter
fly out.  Since then I make certain to rest it on a flat surface (I cut
some steel and carry it with me) and every couple of months take it
apart as described and blow out the dust.

-fd




* Re: Dell Inspiron sensors (was: Re: Huge unreliability - does   Linux have something to do with it?)
  2005-02-06 16:58       ` kernel
@ 2005-02-06 21:58         ` Giuseppe Bilotta
  0 siblings, 0 replies; 16+ messages in thread
From: Giuseppe Bilotta @ 2005-02-06 21:58 UTC (permalink / raw)
  To: linux-kernel

kernel wrote:
> You might want to try this:
> 
> Remove the keyboard, then remove the cover beneath it.  Take a can of
> compressed air (or equivalent) and *carefully* blow out the inside of the laptop.
> 
> -then-
> 
> Look at the back side and the right side of the laptop.  You'll see the
> air intake and the A/C unit.  Blow the compressed air in so that
> the plastic fan spins freely.  Take a snapshot of the dust bunnies
> and send it to Dell.
> 
> I have an Inspiron 5150.  In less than 1 year this thing started powering
> off (hard) on its own, no matter which OS was running (multi-boot).  I dug
> around the 'net and found similar issues, all related to OVERHEATING.
> Poor design was the culprit, but Dell has not yet admitted this
> (look at the Dell Linux forum, or just the Dell laptop forum, and see
> the Dell techs' replies); think of the numbers sold and it makes sense.

Thank you very much for the pointers. I had just reached the
same conclusion this afternoon when, pissed off at finding the
CPU at 85 C despite the fans running at maximum, I did
exactly what you suggest. Although I didn't find any dust
bunnies, after a thorough cleanup and blowing, plus some
toothpicking of hair and whatnot, I now have a computer that is
running fine again :)

Too bad, I'll need another excuse to buy myself a new hard disk...

What about the sensors?

-- 
Giuseppe "Oblomov" Bilotta

Can't you see
It all makes perfect sense
Expressed in dollars and cents
Pounds shillings and pence
                  (Roger Waters)



end of thread, other threads:[~2005-02-06 21:59 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-02-04  9:03 Huge unreliability - does Linux have something to do with it? jerome lacoste
2005-02-04  9:20 ` Julien Banchet
2005-02-04 10:31 ` Jim Nelson
2005-02-04 10:45 ` Bernd Eckenfels
2005-02-04 11:28   ` jerome lacoste
2005-02-04 17:13     ` Horst von Brand
2005-02-04 11:27 ` Andre Tomt
2005-02-04 11:51 ` DervishD
2005-02-04 12:18 ` Wakko Warner
2005-02-04 14:44   ` Dmitry Torokhov
2005-02-06 15:58     ` Dell Inspiron sensors (was: Re: Huge unreliability - does Linux have something to do with it?) Giuseppe Bilotta
2005-02-06 16:58       ` kernel
2005-02-06 21:58         ` Giuseppe Bilotta
2005-02-05 10:38   ` Huge unreliability - does Linux have something to do with it? jerome lacoste
2005-02-05 12:27   ` Willy Tarreau
2005-02-05 15:11     ` Wakko Warner
