Drive breakdown or bug?

Linux ATA/IDE development

* Drive breakdown or bug?
@ 2008-11-16 18:25 Gene Heskett
From: Gene Heskett @ 2008-11-16 18:25 UTC
  To: linux-ide

Greetings;

I have this drive as /boot and / on my system:
Device Model:     MAXTOR STM3500630A
Serial Number:    9QG7T0CJ
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Nov 16 06:56:45 2008 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

The system is a quad-core Phenom on an ASUS M2N-SLI Deluxe board; the kernel
running at the moment is 2.6.27.6.

The backup program, amanda, failed last night on a largish DLE (disk list
entry), as it has several times in the past.  The failure was a 'holding disk
read error' on a block that should be quite early in the drive's mapping, but
badblocks is complaining about blocks that are two-thirds of the way to the
spindle.

My logs are loaded with resets and offline messages that don't seem to be
truthful, since the system eventually recovers.  Sample log output:

Nov 16 12:55:11 coyote kernel: [57888.336245] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 16 12:55:11 coyote kernel: [57888.336384] ata1.00: BMDMA stat 0x65
Nov 16 12:55:11 coyote kernel: [57888.336418] ata1.00: cmd 25/00:08:18:c6:ba/00:00:2a:00:00/e0 tag 0 dma 4096 in
Nov 16 12:55:11 coyote kernel: [57888.336419]          res 51/40:08:18:c6:ba/40:00:2a:00:00/e0 Emask 0x9 (media error)
Nov 16 12:55:11 coyote kernel: [57888.336473] ata1.00: status: { DRDY ERR }
Nov 16 12:55:11 coyote kernel: [57888.336498] ata1.00: error: { UNC }
Nov 16 12:55:11 coyote kernel: [57888.345576] ata1.00: configured for UDMA/33
Nov 16 12:55:11 coyote kernel: [57888.345616] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
Nov 16 12:55:11 coyote kernel: [57888.345651] sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
Nov 16 12:55:11 coyote kernel: [57888.345700] Descriptor sense data with sense descriptors (in hex):
Nov 16 12:55:11 coyote kernel: [57888.345736]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Nov 16 12:55:11 coyote kernel: [57888.345821]         2a ba c6 18
Nov 16 12:55:11 coyote kernel: [57888.345867] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4
Nov 16 12:55:11 coyote kernel: [57888.345906] end_request: I/O error, dev sda, sector 716883480
Nov 16 12:55:11 coyote kernel: [57888.345942] Buffer I/O error on device sda, logical block 89610435
Nov 16 12:55:11 coyote kernel: [57888.346110] ata1: EH complete
Nov 16 12:55:11 coyote kernel: [57888.346234] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 16 12:55:11 coyote kernel: [57888.365643] sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Nov 16 12:55:11 coyote kernel: [57888.375653] sd 0:0:0:0: [sda] Write Protect is off
Nov 16 12:55:11 coyote kernel: [57888.386807] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
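The descriptor sense bytes in the log above can be decoded by hand. The following is a minimal Python sketch, assuming the SPC descriptor sense format (response code 0x72 for current errors, sense key in byte 1, ASC/ASCQ in bytes 2-3, additional length in byte 7, descriptors from byte 8). It interprets only the Information descriptor (type 0x00); the function and table names are hypothetical. Applied to the logged bytes, it yields sense key 0x3 (MEDIUM ERROR), ASC/ASCQ 0x11/0x04, and an information field of 0x2abac618, which is the same sector 716883480 that the end_request line reports.

```python
# Minimal sketch: decode descriptor-format SCSI sense data (SPC response
# code 0x72/0x73) as printed by the sd driver. Only the Information
# descriptor (type 0x00) is interpreted; other descriptors are skipped.

SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
              0xB: "ABORTED COMMAND"}

# A single ASC/ASCQ entry, enough for this log; the full table is in SPC.
ASC_NAMES = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_descriptor_sense(hex_text: str) -> dict:
    data = bytes.fromhex(hex_text)
    response_code = data[0] & 0x7F
    if response_code not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    key, asc, ascq = data[1] & 0x0F, data[2], data[3]
    result = {
        "current": response_code == 0x72,   # 0x73 would mean deferred
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "OTHER"),
        "asc": asc,
        "ascq": ascq,
        "asc_name": ASC_NAMES.get((asc, ascq), "see SPC table"),
        "info": None,
    }
    end = min(len(data), 8 + data[7])   # byte 7 = additional sense length
    pos = 8
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 2 + desc_len]
        # Information descriptor: VALID bit in its first byte, then a
        # reserved byte, then an 8-byte big-endian information field.
        if desc_type == 0x00 and len(body) >= 10 and body[0] & 0x80:
            result["info"] = int.from_bytes(body[2:10], "big")
        pos += 2 + desc_len
    return result


if __name__ == "__main__":
    logged = ("72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 "
              "2a ba c6 18")
    print(decode_descriptor_sense(logged))
```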

So I am running badblocks and have collected a lengthy list, but the drive is
not reallocating those blocks and, according to smartctl, has not reallocated
any bad blocks at all.

And the bad blocks being reported are much farther into the disk than the one
amanda reports when it fails; there is no correspondence at all.
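When numbers from different tools don't line up, one thing worth ruling out is that they count in different units. In the log above, end_request reports 512-byte sectors, the "Buffer I/O error" line counts 4096-byte blocks (716883480 / 8 = 89610435), and badblocks uses its own block size (1024 bytes unless -b is given), counted from the start of whatever device it was run on. The following is a minimal sketch of the conversion. The helper names and the start_sector parameter (0 for the whole disk, the partition's first sector if badblocks was run on a partition) are illustrative assumptions. An amanda holding-disk error is a file offset, so it cannot be mapped this way without knowing the file's on-disk layout.

```python
SECTOR_BYTES = 512  # the kernel's end_request line counts 512-byte sectors


def sector_to_block(sector: int, block_bytes: int, start_sector: int = 0) -> int:
    """Index of the block_bytes-sized block that contains 512-byte `sector`.

    start_sector (hypothetical parameter) is the first sector of the device
    the other tool was run on: 0 for /dev/sda, the partition start otherwise.
    """
    if sector < start_sector:
        raise ValueError("sector lies before the start of that device")
    return (sector - start_sector) * SECTOR_BYTES // block_bytes


def block_to_sectors(block: int, block_bytes: int, start_sector: int = 0) -> range:
    """Absolute 512-byte sectors covered by `block` (inverse of the above)."""
    per_block = block_bytes // SECTOR_BYTES
    first = start_sector + block * per_block
    return range(first, first + per_block)


if __name__ == "__main__":
    failed_sector = 716883480   # "end_request: I/O error, dev sda, sector ..."
    total_sectors = 976773168   # "976773168 512-byte hardware sectors"
    print("4096-byte buffer block:", sector_to_block(failed_sector, 4096))
    print("1024-byte badblocks block on /dev/sda:",
          sector_to_block(failed_sector, 1024))
    print("fraction of LBA range: %.3f" % (failed_sector / total_sectors))
```

For the logged sector this gives 89610435, matching the "logical block 89610435" in the Buffer I/O line, and 358441740 in 1024-byte badblocks units on the whole disk. The sector sits at about 0.734 of the LBA range, which is a position in address space rather than a physical radius.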

Nor is smartctl showing corresponding increments in its error count, except
during an amdump run.  There have been 36 errors logged, but the last 5 all
occurred while amdump was running earlier today, and the last 5 are all the
drive apparently keeps.

The question then is:

Bad drive, (un-)known bug, or configuration?

Thanks all.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
In success there's a tendency to keep on doing what you were doing.
		-- Alan Kay


* Re: Drive breakdown or bug?
@ 2008-11-16 18:33 ` Alan Cox
From: Alan Cox @ 2008-11-16 18:33 UTC
  To: Gene Heskett; +Cc: linux-ide

> Nov 16 12:55:11 coyote kernel: [57888.336419]          res 51/40:08:18:c6:ba/40:00:2a:00:00/e0 Emask 0x9 (media error)
> Nov 16 12:55:11 coyote kernel: [57888.336473] ata1.00: status: { DRDY ERR }
> Nov 16 12:55:11 coyote kernel: [57888.336498] ata1.00: error: { UNC }

What is being reported is an unrecoverable read error on the disk.
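This reading can be checked against the raw taskfile dump. The following is a minimal sketch that decodes both quoted lines. It assumes the field order used by libata's error report of this era (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), with the "res" line's first two slots holding the Status and Error registers. The helper names are hypothetical, and only the status and error bits that libata names in these messages are listed.

```python
# Minimal sketch: decode a libata "cmd"/"res" taskfile dump.
# Field order is an assumption based on libata's error report format:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# In a "res" line, the command slot holds the Status register and the
# feature slot holds the Error register.

COMMAND_NAMES = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT",
                 0xC8: "READ DMA", 0xCA: "WRITE DMA"}

# Only the bits libata prints in its "status: { }" and "error: { }" lines.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def parse_taskfile(text: str) -> dict:
    """Split a taskfile dump into registers and assemble the 48-bit LBA."""
    first, low, high, device = text.strip().split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":"))
    return {
        "command": int(first, 16),          # Status register in a "res" line
        "feature": feature,                 # Error register in a "res" line
        "count": (hob_nsect << 8) | nsect,  # sector count (LBA48 commands)
        "lba48": (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
                 | (lbah << 16) | (lbam << 8) | lbal,
        "device": int(device, 16),
    }


def bit_names(value: int, table: list) -> list:
    """Names of the bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    cmd = parse_taskfile("25/00:08:18:c6:ba/00:00:2a:00:00/e0")
    res = parse_taskfile("51/40:08:18:c6:ba/40:00:2a:00:00/e0")
    print(COMMAND_NAMES.get(cmd["command"], hex(cmd["command"])),
          "count", cmd["count"], "lba", cmd["lba48"])
    print("status", bit_names(res["command"], STATUS_BITS),
          "error", bit_names(res["feature"], ERROR_BITS),
          "failing lba", res["lba48"])
```

The command is 0x25 (READ DMA EXT) for 8 sectors, which accounts for the 4096 bytes in "dma 4096 in", starting at LBA 0x2abac618 = 716883480. The result status 0x51 decodes to DRDY and ERR (0x10 is not one of the bits libata names), and the error register 0x40 decodes to UNC, the uncorrectable-data bit.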

> So I am running badblocks, and have collected a lengthy list. But the drive 
> is not re-allocating them, and has not re-allocated any bad blocks according
> to smartctl.

It cannot reallocate a block that it cannot read, as the data on it would
be silently lost at that point. It should reallocate such blocks when they
are written to (which is why fsck.ext3 knows how to rewrite bad inode
blocks when recovering data).

> but the last 5 all occurred while amdump was running earlier today, and
> the last 5 is all the drive apparently keeps.

Yes

> The question then is:
> 
> Bad drive, (un-)known bug, or configuration?

Probably a bad drive, but it could also be a bad drive as a symptom of
something else (e.g. heat).

Alan

