The Linux Kernel Mailing List
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
       [not found] <233C89823A37714D95B1A891DE3BCE5202AB1D07@xch-b.win.zambeel.com>
@ 2003-01-25 20:34 ` Manish Lachwani
  2003-01-26  6:51   ` Bryan Andersen
  0 siblings, 1 reply; 11+ messages in thread
From: Manish Lachwani @ 2003-01-25 20:34 UTC (permalink / raw)
  To: linux-kernel

0xd0 indicates that the driver aborted the command.
Can you try to get the SMART data from the drive using
smartctl?
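
For reference when reading the log line, the status byte can be split into
its individual bits. The sketch below assumes the standard ATA status
register layout (BSY, DRDY, DF, DSC, DRQ, CORR, IDX, ERR). The helper name
decode_ata_status is made up for illustration and is not part of the kernel
or of smartctl. It only names the bits that are set and does not by itself
explain why a command failed.

```python
# Standard ATA status register bits, highest bit first.
# The table and helper are illustrative, not copied from kernel source.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault (write fault on older drives)
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


print(decode_ata_status(0xD0))
```

For 0xd0 this lists BSY, DRDY and DSC. While BSY is set, the ATA
specification treats the remaining bits as not valid, which fits the kernel
printing only { Busy }.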

use "smartctl -e /dev/hdX" to enable SMART collection

use "smartctl -a /dev/hdX" to collect the SMART data
...

Thanks
Manish

> I'm sending this out now to see if others are noticing the same problem.
> 
> Under heavy disk IO I'm losing DMA on a disk that is being handled by
> the new PDC202XX driver.  The HD controller is a PDC20269-based
> controller like those in the Maxtor HD/Controller bundles.  So far it has
> lost DMA 4 times under heavy loads when multiple disks are being
> accessed at once.  It appears to lose DMA at a random point during
> heavy disk IO.  It has gone anywhere from hours to less than 30 minutes
> before it happened.  I'm doing a test overnight to see if it happens when
> it is the only disk being accessed heavily.  My first guess is that it is
> dropping a DMA finish interrupt and then failing when it tries to set one
> up again, but I'm not sure about that.  The other idea I had is that when
> the code tries to get a DMA channel, all are in use and it fails.  Any
> help on what to look into would be appreciated.
> 
> Jan 24 22:41:14 blip kernel: hde: dma_intr: status=0xd0 { Busy }
> Jan 24 22:41:14 blip kernel:
> Jan 24 22:41:14 blip kernel: hde: DMA disabled
> Jan 24 22:41:14 blip kernel: PDC202XX: Primary channel reset.
> Jan 24 22:41:14 blip kernel: ide2: reset: success
> 
> The motherboard is an ASUS A7N8X and it has assigned ide2, ide2 (both
> controllers on the Promise card), the nVidia sound, and the display to
> the same interrupt.  I was trying to see if that was the problem, but
> disabling other devices didn't seem to help.  It still happened.
> 
> The PDC20269 controller has one Maxtor 4G160J8 (160GB) disk per channel
> and each is jumpered to be master.  (hdparm -I outputs below)  I've also
> added in the current dmesg output, and a cat of /proc/interrupts.
> 
> ---------------------
> # dmesg
> Linux version 2.4.21-pre3-ac4 (root@blip) (gcc version 2.95.4 20011002 (Debian prerelease)) #21 SMP Sun Jan 19 13:54:23 CST 2003
> BIOS-provided physical RAM map:
>   BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
>   BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
>   BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
>   BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
>   BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
>   BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
>   BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
>   BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>   BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
> 511MB LOWMEM available.
> On node 0 totalpages: 131056
> zone(0): 4096 pages.
> zone(1): 126960 pages.
> zone(2): 0 pages.
> Kernel command line: auto BOOT_IMAGE=Linux ro root=302 ide0=ata66 ide1=ata66
> ide_setup: ide0=ata66
> ide_setup: ide1=ata66
> Found and enabled local APIC!
> Initializing CPU#0
> Detected 1737.306 MHz processor.
> Console: colour VGA+ 80x25
> Calibrating delay loop... 3460.30 BogoMIPS
> Memory: 515228k/524224k available (1600k kernel code, 8604k reserved, 676k data, 112k init, 0k highmem)
> Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
> Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
> Mount cache hash table entries: 512 (order: 0, 4096 bytes)
> Buffer cache hash table entries: 32768 (order: 5, 131072 bytes)
> Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 256K (64 bytes/line)
> CPU:     After generic, caps: 0383fbff c1c3fbff 00000000 00000000
> CPU:             Common caps: 0383fbff c1c3fbff 00000000 00000000
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> POSIX conformance testing by UNIFIX
> mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
> mtrr: detected mtrr type: Intel
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 256K (64 bytes/line)
> CPU:     After generic, caps: 0383fbff c1c3fbff 00000000 00000000
> CPU:             Common caps: 0383fbff c1c3fbff 00000000 00000000
> CPU0: AMD Athlon(tm) XP 2100+ stepping 02
> per-CPU timeslice cutoff: 731.30 usecs.
> task migration cache decay timeout: 10 msecs.
> SMP motherboard not detected.
> enabled ExtINT on CPU#0
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ..... CPU clock speed is 1737.2981 MHz.
> ..... host bus clock speed is 267.2766 MHz.
> cpu: 0, clocks: 2672766, slice: 1336383
> CPU0<T0:2672752,T1:1336368,D:1,S:1336383,C:2672766>
> migration_task 0 on cpu=0
> PCI: PCI BIOS revision 2.10 entry at 0xfb560, last bus=3
> PCI: Using configuration type 1
> PCI: Probing PCI hardware
> PCI: Using IRQ router default [10de/01e0] at 00:00.0
> isapnp: Scanning for PnP cards...
> isapnp: No Plug & Play device found
> Linux NET4.0 for Linux 2.4
> Based upon Swansea University Computer Society NET3.039
> Initializing RT netlink socket
> Starting kswapd
> Journalled Block Device driver loaded
> Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE]
> parport0: irq 7 detected
> i2c-core.o: i2c core module
> i2c-dev.o: i2c /dev entries driver module
> i2c-core.o: driver i2c-dev dummy driver registered.
> i2c-proc.o version 2.6.1 (20010825)
> pty: 256 Unix98 ptys configured
> Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled
> ttyS00 at 0x03f8 (irq = 4) is a 16550A
> ttyS01 at 0x02f8 (irq = 3) is a 16550A
> Real Time Clock Driver v1.10e
> Floppy drive(s): fd0 is 1.44M
> FDC 0 is a post-1991 82077
> RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
> loop: loaded (max 8 devices)
> Linux agpgart interface v0.99 (c) Jeff Hartmann
> agpgart: Maximum main memory to use for agp memory: 439M
> agpgart: unsupported bridge
> agpgart: no supported devices found.
> Uniform Multi-Platform E-IDE driver Revision: 7.00beta-2.4
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> NFORCE2: IDE controller at PCI slot 00:09.0
> NFORCE2: chipset revision 162
> NFORCE2: not 100% native mode: will probe irqs later
>      ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
>      ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
> PDC20269: IDE controller at PCI slot 01:07.0
> PDC20269: chipset revision 2
> PDC20269: not 100% native mode: will probe irqs later
> 
=== message truncated ===




* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-25 20:34 ` FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 Manish Lachwani
@ 2003-01-26  6:51   ` Bryan Andersen
  2003-01-26  7:15     ` Manish Lachwani
  0 siblings, 1 reply; 11+ messages in thread
From: Bryan Andersen @ 2003-01-26  6:51 UTC (permalink / raw)
  To: Manish Lachwani; +Cc: linux-kernel

Manish Lachwani wrote:
 > 0xd0 indicates that the driver aborted the command.
 > Can you try to get the SMART data from the drive using
 > smartctl?
 >
 > use "smartctl -e /dev/hdX" to enable SMART collection
 >
 > use "smartctl -a /dev/hdX" to collect the SMART data

Ok, so where do I find information on how to decode this?

# smartctl -a /dev/hde
Device: Maxtor 4G160J8  Supports ATA Version 6
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.

General Smart Values:
Off-line data collection status: (0x00) Offline data collection activity was
                                         never started

Self-test execution status:      (   0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run

Total time to complete off-line
data collection:                 (  30) Seconds

Offline data collection
Capabilities:                    (0x1b)SMART EXECUTE OFF-LINE IMMEDIATE
                                         Automatic timer ON/OFF support
                                         Suspend Offline Collection upon new
                                         command
                                         Offline surface scan supported
                                         Self-test supported

Smart Capablilities:           (0x0003) Saves SMART data before entering
                                         power-saving mode
                                         Supports SMART auto save timer

Error logging capability:        (0x01) Error logging supported

Short self-test routine
recommended polling time:        (   2) Minutes

Extended self-test routine
recommended polling time:        ( 103) Minutes

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
Attribute                    Flag     Value Worst Threshold Raw Value
(  3)Spin Up Time            0x0027   252   252   063       0
(  4)Start Stop Count        0x0032   253   253   000       0
(  5)Reallocated Sector Ct   0x0033   253   253   063       0
(  6)Read Channel Margin     0x0001   253   253   100       0
(  7)Seek Error Rate         0x000a   253   252   000       0
(  8)Seek Time Preformance   0x0027   244   244   187       36736
(  9)Power On Hours          0x0032   253   253   000       4341
( 10)Spin Retry Count        0x002b   252   252   223       0
( 11)Calibration Retry Count 0x002b   252   252   223       0
( 12)Power Cycle Count       0x0032   253   253   000       43
(192)Power-Off Retract Count 0x0032   253   253   000       0
(193)Load Cycle Count        0x0032   253   253   000       0
(194)Temperature             0x0032   253   253   000       0
(195)Hardware ECC Recovered  0x000a   253   252   000       221
(196)Reallocated Event Count 0x0008   253   253   000       0
(197)Current Pending Sector  0x0008   253   253   000       0
(198)Offline Uncorrectable   0x0008   253   253   000       0
(199)UDMA CRC Error Count    0x0008   199   199   000       0
(200)Unknown Attribute       0x000a   253   252   000       0
(201)Unknown Attribute       0x000a   253   252   000       0
(202)Unknown Attribute       0x000a   253   252   000       0
(203)Unknown Attribute       0x000b   253   252   180       0
(204)Unknown Attribute       0x000a   253   252   000       0
(205)Unknown Attribute       0x000a   253   252   000       0
(207)Unknown Attribute       0x002a   252   252   000       0
(208)Unknown Attribute       0x002a   252   252   000       0
(209)Unknown Attribute       0x0024   253   253   000       0
( 99)Unknown Attribute       0x0004   253   253   000       0
(100)Unknown Attribute       0x0004   253   253   000       0
(101)Unknown Attribute       0x0004   253   253   000       0
SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 05
ATA Error Count: 8
Non-Fatal Count: 0

Error Log Structure 1:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
  00   db   00   00   4f   c2    00   b0     201
  00   d8   00   00   4f   c2    00   b0     201
  00   da   00   00   4f   c2    00   b0     201
  00   d9   00   00   4f   c2    00   b0     201
  00   fe   00   00   00   00    00   ef     201
  00   04   50   42   97   23    00   51     5

Error Log Structure 2:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
  08   00   80   2a   4e   8a    e0   25     458636
  08   00   80   aa   4e   8a    e0   25     458636
  08   00   80   2a   4f   8a    e0   25     458636
  08   00   80   aa   4f   8a    e0   25     458636
  08   d0   01   00   4f   c2    e0   b0     459147
  00   04   01   0b   4f   c2    e0   51     279972

Error Log Structure 3:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
  08   00   80   aa   4e   8a    e0   25     458636
  08   00   80   2a   4f   8a    e0   25     458636
  08   00   80   aa   4f   8a    e0   25     458636
  08   d0   01   00   4f   c2    e0   b0     459147
  08   d1   01   01   4f   c2    e0   b0     459147
  00   04   01   0b   4f   c2    e0   51     279972

Error Log Structure 4:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
  08   00   80   2a   4f   8a    e0   25     458636
  08   00   80   aa   4f   8a    e0   25     458636
  08   d0   01   00   4f   c2    e0   b0     459147
  08   d1   01   01   4f   c2    e0   b0     459147
  08   d0   01   00   4f   c2    e0   b0     459148
  00   04   01   0b   4f   c2    e0   51     279972

Error Log Structure 5:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
  08   00   80   aa   4f   8a    e0   25     458636
  08   d0   01   00   4f   c2    e0   b0     459147
  08   d1   01   01   4f   c2    e0   b0     459147
  08   d0   01   00   4f   c2    e0   b0     459148
  08   d1   01   01   4f   c2    e0   b0     459148
  00   04   01   0b   4f   c2    e0   51     279972
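
As a rough aid for reading the attribute table above: each row carries a
normalized Value, the Worst value seen so far, and a Threshold, and an
attribute is conventionally treated as failed when Value is at or below
Threshold. What follows is a minimal Python sketch, assuming the smartctl
2.1 column layout shown in this output. parse_attributes and failing are
hypothetical helper names, and the three sample rows are copied from the
table.

```python
import re

# Three rows copied verbatim from the smartctl output above.
SAMPLE = """\
(  8)Seek Time Preformance   0x0027   244   244   187       36736
(195)Hardware ECC Recovered  0x000a   253   252   000       221
(199)UDMA CRC Error Count    0x0008   199   199   000       0
"""

# (id)name  flag  value  worst  threshold  raw
LINE_RE = re.compile(
    r"^\(\s*(\d+)\)(.+?)\s+(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_attributes(text):
    """Parse attribute rows into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        attr_id, name, flag, value, worst, threshold, raw = m.groups()
        rows.append({
            "id": int(attr_id),
            "name": name.strip(),
            "flag": int(flag, 16),
            "value": int(value),
            "worst": int(worst),
            "threshold": int(threshold),
            "raw": int(raw),
        })
    return rows


def failing(rows):
    """Rows whose normalized value is at or below the threshold."""
    return [r for r in rows if r["value"] <= r["threshold"]]


rows = parse_attributes(SAMPLE)
for r in rows:
    print(r["id"], r["name"], r["value"], r["threshold"], r["raw"])
print("failing:", [r["name"] for r in failing(rows)])
```

None of the three sample rows is at or below its threshold, which matches
the "Check S.M.A.R.T. Passed." line. The Raw Value column is vendor
specific, so the sketch does not try to interpret it.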


- Bryan



* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26  6:51   ` Bryan Andersen
@ 2003-01-26  7:15     ` Manish Lachwani
  2003-01-26  8:13       ` Andre Hedrick
  0 siblings, 1 reply; 11+ messages in thread
From: Manish Lachwani @ 2003-01-26  7:15 UTC (permalink / raw)
  To: Bryan Andersen; +Cc: linux-kernel

The "Hardware ECC Recovered" indicates the number of
ECC errors corrected in the drive. Do one thing. Try
to swap the drive with the drive on another ATA cable.
So, swap /dev/hde with /dev/hda (or whatever)
physically and check if the error follows the drive or
the ATA cable. 

If it follows the drive, you may have to replace the
drive. Additionally, from the SMART error log #5:

00   04   01   0b   4f   c2    e0   51     279972

indicates an aborted command (0x04) at the sector
0x0c24f0b. Try to read from that sector doing a "dd"
and see if the dd aborts too. 

If the problem follows the drive, you should then run
a DOS based diagnostic utility (from the Maxtor site)
to determine if there are physical defects on the
drive ...

Thanks
Manish






* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26  7:15     ` Manish Lachwani
@ 2003-01-26  8:13       ` Andre Hedrick
  2003-01-26  8:27         ` Manish Lachwani
  0 siblings, 1 reply; 11+ messages in thread
From: Andre Hedrick @ 2003-01-26  8:13 UTC (permalink / raw)
  To: Manish Lachwani; +Cc: Bryan Andersen, linux-kernel

On Sat, 25 Jan 2003, Manish Lachwani wrote:

> The "Hardware ECC Recovered" indicates the number of
> ECC errors corrected in the drive. Do one thing. Try
> to swap the drive with the drive on another ATA cable.
> So, swap /dev/hde with /dev/hda (or whatever)
> physically and check if the error follows the drive or
> the ATA cable. 
> 
> If it follows the drive, you may have to replace the
> drive. Additionally, from the SMART error log #5:
> 
> 00   04   01   0b   4f   c2    e0   51     279972

NO!
       command aborted
            amount to transfer == 1 sector
                 have to dig through notes to decode ...
                      lcyl smart passcode
                           hcyl smart passcode
                                 primary device
                                      ready_seek_error


It barfed the command ...

try -e first

Cheers,

Andre Hedrick
LAD Storage Consulting Group





* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26  8:13       ` Andre Hedrick
@ 2003-01-26  8:27         ` Manish Lachwani
  2003-01-26  8:48           ` Andre Hedrick
  0 siblings, 1 reply; 11+ messages in thread
From: Manish Lachwani @ 2003-01-26  8:27 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Bryan Andersen, linux-kernel

I don't think so. Without SMART data collection being
enabled, it won't give out any SMART data at all.
How, then, did the SMART data show:

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
Attribute                    Flag     Value Worst Threshold Raw Value
(  3)Spin Up Time            0x0027   252   252   063       0
(  4)Start Stop Count        0x0032   253   253   000       0
(  5)Reallocated Sector Ct   0x0033   253   253   063       0
(  6)Read Channel Margin     0x0001   253   253   100       0
(  7)Seek Error Rate         0x000a   253   252   000       0
(  8)Seek Time Preformance   0x0027   244   244   187       36736
(  9)Power On Hours          0x0032   253   253   000       4341
( 10)Spin Retry Count        0x002b   252   252   223       0
( 11)Calibration Retry Count 0x002b   252   252   223       0
( 12)Power Cycle Count       0x0032   253   253   000       43
(192)Power-Off Retract Count 0x0032   253   253   000       0
(193)Load Cycle Count        0x0032   253   253   000       0
(194)Temperature             0x0032   253   253   000       0
(195)Hardware ECC Recovered  0x000a   253   252   000       221
(196)Reallocated Event Count 0x0008   253   253   000       0
(197)Current Pending Sector  0x0008   253   253   000       0
(198)Offline Uncorrectable   0x0008   253   253   000       0
(199)UDMA CRC Error Count    0x0008   199   199   000       0
(200)Unknown Attribute       0x000a   253   252   000       0
(201)Unknown Attribute       0x000a   253   252   000       0
(202)Unknown Attribute       0x000a   253   252   000       0
(203)Unknown Attribute       0x000b   253   252   180       0
(204)Unknown Attribute       0x000a   253   252   000       0
(205)Unknown Attribute       0x000a   253   252   000       0
(207)Unknown Attribute       0x002a   252   252   000       0
(208)Unknown Attribute       0x002a   252   252   000       0
(209)Unknown Attribute       0x0024   253   253   000       0
( 99)Unknown Attribute       0x0004   253   253   000       0
(100)Unknown Attribute       0x0004   253   253   000       0
(101)Unknown Attribute       0x0004   253   253   000       0
SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 05
ATA Error Count: 8
Non-Fatal Count: 0

Also, the SMART error log, 

Error Log Structure 5:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
  08   00   80   aa   4f   8a    e0   25     458636
  08   d0   01   00   4f   c2    e0   b0     459147
  08   d1   01   01   4f   c2    e0   b0     459147
  08   d0   01   00   4f   c2    e0   b0     459148
  08   d1   01   01   4f   c2    e0   b0     459148
  00   04   01   0b   4f   c2    e0   51     279972

You can retrieve the sector# ...

Thanks
Manish




* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26  8:27         ` Manish Lachwani
@ 2003-01-26  8:48           ` Andre Hedrick
  2003-01-26  9:10             ` Manish Lachwani
  0 siblings, 1 reply; 11+ messages in thread
From: Andre Hedrick @ 2003-01-26  8:48 UTC (permalink / raw)
  To: Manish Lachwani; +Cc: Bryan Andersen, linux-kernel


SMART can be enabled by the BIOS, but the BIOS does not issue diagnostic
test operations.

> General Smart Values:
> Off-line data collection status: (0x00) Offline data collection activity was
>                                          never started

was never started --

> Self-test execution status:      (   0) The previous self-test routine completed
>                                          without error or no self-test has ever
>                                          been run

Was never executed, "after" the vendor cleared the state before shipping.

They can clear the RO log space that can not be gotten to w/o VUO and
passcodes.

So show me the sector from the logs.
You can't!

WIN_READDMA_EXT == 0x25

>   08   00   80   aa   4f   8a    e0   25     458636

You only have the lower 24-bits

0x??????8a4faa

This requires another tool, as the original "smart from sff-8035" is
obsolete.
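
The byte shuffling above can be sketched in a few lines of Python. This
assumes the column order printed by smartctl (DCR FR SC SN CL SH D/H CR
Timestamp) and the fixed SMART signature CL=0x4f, SH=0xc2 used with
command 0xb0. The function names are invented for illustration and do not
come from smartctl or the kernel.

```python
COLUMNS = ["DCR", "FR", "SC", "SN", "CL", "SH", "D/H", "CR"]


def parse_row(row):
    """Split one error-log row into named hex byte fields plus the timestamp."""
    parts = row.split()
    fields = {name: int(value, 16) for name, value in zip(COLUMNS, parts[:8])}
    fields["Timestamp"] = parts[8]  # kept as text; its units are not assumed
    return fields


def low_lba24(fields):
    """Combine SH, CL, SN into the lower 24 bits of the address."""
    return (fields["SH"] << 16) | (fields["CL"] << 8) | fields["SN"]


def has_smart_signature(fields):
    """True when CL/SH hold the SMART key values rather than an address."""
    return fields["CL"] == 0x4F and fields["SH"] == 0xC2


read_row = parse_row("08   00   80   aa   4f   8a    e0   25     458636")
smart_row = parse_row("08   d0   01   00   4f   c2    e0   b0     459147")
error_row = parse_row("00   04   01   0b   4f   c2    e0   51     279972")

print(hex(low_lba24(read_row)))        # 0x8a4faa, upper bits unknown
print(has_smart_signature(smart_row))  # True
print(has_smart_signature(error_row))  # True
```

For the 0x25 read, only the lower 24 bits are recoverable from this log
format. For the final row, CL and SH carry the SMART signature, so reading
them as part of a sector number gives a meaningless address.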


Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Sun, 26 Jan 2003, Manish Lachwani wrote:

> I dont think so. Without SMART data collection being
> enabled, it wont give out the any SMART data at all.
> How, did the SMART data show:
> 
> Vendor Specific SMART Attributes with Thresholds:
> Revision Number: 16
> Attribute                    Flag     Value Worst
> Threshold Raw Value
> (  3)Spin Up Time            0x0027   252   252   063 
>      0
> (  4)Start Stop Count        0x0032   253   253   000 
>      0
> (  5)Reallocated Sector Ct   0x0033   253   253   063 
>      0
> (  6)Read Channel Margin     0x0001   253   253   100 
>      0
> (  7)Seek Error Rate         0x000a   253   252   000 
>      0
> (  8)Seek Time Preformance   0x0027   244   244   187 
>      36736
> (  9)Power On Hours          0x0032   253   253   000 
>      4341
> ( 10)Spin Retry Count        0x002b   252   252   223 
>      0
> ( 11)Calibration Retry Count 0x002b   252   252   223 
>      0
> ( 12)Power Cycle Count       0x0032   253   253   000 
>      43
> (192)Power-Off Retract Count 0x0032   253   253   000 
>      0
> (193)Load Cycle Count        0x0032   253   253   000 
>      0
> (194)Temperature             0x0032   253   253   000 
>      0
> (195)Hardware ECC Recovered  0x000a   253   252   000 
>      221
> (196)Reallocated Event Count 0x0008   253   253   000 
>      0
> (197)Current Pending Sector  0x0008   253   253   000 
>      0
> (198)Offline Uncorrectable   0x0008   253   253   000 
>      0
> (199)UDMA CRC Error Count    0x0008   199   199   000 
>      0
> (200)Unknown Attribute       0x000a   253   252   000 
>      0
> (201)Unknown Attribute       0x000a   253   252   000 
>      0
> (202)Unknown Attribute       0x000a   253   252   000 
>      0
> (203)Unknown Attribute       0x000b   253   252   180 
>      0
> (204)Unknown Attribute       0x000a   253   252   000 
>      0
> (205)Unknown Attribute       0x000a   253   252   000 
>      0
> (207)Unknown Attribute       0x002a   252   252   000 
>      0
> (208)Unknown Attribute       0x002a   252   252   000 
>      0
> (209)Unknown Attribute       0x0024   253   253   000 
>      0
> ( 99)Unknown Attribute       0x0004   253   253   000 
>      0
> (100)Unknown Attribute       0x0004   253   253   000 
>      0
> (101)Unknown Attribute       0x0004   253   253   000 
>      0
> SMART Error Log:
> SMART Error Logging Version: 1
> Error Log Data Structure Pointer: 05
> ATA Error Count: 8
> Non-Fatal Count: 0
> 
> Also, the SMART error log, 
> 
> Error Log Structure 5:
> DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
>   08   00   80   aa   4f   8a    e0   25     458636
>   08   d0   01   00   4f   c2    e0   b0     459147
>   08   d1   01   01   4f   c2    e0   b0     459147
>   08   d0   01   00   4f   c2    e0   b0     459148
>   08   d1   01   01   4f   c2    e0   b0     459148
>   00   04   01   0b   4f   c2    e0   51     279972
> 
> You can retrieve the sector# ...
> 
> Thanks
> Manish
> 
> --- Andre Hedrick <andre@linux-ide.org> wrote:
> > On Sat, 25 Jan 2003, Manish Lachwani wrote:
> > 
> > > The "Hardware ECC Recovered" indicates the number
> > of
> > > ECC errors corrected in the drive. Do one thing.
> > Try
> > > to swap the drive with the drive on another ATA
> > cable.
> > > So, swap /dev/hde with /dev/hda (or whatever)
> > > physically and check if the error follows the
> > drive or
> > > the ATA cable. 
> > > 
> > > If it follows the drive, you may have to replace
> > the
> > > drive. Additionally, from the SMART error log #5:
> > > 
> > > 00   04   01   0b   4f   c2    e0   51     279972
> > 
> > NO!
> >        command aborted
> >             amount to transfer == 1 sector
> >                  have to dig through notes to decode
> > ...
> >                       lcyl smart passcode
> >                            hcyl smart passcode
> >                                  primary device
> >                                      
> > ready_seek_error
> > 
> > 
> > It barfed the command ...
> > 
> > try -e first
> > 
> > Cheers,
> > 
> > Andre Hedrick
> > LAD Storage Consulting Group
> > 
> > 
> > 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26  8:48           ` Andre Hedrick
@ 2003-01-26  9:10             ` Manish Lachwani
  2003-01-26  9:14               ` Andre Hedrick
  0 siblings, 1 reply; 11+ messages in thread
From: Manish Lachwani @ 2003-01-26  9:10 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Bryan Andersen, linux-kernel

This is the help from smartctl:

smartctl version 2.1 - S.M.A.R.T. Control Program
useage: smartctl -[opts] [device]
Read Only Commands:
                a               Show All S.M.A.R.T. Information (ATA and SCSI)
                g               Show General S.M.A.R.T. Attributes (ATA Only)
                v               Show Vendor S.M.A.R.T. Attributes (ATA Only)
                l               Show S.M.A.R.T. Drive Error Log (ATA Only
                L               Show S.M.A.R.T. Drive SelfTest Log (ATA Only)
                i               Show S.M.A.R.T. Drive Info (ATA and SCSI)
                c               Check S.M.A.R.T. Status (ATA and SCSI)

Enable / Disable Commands:
                e               Enable S.M.A.R.T. data collection (ATA and SCSI)
                d               Disable S.M.A.R.T.data collection (ATA and SCSI)
                t               Enable S.M.A.R.T. Automatic Offline Test (ATA Only)
                T               Disable S.M.A.R.T. Automatic Offline Test (ATA Only)

Test Commands:
                O               Execute Off-line data collectioni(ATA Only)
                S               Execute Short Self Test (ATA Only)
                s               Execute Short Self Test (Captive Mode) (ATA Only)
                X               Execute Extended Self Test (ATA Only)
                x               Execute Extended Self Test (Captive Mode)(ATA Only)
                A               Execute Self Test Abort (ATA Only)

Off-line data collection has nothing to do with the
SMART data collection. You enable the offline test,
then run the test and collect the offline data. 
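
As a rough illustration of that order of operations, here is a minimal
Python sketch that only builds the command lines and runs nothing. The
flags (-e, -t, -O, -a) are taken from the smartctl 2.1 help text above;
the function name offline_sequence and the device path are hypothetical,
and other smartctl versions may use different options.

```python
from typing import List

# Flags as listed in the smartctl 2.1 help text quoted in this message.
ENABLE_SMART = "-e"            # Enable S.M.A.R.T. data collection
ENABLE_AUTO_OFFLINE = "-t"     # Enable S.M.A.R.T. Automatic Offline Test
RUN_OFFLINE_COLLECTION = "-O"  # Execute Off-line data collection
SHOW_ALL = "-a"                # Show All S.M.A.R.T. Information


def offline_sequence(device: str, smartctl: str = "smartctl") -> List[List[str]]:
    """Return the argv lists in the order described; nothing is executed."""
    steps = [ENABLE_SMART, ENABLE_AUTO_OFFLINE, RUN_OFFLINE_COLLECTION, SHOW_ALL]
    return [[smartctl, flag, device] for flag in steps]


if __name__ == "__main__":
    # Dry run: print the commands instead of touching a real drive.
    for argv in offline_sequence("/dev/hda"):
        print(" ".join(argv))
```

Printing the commands rather than executing them keeps the sketch safe
to run on a machine without the drive or the tool installed.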

I agree that we only have the lower 24 bits.
However, the SMART attributes displayed are collected
correctly from the drive. Look at the sequence below:

bash# ./smartctl -a /dev/hda
Device: ST380021A  Supports ATA Version 5
Drive supports S.M.A.R.T. and is disabled
Use option -e to enable
bash# ./smartctl -e /dev/hda
bash# ./smartctl -a /dev/hda
Device: ST380021A  Supports ATA Version 5
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.

General Smart Values:
Off-line data collection status: (0x82) Offline data collection activity
                                        completed without error

Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run

Total time to complete off-line
data collection:                 ( 422) Seconds

Offline data collection
Capabilities:                    (0x1b)SMART EXECUTE OFF-LINE IMMEDIATE
                                        Automatic timer ON/OFF support
                                        Suspend Offline Collection upon new
                                        command
                                        Offline surface scan supported
                                        Self-test supported

Smart Capablilities:           (0x0003) Saves SMART data before entering
                                        power-saving mode
                                        Supports SMART auto save timer

Error logging capability:        (0x01) Error logging supported

Short self-test routine
recommended polling time:        (   1) Minutes

Extended self-test routine
recommended polling time:        (  57) Minutes

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                    Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000f   075   070   034       92897937
(  3)Spin Up Time            0x0003   070   070   000       0
(  4)Start Stop Count        0x0032   100   100   020       3
(  5)Reallocated Sector Ct   0x0033   100   100   036       0
(  7)Seek Error Rate         0x000f   079   060   030       93809829
(  9)Power On Hours          0x0032   096   096   000       4158
( 10)Spin Retry Count        0x0013   100   100   097       0
( 12)Power Cycle Count       0x0032   100   100   020       261
(194)Temperature             0x0022   028   043   000       28
(195)Hardware ECC Recovered  0x001a   075   070   000       92897937
(197)Current Pending Sector  0x0012   100   100   000       0
(198)Offline Uncorrectable   0x0010   100   100   000       0
(199)UDMA CRC Error Count    0x003e   200   200   000       0
(200)Unknown Attribute       0x0000   100   253   000       0
(202)Unknown Attribute       0x0032   100   253   000       0
SMART Error Log:
SMART Error Logging Version: 1
No Errors Logged
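
For anyone who wants to check such a dump mechanically, the following is
a minimal Python sketch of parsing one attribute row from the table
above. The names SmartAttr, parse_row and is_failing are hypothetical,
the column layout is assumed from this particular output, and the
"value at or below threshold" test is the usual SMART convention rather
than anything smartctl 2.1 is shown doing here.

```python
import re
from typing import NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    flag: int
    value: int
    worst: int
    threshold: int
    raw: int


# (id)Name  0xFLAG  value  worst  threshold  raw
ROW = re.compile(
    r"^\(\s*(\d+)\)(.+?)\s+0x([0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$"
)


def parse_row(line: str) -> Optional[SmartAttr]:
    """Parse one attribute row; mail-wrapped rows may contain a newline."""
    m = ROW.match(" ".join(line.split()))  # collapse wrapping and padding
    if m is None:
        return None
    attr_id, name, flag, value, worst, threshold, raw = m.groups()
    return SmartAttr(int(attr_id), name, int(flag, 16),
                     int(value), int(worst), int(threshold), int(raw))


def is_failing(attr: SmartAttr) -> bool:
    """Assumed convention: normalized value at or below threshold."""
    return attr.value <= attr.threshold


if __name__ == "__main__":
    # A row exactly as it was wrapped by the mail client.
    row = "(  1)Raw Read Error Rate     0x000f   075   070   034 \n     92897937"
    attr = parse_row(row)
    print(attr, is_failing(attr))
```

Collapsing whitespace before matching lets the same function accept both
the wrapped two-line form and a row on a single line.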



--- Andre Hedrick <andre@linux-ide.org> wrote:
> 
> Smart can be enabled by the BIOS, but the BIOS does
> not issue diagnostic
> tests operations.
> 
> > General Smart Values:
> > Off-line data collection status: (0x00) Offline
> data collection activity was
> >                                          never
> started
> 
> was never started --
> 
> > Self-test execution status:      (   0) The
> previous self-test routine completed
> >                                          without
> error or no self-test has ever
> >                                          been run
> 
> Was never executed, "after" the vendor cleared the
> state before shipping.
> 
> They can clear the RO log space that can not be
> gotten to w/o VUO and
> passcodes.
> 
> So show me the sector form the logs.
> You can't!
> 
> WIN_READDMA_EXT == 0x25
> 
> >   08   00   80   aa   4f   8a    e0   25    
> 458636
> 
> You only have the lower 24-bits
> 
> 0x??????8a4faa
> 
> This requires another tool, as the original "smart
> from sff-8035" is
> obsolete.
> 
> 
> Cheers,
> 
> Andre Hedrick
> LAD Storage Consulting Group
> 
> On Sun, 26 Jan 2003, Manish Lachwani wrote:
> 
> > I dont think so. Without SMART data collection
> being
> > enabled, it wont give out the any SMART data at
> all.
> > How, did the SMART data show:
> > 
> > Vendor Specific SMART Attributes with Thresholds:
> > Revision Number: 16
> > Attribute                    Flag     Value Worst
> > Threshold Raw Value
> > (  3)Spin Up Time            0x0027   252   252  
> 063 
> >      0
> > (  4)Start Stop Count        0x0032   253   253  
> 000 
> >      0
> > (  5)Reallocated Sector Ct   0x0033   253   253  
> 063 
> >      0
> > (  6)Read Channel Margin     0x0001   253   253  
> 100 
> >      0
> > (  7)Seek Error Rate         0x000a   253   252  
> 000 
> >      0
> > (  8)Seek Time Preformance   0x0027   244   244  
> 187 
> >      36736
> > (  9)Power On Hours          0x0032   253   253  
> 000 
> >      4341
> > ( 10)Spin Retry Count        0x002b   252   252  
> 223 
> >      0
> > ( 11)Calibration Retry Count 0x002b   252   252  
> 223 
> >      0
> > ( 12)Power Cycle Count       0x0032   253   253  
> 000 
> >      43
> > (192)Power-Off Retract Count 0x0032   253   253  
> 000 
> >      0
> > (193)Load Cycle Count        0x0032   253   253  
> 000 
> >      0
> > (194)Temperature             0x0032   253   253  
> 000 
> >      0
> > (195)Hardware ECC Recovered  0x000a   253   252  
> 000 
> >      221
> > (196)Reallocated Event Count 0x0008   253   253  
> 000 
> >      0
> > (197)Current Pending Sector  0x0008   253   253  
> 000 
> >      0
> > (198)Offline Uncorrectable   0x0008   253   253  
> 000 
> >      0
> > (199)UDMA CRC Error Count    0x0008   199   199  
> 000 
> >      0
> > (200)Unknown Attribute       0x000a   253   252  
> 000 
> >      0
> > (201)Unknown Attribute       0x000a   253   252  
> 000 
> >      0
> > (202)Unknown Attribute       0x000a   253   252  
> 000 
> >      0
> > (203)Unknown Attribute       0x000b   253   252  
> 180 
> >      0
> > (204)Unknown Attribute       0x000a   253   252  
> 000 
> >      0
> > (205)Unknown Attribute       0x000a   253   252  
> 000 
> >      0
> > (207)Unknown Attribute       0x002a   252   252  
> 000 
> >      0
> > (208)Unknown Attribute       0x002a   252   252  
> 000 
> >      0
> > (209)Unknown Attribute       0x0024   253   253  
> 000 
> >      0
> > ( 99)Unknown Attribute       0x0004   253   253  
> 000 
> >      0
> > (100)Unknown Attribute       0x0004   253   253  
> 000 
> >      0
> > (101)Unknown Attribute       0x0004   253   253  
> 000 
> >      0
> > SMART Error Log:
> > SMART Error Logging Version: 1
> > Error Log Data Structure Pointer: 05
> > ATA Error Count: 8
> > Non-Fatal Count: 0
> > 
> > Also, the SMART error log, 
> > 
> > Error Log Structure 5:
> > DCR   FR   SC   SN   CL   SH   D/H   CR  
> Timestamp
> >   08   00   80   aa   4f   8a    e0   25    
> 458636
> >   08   d0   01   00   4f   c2    e0   b0    
> 459147
> >   08   d1   01   01   4f   c2    e0   b0    
> 459147
> >   08   d0   01   00   4f   c2    e0   b0    
> 459148
> >   08   d1   01   01   4f   c2    e0   b0    
> 459148
> >   00   04   01   0b   4f   c2    e0   51    
> 279972
> > 
> > You can retrieve the sector# ...
> > 
> > Thanks
> > Manish
> > 
> > --- Andre Hedrick <andre@linux-ide.org> wrote:
> > > On Sat, 25 Jan 2003, Manish Lachwani wrote:
> > > 
> > > > The "Hardware ECC Recovered" indicates the
> number
> > > of
> > > > ECC errors corrected in the drive. Do one
> thing.
> > > Try
> > > > to swap the drive with the drive on another
> ATA
> > > cable.
> > > > So, swap /dev/hde with /dev/hda (or whatever)
> > > > physically and check if the error follows the
> > > drive or
> > > > the ATA cable. 
> > > > 
> > > > If it follows the drive, you may have to
> replace
> > > the
> > > > drive. Additionally, from the SMART error log
> #5:
> 
=== message truncated ===




* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26  9:10             ` Manish Lachwani
@ 2003-01-26  9:14               ` Andre Hedrick
  0 siblings, 0 replies; 11+ messages in thread
From: Andre Hedrick @ 2003-01-26  9:14 UTC (permalink / raw)
  To: Manish Lachwani; +Cc: Bryan Andersen, linux-kernel


Yeah, I know; a good friend of mine is the author.
Since he is not available to comment, you can believe as you wish.

Read the code and read the ioctl transport: you cannot get there from
here, period.

So the 28-bit SMART was never executed; however, the 48-bit was.
You can't get the meaningful data.

I am out of this argument; go read the spec.

Cheers,


On Sun, 26 Jan 2003, Manish Lachwani wrote:

> his is the help from smartctl:
> 
> smartctl version 2.1 - S.M.A.R.T. Control Program
> useage: smartctl -[opts] [device]
> Read Only Commands:
>                 a               Show All S.M.A.R.T.
> Information (ATA and SCSI)
>                 g               Show General
> S.M.A.R.T. Attributes (ATA Only)
>                 v               Show Vendor S.M.A.R.T.
> Attributes (ATA Only)
>                 l               Show S.M.A.R.T. Drive
> Error Log (ATA Only
>                 L               Show S.M.A.R.T. Drive
> SelfTest Log (ATA Only)
>                 i               Show S.M.A.R.T. Drive
> Info (ATA and SCSI)
>                 c               Check S.M.A.R.T.
> Status (ATA and SCSI)
> 
> Enable / Disable Commands:
>                 e               Enable S.M.A.R.T. data
> collection (ATA and SCSI)
>                 d               Disable S.M.A.R.T.data
> collection (ATA and SCSI)
>                 t               Enable S.M.A.R.T.
> Automatic Offline Test (ATA Only)
>                 T               Disable S.M.A.R.T.
> Automatic Offline Test (ATA Only)
> 
> Test Commands:
>                 O               Execute Off-line data
> collectioni(ATA Only)
>                 S               Execute Short Self
> Test (ATA Only)
>                 s               Execute Short Self
> Test (Captive Mode) (ATA Only)
>                 X               Execute Extended Self
> Test (ATA Only)
>                 x               Execute Extended Self
> Test (Captive Mode)(ATA Only)
>                 A               Execute Self Test
> Abort (ATA Only)
> 
> Off-line data collection has nothing to do with the
> SMART data collection. You enable the offline test,
> then run the test and collect the offline data. 
> 
> I agree with the fact that we have the lower 24 bits.
> However, SMART attributes displayed is appropriately
> collected from the drive. Look at the sequence below:
> 
> bash# ./smartctl -a /dev/hda
> Device: ST380021A  Supports ATA Version 5
> Drive supports S.M.A.R.T. and is disabled
> Use option -e to enable
> bash# ./smartctl -e /dev/hda
> bash# ./smartctl -a /dev/hda
> Device: ST380021A  Supports ATA Version 5
> Drive supports S.M.A.R.T. and is enabled
> Check S.M.A.R.T. Passed.
> 
> General Smart Values:
> Off-line data collection status: (0x82) Offline data
> collection activity
>                                         completed
> without error
> 
> Self-test execution status:      (   0) The previous
> self-test routine completed
>                                         without error
> or no self-test has ever
>                                         been run
> 
> Total time to complete off-line
> data collection:                 ( 422) Seconds
> 
> Offline data collection
> Capabilities:                    (0x1b)SMART EXECUTE
> OFF-LINE IMMEDIATE
>                                         Automatic
> timer ON/OFF support
>                                         Suspend
> Offline Collection upon new
>                                         command
>                                         Offline
> surface scan supported
>                                         Self-test
> supported
> 
> Smart Capablilities:           (0x0003) Saves SMART
> data before entering
>                                         power-saving
> mode
>                                         Supports SMART
> auto save timer
> 
> Error logging capability:        (0x01) Error logging
> supported
> 
> Short self-test routine
> recommended polling time:        (   1) Minutes
> 
> Extended self-test routine
> recommended polling time:        (  57) Minutes
> 
> Vendor Specific SMART Attributes with Thresholds:
> Revision Number: 10
> Attribute                    Flag     Value Worst
> Threshold Raw Value
> (  1)Raw Read Error Rate     0x000f   075   070   034 
>      92897937
> (  3)Spin Up Time            0x0003   070   070   000 
>      0
> (  4)Start Stop Count        0x0032   100   100   020 
>      3
> (  5)Reallocated Sector Ct   0x0033   100   100   036 
>      0
> (  7)Seek Error Rate         0x000f   079   060   030 
>      93809829
> (  9)Power On Hours          0x0032   096   096   000 
>      4158
> ( 10)Spin Retry Count        0x0013   100   100   097 
>      0
> ( 12)Power Cycle Count       0x0032   100   100   020 
>      261
> (194)Temperature             0x0022   028   043   000 
>      28
> (195)Hardware ECC Recovered  0x001a   075   070   000 
>      92897937
> (197)Current Pending Sector  0x0012   100   100   000 
>      0
> (198)Offline Uncorrectable   0x0010   100   100   000 
>      0
> (199)UDMA CRC Error Count    0x003e   200   200   000 
>      0
> (200)Unknown Attribute       0x0000   100   253   000 
>      0
> (202)Unknown Attribute       0x0032   100   253   000 
>      0
> SMART Error Log:
> SMART Error Logging Version: 1
> No Errors Logged
> 
> 
> 
> --- Andre Hedrick <andre@linux-ide.org> wrote:
> > 
> > Smart can be enabled by the BIOS, but the BIOS does
> > not issue diagnostic
> > tests operations.
> > 
> > > General Smart Values:
> > > Off-line data collection status: (0x00) Offline
> > data collection activity was
> > >                                          never
> > started
> > 
> > was never started --
> > 
> > > Self-test execution status:      (   0) The
> > previous self-test routine completed
> > >                                          without
> > error or no self-test has ever
> > >                                          been run
> > 
> > Was never executed, "after" the vendor cleared the
> > state before shipping.
> > 
> > They can clear the RO log space that can not be
> > gotten to w/o VUO and
> > passcodes.
> > 
> > So show me the sector form the logs.
> > You can't!
> > 
> > WIN_READDMA_EXT == 0x25
> > 
> > >   08   00   80   aa   4f   8a    e0   25    
> > 458636
> > 
> > You only have the lower 24-bits
> > 
> > 0x??????8a4faa
> > 
> > This requires another tool, as the original "smart
> > from sff-8035" is
> > obsolete.
> > 
> > 
> > Cheers,
> > 
> > Andre Hedrick
> > LAD Storage Consulting Group
> > 
> > On Sun, 26 Jan 2003, Manish Lachwani wrote:
> > 
> > > I dont think so. Without SMART data collection
> > being
> > > enabled, it wont give out the any SMART data at
> > all.
> > > How, did the SMART data show:
> > > 
> > > Vendor Specific SMART Attributes with Thresholds:
> > > Revision Number: 16
> > > Attribute                    Flag     Value Worst
> > > Threshold Raw Value
> > > (  3)Spin Up Time            0x0027   252   252  
> > 063 
> > >      0
> > > (  4)Start Stop Count        0x0032   253   253  
> > 000 
> > >      0
> > > (  5)Reallocated Sector Ct   0x0033   253   253  
> > 063 
> > >      0
> > > (  6)Read Channel Margin     0x0001   253   253  
> > 100 
> > >      0
> > > (  7)Seek Error Rate         0x000a   253   252  
> > 000 
> > >      0
> > > (  8)Seek Time Preformance   0x0027   244   244  
> > 187 
> > >      36736
> > > (  9)Power On Hours          0x0032   253   253  
> > 000 
> > >      4341
> > > ( 10)Spin Retry Count        0x002b   252   252  
> > 223 
> > >      0
> > > ( 11)Calibration Retry Count 0x002b   252   252  
> > 223 
> > >      0
> > > ( 12)Power Cycle Count       0x0032   253   253  
> > 000 
> > >      43
> > > (192)Power-Off Retract Count 0x0032   253   253  
> > 000 
> > >      0
> > > (193)Load Cycle Count        0x0032   253   253  
> > 000 
> > >      0
> > > (194)Temperature             0x0032   253   253  
> > 000 
> > >      0
> > > (195)Hardware ECC Recovered  0x000a   253   252  
> > 000 
> > >      221
> > > (196)Reallocated Event Count 0x0008   253   253  
> > 000 
> > >      0
> > > (197)Current Pending Sector  0x0008   253   253  
> > 000 
> > >      0
> > > (198)Offline Uncorrectable   0x0008   253   253  
> > 000 
> > >      0
> > > (199)UDMA CRC Error Count    0x0008   199   199  
> > 000 
> > >      0
> > > (200)Unknown Attribute       0x000a   253   252  
> > 000 
> > >      0
> > > (201)Unknown Attribute       0x000a   253   252  
> > 000 
> > >      0
> > > (202)Unknown Attribute       0x000a   253   252  
> > 000 
> > >      0
> > > (203)Unknown Attribute       0x000b   253   252  
> > 180 
> > >      0
> > > (204)Unknown Attribute       0x000a   253   252  
> > 000 
> > >      0
> > > (205)Unknown Attribute       0x000a   253   252  
> > 000 
> > >      0
> > > (207)Unknown Attribute       0x002a   252   252  
> > 000 
> > >      0
> > > (208)Unknown Attribute       0x002a   252   252  
> > 000 
> > >      0
> > > (209)Unknown Attribute       0x0024   253   253  
> > 000 
> > >      0
> > > ( 99)Unknown Attribute       0x0004   253   253  
> > 000 
> > >      0
> > > (100)Unknown Attribute       0x0004   253   253  
> > 000 
> > >      0
> > > (101)Unknown Attribute       0x0004   253   253  
> > 000 
> > >      0
> > > SMART Error Log:
> > > SMART Error Logging Version: 1
> > > Error Log Data Structure Pointer: 05
> > > ATA Error Count: 8
> > > Non-Fatal Count: 0
> > > 
> > > Also, the SMART error log, 
> > > 
> > > Error Log Structure 5:
> > > DCR   FR   SC   SN   CL   SH   D/H   CR  
> > Timestamp
> > >   08   00   80   aa   4f   8a    e0   25    
> > 458636
> > >   08   d0   01   00   4f   c2    e0   b0    
> > 459147
> > >   08   d1   01   01   4f   c2    e0   b0    
> > 459147
> > >   08   d0   01   00   4f   c2    e0   b0    
> > 459148
> > >   08   d1   01   01   4f   c2    e0   b0    
> > 459148
> > >   00   04   01   0b   4f   c2    e0   51    
> > 279972
> > > 
> > > You can retrieve the sector# ...
> > > 
> > > Thanks
> > > Manish
> > > 
> > > --- Andre Hedrick <andre@linux-ide.org> wrote:
> > > > On Sat, 25 Jan 2003, Manish Lachwani wrote:
> > > > 
> > > > > The "Hardware ECC Recovered" indicates the
> > number
> > > > of
> > > > > ECC errors corrected in the drive. Do one
> > thing.
> > > > Try
> > > > > to swap the drive with the drive on another
> > ATA
> > > > cable.
> > > > > So, swap /dev/hde with /dev/hda (or whatever)
> > > > > physically and check if the error follows the
> > > > drive or
> > > > > the ATA cable. 
> > > > > 
> > > > > If it follows the drive, you may have to
> > replace
> > > > the
> > > > > drive. Additionally, from the SMART error log
> > #5:
> > 
> === message truncated ===
> 
> 
> 

Andre Hedrick
LAD Storage Consulting Group



* RE: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
       [not found] <233C89823A37714D95B1A891DE3BCE5202AB1D12@xch-b.win.zambeel.com>
@ 2003-01-26  9:42 ` Andre Hedrick
  2003-01-26 11:24   ` Bryan Andersen
  0 siblings, 1 reply; 11+ messages in thread
From: Andre Hedrick @ 2003-01-26  9:42 UTC (permalink / raw)
  To: Manish Lachwani; +Cc: Manish Lachwani, Bryan Andersen, linux-kernel


Manish,

So you come back now as "Zambeel"; is this to impress me?
Or has Zambeel modified "MY" driver and is shipping the driver without
the source code?  You are using "smartsuite-2.1", which is not designed
to submit 48-bit commands.

Well, do you know what a "General Purpose Log" is?
This is where the 48-bit logging features are stored.
See, during one of the 6 meetings held each year, the committee debated
the issue of how to deal with 48-bit logs.  Given there was a new class of
logs created for AV Streaming, we elected to use those and preserve as
much of the expected behavior in the legacy logs as possible.

So if the error had happened with a 28-bit DMA Read, the full log would be
present in the 28-bit logs; but the error happened with a 48-bit DMA Read,
and the result is stuffed into a 48-bit general purpose log.

You are using a 48-bit drive and issuing 28-bit Smart commands; therefore,
you can not extract the useful data being sought.  
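
To make the limitation concrete, here is a minimal Python sketch that
decodes the first error log row quoted below. It assumes the columns map
as SN = LBA bits 0-7, CL = bits 8-15 and SH = bits 16-23, which matches
the 0x??????8a4faa value given earlier in this thread; the function
names are hypothetical. The upper 24 bits of a 48-bit LBA simply are not
present in this row.

```python
from typing import Dict

# Column order as printed in the "Error Log Structure 5" table.
COLUMNS = ["DCR", "FR", "SC", "SN", "CL", "SH", "D/H", "CR"]


def parse_registers(row: str) -> Dict[str, int]:
    """Parse the eight hex register columns; the trailing timestamp is ignored."""
    fields = row.split()
    return {name: int(value, 16) for name, value in zip(COLUMNS, fields)}


def lba_low24(regs: Dict[str, int]) -> int:
    """Assumed mapping: SN, CL and SH hold LBA bits 0-7, 8-15 and 16-23."""
    return (regs["SH"] << 16) | (regs["CL"] << 8) | regs["SN"]


if __name__ == "__main__":
    regs = parse_registers("08   00   80   aa   4f   8a    e0   25     458636")
    print(hex(regs["CR"]))             # 0x25, i.e. WIN_READDMA_EXT
    print(f"0x{lba_low24(regs):06x}")  # 0x8a4faa, lower 24 bits only
```

Since the command register is 0x25, the failing request used a 48-bit
address, so the value printed here is only the low half of the sector
number.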

Remind me not to deploy Zambeel products to any of my customers.

Yeah "smartctl" does not work, but reading the correct 48-bit logs does.

All I stated was you are using the wrong tool to extract the needed
information.

Regards,

Andre Hedrick
LAD Storage Consulting Group


On Sun, 26 Jan 2003, Manish Lachwani wrote:

> Yes, and I agree that SMART can be enabled by the BIOS. However, if SMART is
> not enabled, then "smartctl -a /dev/hdX" wont return any values for the
> SMART attributes ...
> 
> Thanks
> Manish
> 
> -----Original Message-----
> From: Andre Hedrick [mailto:andre@linux-ide.org]
> Sent: Sunday, January 26, 2003 12:49 AM
> To: Manish Lachwani
> Cc: Bryan Andersen; linux-kernel@vger.kernel.org
> Subject: Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
> 
> 
> 
> Smart can be enabled by the BIOS, but the BIOS does not issue diagnostic
> tests operations.
> 
> > General Smart Values:
> > Off-line data collection status: (0x00) Offline data collection activity
> was
> >                                          never started
> 
> was never started --
> 
> > Self-test execution status:      (   0) The previous self-test routine
> completed
> >                                          without error or no self-test has
> ever
> >                                          been run
> 
> Was never executed, "after" the vendor cleared the state before shipping.
> 
> They can clear the RO log space that can not be gotten to w/o VUO and
> passcodes.
> 
> So show me the sector form the logs.
> You can't!
> 
> WIN_READDMA_EXT == 0x25
> 
> >   08   00   80   aa   4f   8a    e0   25     458636
> 
> You only have the lower 24-bits
> 
> 0x??????8a4faa
> 
> This requires another tool, as the original "smart from sff-8035" is
> obsolete.
> 
> 
> Cheers,
> 
> Andre Hedrick
> LAD Storage Consulting Group
> 
> On Sun, 26 Jan 2003, Manish Lachwani wrote:
> 
> > I dont think so. Without SMART data collection being
> > enabled, it wont give out the any SMART data at all.
> > How, did the SMART data show:
> > 
> > Vendor Specific SMART Attributes with Thresholds:
> > Revision Number: 16
> > Attribute                    Flag     Value Worst
> > Threshold Raw Value
> > (  3)Spin Up Time            0x0027   252   252   063 
> >      0
> > (  4)Start Stop Count        0x0032   253   253   000 
> >      0
> > (  5)Reallocated Sector Ct   0x0033   253   253   063 
> >      0
> > (  6)Read Channel Margin     0x0001   253   253   100 
> >      0
> > (  7)Seek Error Rate         0x000a   253   252   000 
> >      0
> > (  8)Seek Time Preformance   0x0027   244   244   187 
> >      36736
> > (  9)Power On Hours          0x0032   253   253   000 
> >      4341
> > ( 10)Spin Retry Count        0x002b   252   252   223 
> >      0
> > ( 11)Calibration Retry Count 0x002b   252   252   223 
> >      0
> > ( 12)Power Cycle Count       0x0032   253   253   000 
> >      43
> > (192)Power-Off Retract Count 0x0032   253   253   000 
> >      0
> > (193)Load Cycle Count        0x0032   253   253   000 
> >      0
> > (194)Temperature             0x0032   253   253   000 
> >      0
> > (195)Hardware ECC Recovered  0x000a   253   252   000 
> >      221
> > (196)Reallocated Event Count 0x0008   253   253   000 
> >      0
> > (197)Current Pending Sector  0x0008   253   253   000 
> >      0
> > (198)Offline Uncorrectable   0x0008   253   253   000 
> >      0
> > (199)UDMA CRC Error Count    0x0008   199   199   000 
> >      0
> > (200)Unknown Attribute       0x000a   253   252   000 
> >      0
> > (201)Unknown Attribute       0x000a   253   252   000 
> >      0
> > (202)Unknown Attribute       0x000a   253   252   000 
> >      0
> > (203)Unknown Attribute       0x000b   253   252   180 
> >      0
> > (204)Unknown Attribute       0x000a   253   252   000 
> >      0
> > (205)Unknown Attribute       0x000a   253   252   000 
> >      0
> > (207)Unknown Attribute       0x002a   252   252   000 
> >      0
> > (208)Unknown Attribute       0x002a   252   252   000 
> >      0
> > (209)Unknown Attribute       0x0024   253   253   000 
> >      0
> > ( 99)Unknown Attribute       0x0004   253   253   000 
> >      0
> > (100)Unknown Attribute       0x0004   253   253   000 
> >      0
> > (101)Unknown Attribute       0x0004   253   253   000 
> >      0
> > SMART Error Log:
> > SMART Error Logging Version: 1
> > Error Log Data Structure Pointer: 05
> > ATA Error Count: 8
> > Non-Fatal Count: 0
> > 
> > Also, the SMART error log, 
> > 
> > Error Log Structure 5:
> > DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
> >   08   00   80   aa   4f   8a    e0   25     458636
> >   08   d0   01   00   4f   c2    e0   b0     459147
> >   08   d1   01   01   4f   c2    e0   b0     459147
> >   08   d0   01   00   4f   c2    e0   b0     459148
> >   08   d1   01   01   4f   c2    e0   b0     459148
> >   00   04   01   0b   4f   c2    e0   51     279972
> > 
> > You can retrieve the sector# ...
> > 
> > Thanks
> > Manish
> > 
> > --- Andre Hedrick <andre@linux-ide.org> wrote:
> > > On Sat, 25 Jan 2003, Manish Lachwani wrote:
> > > 
> > > > The "Hardware ECC Recovered" indicates the number
> > > of
> > > > ECC errors corrected in the drive. Do one thing.
> > > Try
> > > > to swap the drive with the drive on another ATA
> > > cable.
> > > > So, swap /dev/hde with /dev/hda (or whatever)
> > > > physically and check if the error follows the
> > > drive or
> > > > the ATA cable. 
> > > > 
> > > > If it follows the drive, you may have to replace
> > > the
> > > > drive. Additionally, from the SMART error log #5:
> > > > 
> > > > 00   04   01   0b   4f   c2    e0   51     279972
> > > 
> > > NO!
> > >        command aborted
> > >             amount to transfer == 1 sector
> > >                  have to dig through notes to decode
> > > ...
> > >                       lcyl smart passcode
> > >                            hcyl smart passcode
> > >                                  primary device
> > >                                      
> > > ready_seek_error
> > > 
> > > 
> > > It barfed the command ...
> > > 
> > > try -e first
> > > 
> > > Cheers,
> > > 
> > > Andre Hedrick
> > > LAD Storage Consulting Group
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 
> 




* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26  9:42 ` Andre Hedrick
@ 2003-01-26 11:24   ` Bryan Andersen
  2003-01-26 21:27     ` Andre Hedrick
  0 siblings, 1 reply; 11+ messages in thread
From: Bryan Andersen @ 2003-01-26 11:24 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Manish Lachwani, Manish Lachwani, linux-kernel



Andre Hedrick wrote:
> Yeah "smartctl" does not work, but reading the correct 48-bit logs does.
> 
> All I stated was you are using the wrong tool to extract the needed
> information.

So which tool(s) do I need for getting at the 48-bit log data?  If I 
need a spec doc to decode a binary block of data I can deal with that. 
Prefer not to, but...

I did a full surface read scan of the disk and it turned up no errors.

As for how I got the SMART data dump: I enabled SMART on the drive, then 
dumped the data a few minutes later.

- Bryan



* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
  2003-01-26 11:24   ` Bryan Andersen
@ 2003-01-26 21:27     ` Andre Hedrick
  0 siblings, 0 replies; 11+ messages in thread
From: Andre Hedrick @ 2003-01-26 21:27 UTC (permalink / raw)
  To: Bryan Andersen; +Cc: linux-kernel


Bryan,

The one nobody has written in closed form to date.
Basically, it is a DNE (does not exist).

The infrastructure is only now being laid into place, so it will take
some time to code out the rest of the internals to allow standard UI
access.

I have my own suite of "private" tools, and they will not be released.
There are possible patent issues, and licensing problems.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Sun, 26 Jan 2003, Bryan Andersen wrote:

> 
> 
> Andre Hedrick wrote:
> > Yeah "smartctl" does not work, but reading the correct 48-bit logs does.
> > 
> > All I stated was you are using the wrong tool to extract the needed
> > information.
> 
> So which tool(s) do I need for getting at the 48-bit log data?  If I 
> need a spec doc to decode a binary block of data I can deal with that. 
> Prefer not to, but...
> 
> I did a full surface read scan of the disk and it turned up no errors.
> 
> As for how I got the SMART data dump I enabled SMART on the drive then 
> dumped the data a few minutes later.
> 
> - Bryan
> 




end of thread, other threads:[~2003-01-26 21:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <233C89823A37714D95B1A891DE3BCE5202AB1D07@xch-b.win.zambeel.com>
2003-01-25 20:34 ` FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 Manish Lachwani
2003-01-26  6:51   ` Bryan Andersen
2003-01-26  7:15     ` Manish Lachwani
2003-01-26  8:13       ` Andre Hedrick
2003-01-26  8:27         ` Manish Lachwani
2003-01-26  8:48           ` Andre Hedrick
2003-01-26  9:10             ` Manish Lachwani
2003-01-26  9:14               ` Andre Hedrick
     [not found] <233C89823A37714D95B1A891DE3BCE5202AB1D12@xch-b.win.zambeel.com>
2003-01-26  9:42 ` Andre Hedrick
2003-01-26 11:24   ` Bryan Andersen
2003-01-26 21:27     ` Andre Hedrick
