* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
From: Manish Lachwani @ 2003-01-25 20:34 UTC
To: linux-kernel

0xd0 indicates that the driver aborted the command. Can you try to get the SMART data from the drive using smartctl?

Use "smartctl -e /dev/hdX" to enable SMART collection.
Use "smartctl -a /dev/hdX" to collect the SMART data.
...

Thanks
Manish

> I'm sending this out now to see if others are noticing the same problem.
>
> Under heavy disk IO I'm losing DMA on a disk that is being handled by the new PDC202XX driver. The HD controller is a PDC20269-based controller like those in the Maxtor HD/controller bundles. So far it has lost DMA 4 times under heavy load, when multiple disks are being accessed at once. It appears to lose DMA at a random point during heavy disk IO; the time before it happens has ranged from hours to less than 30 minutes. I'm running a test overnight to see if it happens when it is the only disk being accessed heavily. My first guess is that it is dropping a DMA-finish interrupt and then failing when it tries to set one up again, but I'm not sure of that. The other idea I had is that when the code tries to get a DMA channel, all are in use and it fails. Any help on what to look into would be appreciated.
>
> Jan 24 22:41:14 blip kernel: hde: dma_intr: status=0xd0 { Busy }
> Jan 24 22:41:14 blip kernel:
> Jan 24 22:41:14 blip kernel: hde: DMA disabled
> Jan 24 22:41:14 blip kernel: PDC202XX: Primary channel reset.
> Jan 24 22:41:14 blip kernel: ide2: reset: success
>
> The motherboard is an ASUS A7N8X, and it has assigned ide2 and ide3 (both controllers on the Promise card), the nVidia sound, and the display to the same interrupt.
> I was trying to see if that was the problem, but disabling other devices didn't seem to help. It still happened.
>
> The PDC20269 controller has one Maxtor 4G160J8 (160GB) disk per channel, and each is jumpered to be master. (hdparm -I outputs below.) I've also added the current dmesg output and a cat of /proc/interrupts.
>
> ---------------------
> # dmesg
> Linux version 2.4.21-pre3-ac4 (root@blip) (gcc version 2.95.4 20011002 (Debian prerelease)) #21 SMP Sun Jan 19 13:54:23 CST 2003
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
> BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
> BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
> BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
> BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
> 511MB LOWMEM available.
> On node 0 totalpages: 131056
> zone(0): 4096 pages.
> zone(1): 126960 pages.
> zone(2): 0 pages.
> Kernel command line: auto BOOT_IMAGE=Linux ro root=302 ide0=ata66 ide1=ata66
> ide_setup: ide0=ata66
> ide_setup: ide1=ata66
> Found and enabled local APIC!
> Initializing CPU#0
> Detected 1737.306 MHz processor.
> Console: colour VGA+ 80x25
> Calibrating delay loop... 3460.30 BogoMIPS
> Memory: 515228k/524224k available (1600k kernel code, 8604k reserved, 676k data, 112k init, 0k highmem)
> Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
> Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
> Mount cache hash table entries: 512 (order: 0, 4096 bytes)
> Buffer cache hash table entries: 32768 (order: 5, 131072 bytes)
> Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 256K (64 bytes/line)
> CPU: After generic, caps: 0383fbff c1c3fbff 00000000 00000000
> CPU: Common caps: 0383fbff c1c3fbff 00000000 00000000
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> POSIX conformance testing by UNIFIX
> mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
> mtrr: detected mtrr type: Intel
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 256K (64 bytes/line)
> CPU: After generic, caps: 0383fbff c1c3fbff 00000000 00000000
> CPU: Common caps: 0383fbff c1c3fbff 00000000 00000000
> CPU0: AMD Athlon(tm) XP 2100+ stepping 02
> per-CPU timeslice cutoff: 731.30 usecs.
> task migration cache decay timeout: 10 msecs.
> SMP motherboard not detected.
> enabled ExtINT on CPU#0
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ..... CPU clock speed is 1737.2981 MHz.
> ..... host bus clock speed is 267.2766 MHz.
> cpu: 0, clocks: 2672766, slice: 1336383
> CPU0<T0:2672752,T1:1336368,D:1,S:1336383,C:2672766>
> migration_task 0 on cpu=0
> PCI: PCI BIOS revision 2.10 entry at 0xfb560, last bus=3
> PCI: Using configuration type 1
> PCI: Probing PCI hardware
> PCI: Using IRQ router default [10de/01e0] at 00:00.0
> isapnp: Scanning for PnP cards...
> isapnp: No Plug & Play device found
> Linux NET4.0 for Linux 2.4
> Based upon Swansea University Computer Society NET3.039
> Initializing RT netlink socket
> Starting kswapd
> Journalled Block Device driver loaded
> Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE]
> parport0: irq 7 detected
> i2c-core.o: i2c core module
> i2c-dev.o: i2c /dev entries driver module
> i2c-core.o: driver i2c-dev dummy driver registered.
> i2c-proc.o version 2.6.1 (20010825)
> pty: 256 Unix98 ptys configured
> Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled
> ttyS00 at 0x03f8 (irq = 4) is a 16550A
> ttyS01 at 0x02f8 (irq = 3) is a 16550A
> Real Time Clock Driver v1.10e
> Floppy drive(s): fd0 is 1.44M
> FDC 0 is a post-1991 82077
> RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
> loop: loaded (max 8 devices)
> Linux agpgart interface v0.99 (c) Jeff Hartmann
> agpgart: Maximum main memory to use for agp memory: 439M
> agpgart: unsupported bridge
> agpgart: no supported devices found.
> Uniform Multi-Platform E-IDE driver Revision: 7.00beta-2.4
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> NFORCE2: IDE controller at PCI slot 00:09.0
> NFORCE2: chipset revision 162
> NFORCE2: not 100% native mode: will probe irqs later
> ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
> ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
> PDC20269: IDE controller at PCI slot 01:07.0
> PDC20269: chipset revision 2
> PDC20269: not 100% native mode: will probe irqs later
=== message truncated ===
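The status byte in Bryan's log line (status=0xd0) can be split into the individual ATA status-register bits. The sketch below assumes the bit names that the 2.4 IDE driver uses for its *_STAT masks, and it mirrors how that driver's status dump treats BSY: while the Busy bit is set the remaining bits are not considered valid, so only { Busy } is printed, even though 0xd0 also has the DriveReady and SeekComplete bits set. The helper names are hypothetical.

```python
# Bit masks and names as used by the 2.4 IDE driver's *_STAT macros and
# its "{ ... }" status dump.
STATUS_BITS = (
    (0x80, "Busy"),            # BUSY_STAT
    (0x40, "DriveReady"),      # READY_STAT
    (0x20, "DeviceFault"),     # WRERR_STAT
    (0x10, "SeekComplete"),    # SEEK_STAT
    (0x08, "DataRequest"),     # DRQ_STAT
    (0x04, "CorrectedError"),  # ECC_STAT
    (0x02, "Index"),           # INDEX_STAT
    (0x01, "Error"),           # ERR_STAT
)


def status_bits(stat):
    """Return the names of every bit set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if stat & mask]


def format_status(stat):
    """Format a status byte the way the kernel log line shows it.

    While BSY is set the other bits are not valid, so only "Busy" is reported.
    """
    if stat & 0x80:
        return "{ Busy }"
    return "{ " + " ".join(status_bits(stat)) + " }"


print(format_status(0xD0), status_bits(0xD0))
# { Busy } ['Busy', 'DriveReady', 'SeekComplete']
```

The same helper reads the 0x51 status that appears at the end of the SMART error-log rows later in the thread as { DriveReady SeekComplete Error }.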
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
From: Bryan Andersen @ 2003-01-26 6:51 UTC
To: Manish Lachwani; +Cc: linux-kernel

Manish Lachwani wrote:
> 0xd0 indicates that the driver aborted the command.
> Can you try to get the SMART data from the drive using
> smartctl?
>
> use "smartctl -e /dev/hdX" to enable SMART collection
>
> use "smartctl -a /dev/hdX" to collect the SMART data

Ok, so where do I find information on how to decode this?

# smartctl -a /dev/hde
Device: Maxtor 4G160J8 Supports ATA Version 6
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.

General Smart Values:
Off-line data collection status: (0x00) Offline data collection activity was never started

Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run

Total time to complete off-line data collection: ( 30) Seconds

Offline data collection Capabilities: (0x1b) SMART EXECUTE OFF-LINE IMMEDIATE
        Automatic timer ON/OFF support
        Suspend Offline Collection upon new command
        Offline surface scan supported
        Self-test supported

Smart Capablilities: (0x0003) Saves SMART data before entering power-saving mode
        Supports SMART auto save timer

Error logging capability: (0x01) Error logging supported

Short self-test routine recommended polling time: ( 2) Minutes

Extended self-test routine recommended polling time: ( 103) Minutes

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
Attribute                    Flag    Value Worst Threshold Raw Value
( 3)Spin Up Time             0x0027  252   252   063       0
( 4)Start Stop Count         0x0032  253   253   000       0
( 5)Reallocated Sector Ct    0x0033  253   253   063       0
( 6)Read Channel Margin      0x0001  253   253   100       0
( 7)Seek Error Rate          0x000a  253   252   000       0
( 8)Seek Time Preformance    0x0027  244   244   187       36736
( 9)Power On Hours           0x0032  253   253   000       4341
( 10)Spin Retry Count        0x002b  252   252   223       0
( 11)Calibration Retry Count 0x002b  252   252   223       0
( 12)Power Cycle Count       0x0032  253   253   000       43
(192)Power-Off Retract Count 0x0032  253   253   000       0
(193)Load Cycle Count        0x0032  253   253   000       0
(194)Temperature             0x0032  253   253   000       0
(195)Hardware ECC Recovered  0x000a  253   252   000       221
(196)Reallocated Event Count 0x0008  253   253   000       0
(197)Current Pending Sector  0x0008  253   253   000       0
(198)Offline Uncorrectable   0x0008  253   253   000       0
(199)UDMA CRC Error Count    0x0008  199   199   000       0
(200)Unknown Attribute       0x000a  253   252   000       0
(201)Unknown Attribute       0x000a  253   252   000       0
(202)Unknown Attribute       0x000a  253   252   000       0
(203)Unknown Attribute       0x000b  253   252   180       0
(204)Unknown Attribute       0x000a  253   252   000       0
(205)Unknown Attribute       0x000a  253   252   000       0
(207)Unknown Attribute       0x002a  252   252   000       0
(208)Unknown Attribute       0x002a  252   252   000       0
(209)Unknown Attribute       0x0024  253   253   000       0
( 99)Unknown Attribute       0x0004  253   253   000       0
(100)Unknown Attribute       0x0004  253   253   000       0
(101)Unknown Attribute       0x0004  253   253   000       0

SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 05
ATA Error Count: 8
Non-Fatal Count: 0

Error Log Structure 1:
DCR FR SC SN CL SH D/H CR Timestamp
00 db 00 00 4f c2 00 b0 201
00 d8 00 00 4f c2 00 b0 201
00 da 00 00 4f c2 00 b0 201
00 d9 00 00 4f c2 00 b0 201
00 fe 00 00 00 00 00 ef 201
00 04 50 42 97 23 00 51 5

Error Log Structure 2:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 80 2a 4e 8a e0 25 458636
08 00 80 aa 4e 8a e0 25 458636
08 00 80 2a 4f 8a e0 25 458636
08 00 80 aa 4f 8a e0 25 458636
08 d0 01 00 4f c2 e0 b0 459147
00 04 01 0b 4f c2 e0 51 279972

Error Log Structure 3:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 80 aa 4e 8a e0 25 458636
08 00 80 2a 4f 8a e0 25 458636
08 00 80 aa 4f 8a e0 25 458636
08 d0 01 00 4f c2 e0 b0 459147
08 d1 01 01 4f c2 e0 b0 459147
00 04 01 0b 4f c2 e0 51 279972

Error Log Structure 4:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 80 2a 4f 8a e0 25 458636
08 00 80 aa 4f 8a e0 25 458636
08 d0 01 00 4f c2 e0 b0 459147
08 d1 01 01 4f c2 e0 b0 459147
08 d0 01 00 4f c2 e0 b0 459148
00 04 01 0b 4f c2 e0 51 279972

Error Log Structure 5:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 80 aa 4f 8a e0 25 458636
08 d0 01 00 4f c2 e0 b0 459147
08 d1 01 01 4f c2 e0 b0 459147
08 d0 01 00 4f c2 e0 b0 459148
08 d1 01 01 4f c2 e0 b0 459148
00 04 01 0b 4f c2 e0 51 279972

- Bryan
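One common way to read the attribute table (the convention used by SMART tools such as smartmontools, assumed here rather than stated in this thread): an attribute trips when its normalized Value falls to or below a non-zero Threshold, Worst records the lowest Value seen so far, bit 0 of Flag marks a pre-failure attribute, and Raw Value is vendor-specific. A minimal sketch that applies that rule to rows in the smartctl 2.1 layout shown above; the regular expression and the check_attributes name are assumptions about that layout, not part of smartctl.

```python
import re

# One attribute row as printed by smartctl 2.1, e.g.
# "( 8)Seek Time Preformance 0x0027 244 244 187 36736"
# Groups: id, name, flag, value, worst, threshold, raw value.
ROW = re.compile(
    r"\(\s*(\d+)\)(.+?)\s+0x([0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
)


def check_attributes(text):
    """Return (id, name, kind, verdict) for each attribute row in text.

    The raw value (group 7) is vendor-specific and is not interpreted here.
    """
    results = []
    for m in ROW.finditer(text):
        attr_id = int(m.group(1))
        name = m.group(2).strip()
        flag = int(m.group(3), 16)
        value, worst, thresh = (int(m.group(i)) for i in (4, 5, 6))
        # Flag bit 0 set: pre-failure attribute; clear: advisory (wear/usage).
        kind = "pre-failure" if flag & 0x0001 else "advisory"
        if thresh == 0:
            verdict = "no threshold"        # a zero threshold never trips
        elif value <= thresh:
            verdict = "FAILING NOW"
        elif worst <= thresh:
            verdict = "failed in the past"
        else:
            verdict = "ok"
        results.append((attr_id, name, kind, verdict))
    return results
```

Run over Bryan's table, every row comes out as either "ok" or "no threshold": none of the normalized values or worst values is at or below its threshold.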
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
From: Manish Lachwani @ 2003-01-26 7:15 UTC
To: Bryan Andersen; +Cc: linux-kernel

The "Hardware ECC Recovered" attribute indicates the number of ECC errors corrected in the drive. Do one thing: try swapping the drive with the drive on another ATA cable. That is, physically swap /dev/hde with /dev/hda (or whatever) and check whether the error follows the drive or the ATA cable.

If it follows the drive, you may have to replace the drive. Additionally, from SMART error log #5:

00 04 01 0b 4f c2 e0 51 279972

indicates an aborted command (0x04) at sector 0x0c24f0b. Try to read from that sector with "dd" and see if the dd aborts too.

If the problem follows the drive, you should then run a DOS-based diagnostic utility (from the Maxtor site) to determine whether there are physical defects on the drive ...

Thanks
Manish
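Manish's dd check amounts to reading one sector at a given offset and seeing whether the read fails. A Python sketch of the same read, assuming 512-byte sectors; read_sector is a hypothetical helper, and the device path and LBA in the usage comment are just the values quoted in the thread (Andre disputes below that this log row encodes a sector address at all).

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors, as on IDE disks of this era


def read_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read a single sector at `lba` from `path`.

    Roughly equivalent to: dd if=PATH of=/dev/null bs=512 skip=LBA count=1
    A media error on a real device surfaces as OSError (typically EIO).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, sector_size, lba * sector_size)
    finally:
        os.close(fd)
    if len(data) != sector_size:
        raise OSError(f"short read at LBA {lba}: got {len(data)} bytes")
    return data


# Hypothetical usage (needs read access to the device):
# read_sector("/dev/hde", 0x0c24f0b)
```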
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
From: Andre Hedrick @ 2003-01-26 8:13 UTC
To: Manish Lachwani; +Cc: Bryan Andersen, linux-kernel

On Sat, 25 Jan 2003, Manish Lachwani wrote:

> The "Hardware ECC Recovered" indicates the number of
> ECC errors corrected in the drive. Do one thing. Try
> to swap the drive with the drive on another ATA cable.
> So, swap /dev/hde with /dev/hda (or whatever)
> physically and check if the error follows the drive or
> the ATA cable.
>
> If it follows the drive, you may have to replace the
> drive. Additionally, from the SMART error log #5:
>
> 00 04 01 0b 4f c2 e0 51 279972

NO!
command aborted
amount to transfer == 1 sector
have to dig through notes to decode
...
lcyl smart passcode
hcyl smart passcode
primary device

ready_seek_error

It barfed the command ...

try -e first

Cheers,

Andre Hedrick
LAD Storage Consulting Group
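Andre's reading of the row can be written down as a small decoder. It is a sketch under two assumptions: that the columns are DCR, FR, SC, SN, CL, SH, D/H, CR as in the smartctl header, and that in the final row of each error structure FR carries the error register and CR the status register, which is how his decode ("command aborted", "ready_seek_error") reads it. 0x4f/0xc2 are the values the SMART command requires in the cylinder low/high registers, matching his "smart passcode" notes. The function names are hypothetical.

```python
# Column order from the smartctl header: DCR FR SC SN CL SH D/H CR Timestamp
# (SH is the cylinder-high register).
COLUMNS = ("dcr", "fr", "sc", "sn", "cl", "sh", "dh", "cr")

SMART_SIGNATURE = (0x4F, 0xC2)  # CL/SH values that accompany the SMART command

STATUS_BITS = (
    (0x80, "Busy"), (0x40, "DriveReady"), (0x20, "DeviceFault"),
    (0x10, "SeekComplete"), (0x08, "DataRequest"),
    (0x04, "CorrectedError"), (0x02, "Index"), (0x01, "Error"),
)


def parse_row(row):
    """Split one error-log row into named register values plus the timestamp."""
    fields = row.split()
    regs = {name: int(tok, 16) for name, tok in zip(COLUMNS, fields[:8])}
    regs["timestamp"] = int(fields[8])
    return regs


def describe_error_row(row):
    """Interpret the final row of an error structure.

    Assumption: in that row FR holds the error register and CR holds the
    status register (rather than features and command).
    """
    r = parse_row(row)
    notes = []
    if r["fr"] & 0x04:                       # ABRT bit of the error register
        notes.append("command aborted")
    notes.append(f"sector count {r['sc']}")
    if (r["cl"], r["sh"]) == SMART_SIGNATURE:
        notes.append("CL/SH hold the SMART signature, not address bits")
    notes.append("device 1" if r["dh"] & 0x10 else "device 0")  # DEV bit
    names = [n for m, n in STATUS_BITS if r["cr"] & m]
    notes.append("status " + " ".join(names))
    return notes
```

For 00 04 01 0b 4f c2 e0 51 this yields: command aborted, sector count 1, the SMART signature in CL/SH, device 0, and status DriveReady SeekComplete Error.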
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
From: Manish Lachwani @ 2003-01-26 8:27 UTC
To: Andre Hedrick; +Cc: Bryan Andersen, linux-kernel

I don't think so. Without SMART data collection being enabled, it won't give out any SMART data at all. How, then, did the SMART data show:

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
[... attribute table snipped; identical to Bryan's smartctl output above ...]
SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 05
ATA Error Count: 8
Non-Fatal Count: 0

Also, the SMART error log:

Error Log Structure 5:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 80 aa 4f 8a e0 25 458636
08 d0 01 00 4f c2 e0 b0 459147
08 d1 01 01 4f c2 e0 b0 459147
08 d0 01 00 4f c2 e0 b0 459148
08 d1 01 01 4f c2 e0 b0 459148
00 04 01 0b 4f c2 e0 51 279972

You can retrieve the sector# ...

Thanks
Manish
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
From: Andre Hedrick @ 2003-01-26 8:48 UTC
To: Manish Lachwani; +Cc: Bryan Andersen, linux-kernel

SMART can be enabled by the BIOS, but the BIOS does not issue diagnostic test operations.

> General Smart Values:
> Off-line data collection status: (0x00) Offline data collection activity was
> never started

was never started --

> Self-test execution status: ( 0) The previous self-test routine completed
> without error or no self-test has ever
> been run

Was never executed, "after" the vendor cleared the state before shipping.

They can clear the RO log space that cannot be gotten to w/o VUO and passcodes.

So show me the sector from the logs.
You can't!

WIN_READDMA_EXT == 0x25

> 08 00 80 aa 4f 8a e0 25 458636

You only have the lower 24 bits:

0x??????8a4faa

This requires another tool, as the original "smart from sff-8035" is obsolete.

Cheers,

Andre Hedrick
LAD Storage Consulting Group
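Andre's objection rests on how 48-bit commands are logged: for READ DMA EXT (WIN_READDMA_EXT, 0x25), bits 24-47 of the address are written as the "previous" contents of the SN/CL/SH registers, and the error-log format printed here does not keep them, so only the low 24 bits survive. A sketch of the arithmetic, assuming the row layout DCR FR SC SN CL SH D/H CR and treating 0x25 as the only 48-bit opcode of interest; logged_address is a hypothetical helper. The 28-bit branch shows how an address such as 0x0c24f0b is assembled from SN, CL, SH and the low nibble of D/H.

```python
# Opcodes as named in the kernel's hdreg.h.
WIN_READDMA = 0xC8      # READ DMA, 28-bit addressing
WIN_READDMA_EXT = 0x25  # READ DMA EXT, 48-bit addressing

LBA48_COMMANDS = {WIN_READDMA_EXT}


def logged_address(sn, cl, sh, dh, cr):
    """Return the address recoverable from one logged taskfile row, as hex."""
    low24 = (sh << 16) | (cl << 8) | sn
    if cr in LBA48_COMMANDS:
        # Bits 24-47 of a 48-bit address are written as the "previous"
        # (HOB) contents of SN/CL/SH; this log format does not keep them.
        return "0x??????%06x" % low24
    # 28-bit addressing: bits 24-27 are the low nibble of Device/Head.
    return "0x%07x" % (((dh & 0x0F) << 24) | low24)


print(logged_address(0xAA, 0x4F, 0x8A, 0xE0, WIN_READDMA_EXT))  # 0x??????8a4faa
```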
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4
From: Manish Lachwani @ 2003-01-26 9:10 UTC
To: Andre Hedrick; +Cc: Bryan Andersen, linux-kernel

This is the help from smartctl:

smartctl version 2.1 - S.M.A.R.T. Control Program
useage: smartctl -[opts] [device]
Read Only Commands:
  a Show All S.M.A.R.T. Information (ATA and SCSI)
  g Show General S.M.A.R.T. Attributes (ATA Only)
  v Show Vendor S.M.A.R.T. Attributes (ATA Only)
  l Show S.M.A.R.T. Drive Error Log (ATA Only
  L Show S.M.A.R.T. Drive SelfTest Log (ATA Only)
  i Show S.M.A.R.T. Drive Info (ATA and SCSI)
  c Check S.M.A.R.T. Status (ATA and SCSI)
Enable / Disable Commands:
  e Enable S.M.A.R.T. data collection (ATA and SCSI)
  d Disable S.M.A.R.T.data collection (ATA and SCSI)
  t Enable S.M.A.R.T. Automatic Offline Test (ATA Only)
  T Disable S.M.A.R.T. Automatic Offline Test (ATA Only)
Test Commands:
  O Execute Off-line data collectioni(ATA Only)
  S Execute Short Self Test (ATA Only)
  s Execute Short Self Test (Captive Mode) (ATA Only)
  X Execute Extended Self Test (ATA Only)
  x Execute Extended Self Test (Captive Mode)(ATA Only)
  A Execute Self Test Abort (ATA Only)

Off-line data collection has nothing to do with SMART data collection. You enable the off-line test, then run the test and collect the off-line data.

I agree that we only have the lower 24 bits. However, the SMART attributes displayed are correctly collected from the drive. Look at the sequence below:

bash# ./smartctl -a /dev/hda
Device: ST380021A Supports ATA Version 5
Drive supports S.M.A.R.T. and is disabled
Use option -e to enable
bash# ./smartctl -e /dev/hda
bash# ./smartctl -a /dev/hda
Device: ST380021A Supports ATA Version 5
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.
General Smart Values: Off-line data collection status: (0x82) Offline data collection activity completed without error Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run Total time to complete off-line data collection: ( 422) Seconds Offline data collection Capabilities: (0x1b)SMART EXECUTE OFF-LINE IMMEDIATE Automatic timer ON/OFF support Suspend Offline Collection upon new command Offline surface scan supported Self-test supported Smart Capablilities: (0x0003) Saves SMART data before entering power-saving mode Supports SMART auto save timer Error logging capability: (0x01) Error logging supported Short self-test routine recommended polling time: ( 1) Minutes Extended self-test routine recommended polling time: ( 57) Minutes Vendor Specific SMART Attributes with Thresholds: Revision Number: 10 Attribute Flag Value Worst Threshold Raw Value ( 1)Raw Read Error Rate 0x000f 075 070 034 92897937 ( 3)Spin Up Time 0x0003 070 070 000 0 ( 4)Start Stop Count 0x0032 100 100 020 3 ( 5)Reallocated Sector Ct 0x0033 100 100 036 0 ( 7)Seek Error Rate 0x000f 079 060 030 93809829 ( 9)Power On Hours 0x0032 096 096 000 4158 ( 10)Spin Retry Count 0x0013 100 100 097 0 ( 12)Power Cycle Count 0x0032 100 100 020 261 (194)Temperature 0x0022 028 043 000 28 (195)Hardware ECC Recovered 0x001a 075 070 000 92897937 (197)Current Pending Sector 0x0012 100 100 000 0 (198)Offline Uncorrectable 0x0010 100 100 000 0 (199)UDMA CRC Error Count 0x003e 200 200 000 0 (200)Unknown Attribute 0x0000 100 253 000 0 (202)Unknown Attribute 0x0032 100 253 000 0 SMART Error Log: SMART Error Logging Version: 1 No Errors Logged --- Andre Hedrick <andre@linux-ide.org> wrote: > > Smart can be enabled by the BIOS, but the BIOS does > not issue diagnostic > tests operations. 
> > > General Smart Values: > > Off-line data collection status: (0x00) Offline > data collection activity was > > never > started > > was never started -- > > > Self-test execution status: ( 0) The > previous self-test routine completed > > without > error or no self-test has ever > > been run > > Was never executed, "after" the vendor cleared the > state before shipping. > > They can clear the RO log space that can not be > gotten to w/o VUO and > passcodes. > > So show me the sector form the logs. > You can't! > > WIN_READDMA_EXT == 0x25 > > > 08 00 80 aa 4f 8a e0 25 > 458636 > > You only have the lower 24-bits > > 0x??????8a4faa > > This requires another tool, as the original "smart > from sff-8035" is > obsolete. > > > Cheers, > > Andre Hedrick > LAD Storage Consulting Group > > On Sun, 26 Jan 2003, Manish Lachwani wrote: > > > I dont think so. Without SMART data collection > being > > enabled, it wont give out the any SMART data at > all. > > How, did the SMART data show: > > > > Vendor Specific SMART Attributes with Thresholds: > > Revision Number: 16 > > Attribute Flag Value Worst > > Threshold Raw Value > > ( 3)Spin Up Time 0x0027 252 252 > 063 > > 0 > > ( 4)Start Stop Count 0x0032 253 253 > 000 > > 0 > > ( 5)Reallocated Sector Ct 0x0033 253 253 > 063 > > 0 > > ( 6)Read Channel Margin 0x0001 253 253 > 100 > > 0 > > ( 7)Seek Error Rate 0x000a 253 252 > 000 > > 0 > > ( 8)Seek Time Preformance 0x0027 244 244 > 187 > > 36736 > > ( 9)Power On Hours 0x0032 253 253 > 000 > > 4341 > > ( 10)Spin Retry Count 0x002b 252 252 > 223 > > 0 > > ( 11)Calibration Retry Count 0x002b 252 252 > 223 > > 0 > > ( 12)Power Cycle Count 0x0032 253 253 > 000 > > 43 > > (192)Power-Off Retract Count 0x0032 253 253 > 000 > > 0 > > (193)Load Cycle Count 0x0032 253 253 > 000 > > 0 > > (194)Temperature 0x0032 253 253 > 000 > > 0 > > (195)Hardware ECC Recovered 0x000a 253 252 > 000 > > 221 > > (196)Reallocated Event Count 0x0008 253 253 > 000 > > 0 > > (197)Current Pending Sector 0x0008 253 
253 > 000 > > 0 > > (198)Offline Uncorrectable 0x0008 253 253 > 000 > > 0 > > (199)UDMA CRC Error Count 0x0008 199 199 > 000 > > 0 > > (200)Unknown Attribute 0x000a 253 252 > 000 > > 0 > > (201)Unknown Attribute 0x000a 253 252 > 000 > > 0 > > (202)Unknown Attribute 0x000a 253 252 > 000 > > 0 > > (203)Unknown Attribute 0x000b 253 252 > 180 > > 0 > > (204)Unknown Attribute 0x000a 253 252 > 000 > > 0 > > (205)Unknown Attribute 0x000a 253 252 > 000 > > 0 > > (207)Unknown Attribute 0x002a 252 252 > 000 > > 0 > > (208)Unknown Attribute 0x002a 252 252 > 000 > > 0 > > (209)Unknown Attribute 0x0024 253 253 > 000 > > 0 > > ( 99)Unknown Attribute 0x0004 253 253 > 000 > > 0 > > (100)Unknown Attribute 0x0004 253 253 > 000 > > 0 > > (101)Unknown Attribute 0x0004 253 253 > 000 > > 0 > > SMART Error Log: > > SMART Error Logging Version: 1 > > Error Log Data Structure Pointer: 05 > > ATA Error Count: 8 > > Non-Fatal Count: 0 > > > > Also, the SMART error log, > > > > Error Log Structure 5: > > DCR FR SC SN CL SH D/H CR > Timestamp > > 08 00 80 aa 4f 8a e0 25 > 458636 > > 08 d0 01 00 4f c2 e0 b0 > 459147 > > 08 d1 01 01 4f c2 e0 b0 > 459147 > > 08 d0 01 00 4f c2 e0 b0 > 459148 > > 08 d1 01 01 4f c2 e0 b0 > 459148 > > 00 04 01 0b 4f c2 e0 51 > 279972 > > > > You can retrieve the sector# ... > > > > Thanks > > Manish > > > > --- Andre Hedrick <andre@linux-ide.org> wrote: > > > On Sat, 25 Jan 2003, Manish Lachwani wrote: > > > > > > > The "Hardware ECC Recovered" indicates the > number > > > of > > > > ECC errors corrected in the drive. Do one > thing. > > > Try > > > > to swap the drive with the drive on another > ATA > > > cable. > > > > So, swap /dev/hde with /dev/hda (or whatever) > > > > physically and check if the error follows the > > > drive or > > > > the ATA cable. > > > > > > > > If it follows the drive, you may have to > replace > > > the > > > > drive. 
Additionally, from the SMART error log > #5: > === message truncated ===
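The arithmetic behind the quoted "You only have the lower 24-bits" remark can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not smartctl code. decode_command_row is a hypothetical helper. The column order is the DCR FR SC SN CL SH D/H CR header that smartctl 2.1 printed, and the set of 48-bit opcodes is an illustrative subset. For a 28-bit command, the low nibble of Device/Head supplies LBA bits 27..24. For a 48-bit command such as WIN_READDMA_EXT (0x25), the upper 24 bits are not captured in this log.

```python
# Hypothetical helper, not part of smartctl: decode one command row of the
# legacy (28-bit) SMART error log, columns DCR FR SC SN CL SH D/H CR.

# Illustrative subset of 48-bit (EXT) opcodes; 0x25 is WIN_READDMA_EXT.
EXT_OPCODES = {0x24, 0x25, 0x29, 0x34, 0x35, 0x39}


def decode_command_row(row: str) -> dict:
    _dcr, _fr, sc, sn, cl, sh, dh, cr = (int(b, 16) for b in row.split())
    low24 = (sh << 16) | (cl << 8) | sn  # LBA bits 23..0, SN is the low byte
    if cr in EXT_OPCODES:
        # 48-bit command: the legacy log keeps one byte per register, so
        # LBA bits 47..24 (and the high byte of the count) are not present.
        lba = "0x??????%06x" % low24
    else:
        # 28-bit LBA: bits 27..24 sit in the low nibble of Device/Head.
        lba = "0x%07x" % (((dh & 0x0F) << 24) | low24)
    return {"command": cr, "count_low_byte": sc, "lba": lba}


if __name__ == "__main__":
    print(decode_command_row("08 00 80 aa 4f 8a e0 25"))
```

Applied to the first row of Error Log Structure 5, it prints {'command': 37, 'count_low_byte': 128, 'lba': '0x??????8a4faa'}. This matches the partial address in the quoted reply; 37 is 0x25 in decimal.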
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 2003-01-26 9:10 ` Manish Lachwani @ 2003-01-26 9:14 ` Andre Hedrick 0 siblings, 0 replies; 11+ messages in thread From: Andre Hedrick @ 2003-01-26 9:14 UTC To: Manish Lachwani; +Cc: Bryan Andersen, linux-kernel Yeah, I know. A good friend of mine is the author. Since he is not available to comment, you can believe as you wish. Read the code and read the ioctl transport: you cannot get there from here, period. So the 28-bit SMART was never executed; however, the 48-bit was. You can't get the meaningful data. I am out of this argument; go read the spec. Cheers, On Sun, 26 Jan 2003, Manish Lachwani wrote: > This is the help from smartctl: > > smartctl version 2.1 - S.M.A.R.T. Control Program > useage: smartctl -[opts] [device] > Read Only Commands: > a Show All S.M.A.R.T. Information (ATA and SCSI) > g Show General S.M.A.R.T. Attributes (ATA Only) > v Show Vendor S.M.A.R.T. Attributes (ATA Only) > l Show S.M.A.R.T. Drive Error Log (ATA Only > L Show S.M.A.R.T. Drive SelfTest Log (ATA Only) > i Show S.M.A.R.T. Drive Info (ATA and SCSI) > c Check S.M.A.R.T. Status (ATA and SCSI) > > Enable / Disable Commands: > e Enable S.M.A.R.T. data collection (ATA and SCSI) > d Disable S.M.A.R.T.data collection (ATA and SCSI) > t Enable S.M.A.R.T. Automatic Offline Test (ATA Only) > T Disable S.M.A.R.T. Automatic Offline Test (ATA Only) > > Test Commands: > O Execute Off-line data collectioni(ATA Only) > S Execute Short Self Test (ATA Only) > s Execute Short Self Test (Captive Mode) (ATA Only) > X Execute Extended Self Test (ATA Only) > x Execute Extended Self Test (Captive Mode)(ATA Only) > A Execute Self Test Abort (ATA Only) > > Off-line data collection has nothing to do with the > SMART data collection. You enable the offline test, > then run the test and collect the offline data. > > I agree with the fact that we have the lower 24 bits.
> However, SMART attributes displayed is appropriately > collected from the drive. Look at the sequence below: > > bash# ./smartctl -a /dev/hda > Device: ST380021A Supports ATA Version 5 > Drive supports S.M.A.R.T. and is disabled > Use option -e to enable > bash# ./smartctl -e /dev/hda > bash# ./smartctl -a /dev/hda > Device: ST380021A Supports ATA Version 5 > Drive supports S.M.A.R.T. and is enabled > Check S.M.A.R.T. Passed. > > General Smart Values: > Off-line data collection status: (0x82) Offline data > collection activity > completed > without error > > Self-test execution status: ( 0) The previous > self-test routine completed > without error > or no self-test has ever > been run > > Total time to complete off-line > data collection: ( 422) Seconds > > Offline data collection > Capabilities: (0x1b)SMART EXECUTE > OFF-LINE IMMEDIATE > Automatic > timer ON/OFF support > Suspend > Offline Collection upon new > command > Offline > surface scan supported > Self-test > supported > > Smart Capablilities: (0x0003) Saves SMART > data before entering > power-saving > mode > Supports SMART > auto save timer > > Error logging capability: (0x01) Error logging > supported > > Short self-test routine > recommended polling time: ( 1) Minutes > > Extended self-test routine > recommended polling time: ( 57) Minutes > > Vendor Specific SMART Attributes with Thresholds: > Revision Number: 10 > Attribute Flag Value Worst > Threshold Raw Value > ( 1)Raw Read Error Rate 0x000f 075 070 034 > 92897937 > ( 3)Spin Up Time 0x0003 070 070 000 > 0 > ( 4)Start Stop Count 0x0032 100 100 020 > 3 > ( 5)Reallocated Sector Ct 0x0033 100 100 036 > 0 > ( 7)Seek Error Rate 0x000f 079 060 030 > 93809829 > ( 9)Power On Hours 0x0032 096 096 000 > 4158 > ( 10)Spin Retry Count 0x0013 100 100 097 > 0 > ( 12)Power Cycle Count 0x0032 100 100 020 > 261 > (194)Temperature 0x0022 028 043 000 > 28 > (195)Hardware ECC Recovered 0x001a 075 070 000 > 92897937 > (197)Current Pending Sector 0x0012 100 100 000 > 
0 > (198)Offline Uncorrectable 0x0010 100 100 000 > 0 > (199)UDMA CRC Error Count 0x003e 200 200 000 > 0 > (200)Unknown Attribute 0x0000 100 253 000 > 0 > (202)Unknown Attribute 0x0032 100 253 000 > 0 > SMART Error Log: > SMART Error Logging Version: 1 > No Errors Logged > > > > --- Andre Hedrick <andre@linux-ide.org> wrote: > > > > Smart can be enabled by the BIOS, but the BIOS does > > not issue diagnostic > > tests operations. > > > > > General Smart Values: > > > Off-line data collection status: (0x00) Offline > > data collection activity was > > > never > > started > > > > was never started -- > > > > > Self-test execution status: ( 0) The > > previous self-test routine completed > > > without > > error or no self-test has ever > > > been run > > > > Was never executed, "after" the vendor cleared the > > state before shipping. > > > > They can clear the RO log space that can not be > > gotten to w/o VUO and > > passcodes. > > > > So show me the sector form the logs. > > You can't! > > > > WIN_READDMA_EXT == 0x25 > > > > > 08 00 80 aa 4f 8a e0 25 > > 458636 > > > > You only have the lower 24-bits > > > > 0x??????8a4faa > > > > This requires another tool, as the original "smart > > from sff-8035" is > > obsolete. > > > > > > Cheers, > > > > Andre Hedrick > > LAD Storage Consulting Group > > > > On Sun, 26 Jan 2003, Manish Lachwani wrote: > > > > > I dont think so. Without SMART data collection > > being > > > enabled, it wont give out the any SMART data at > > all. 
> > > How, did the SMART data show: > > > > > > Vendor Specific SMART Attributes with Thresholds: > > > Revision Number: 16 > > > Attribute Flag Value Worst > > > Threshold Raw Value > > > ( 3)Spin Up Time 0x0027 252 252 > > 063 > > > 0 > > > ( 4)Start Stop Count 0x0032 253 253 > > 000 > > > 0 > > > ( 5)Reallocated Sector Ct 0x0033 253 253 > > 063 > > > 0 > > > ( 6)Read Channel Margin 0x0001 253 253 > > 100 > > > 0 > > > ( 7)Seek Error Rate 0x000a 253 252 > > 000 > > > 0 > > > ( 8)Seek Time Preformance 0x0027 244 244 > > 187 > > > 36736 > > > ( 9)Power On Hours 0x0032 253 253 > > 000 > > > 4341 > > > ( 10)Spin Retry Count 0x002b 252 252 > > 223 > > > 0 > > > ( 11)Calibration Retry Count 0x002b 252 252 > > 223 > > > 0 > > > ( 12)Power Cycle Count 0x0032 253 253 > > 000 > > > 43 > > > (192)Power-Off Retract Count 0x0032 253 253 > > 000 > > > 0 > > > (193)Load Cycle Count 0x0032 253 253 > > 000 > > > 0 > > > (194)Temperature 0x0032 253 253 > > 000 > > > 0 > > > (195)Hardware ECC Recovered 0x000a 253 252 > > 000 > > > 221 > > > (196)Reallocated Event Count 0x0008 253 253 > > 000 > > > 0 > > > (197)Current Pending Sector 0x0008 253 253 > > 000 > > > 0 > > > (198)Offline Uncorrectable 0x0008 253 253 > > 000 > > > 0 > > > (199)UDMA CRC Error Count 0x0008 199 199 > > 000 > > > 0 > > > (200)Unknown Attribute 0x000a 253 252 > > 000 > > > 0 > > > (201)Unknown Attribute 0x000a 253 252 > > 000 > > > 0 > > > (202)Unknown Attribute 0x000a 253 252 > > 000 > > > 0 > > > (203)Unknown Attribute 0x000b 253 252 > > 180 > > > 0 > > > (204)Unknown Attribute 0x000a 253 252 > > 000 > > > 0 > > > (205)Unknown Attribute 0x000a 253 252 > > 000 > > > 0 > > > (207)Unknown Attribute 0x002a 252 252 > > 000 > > > 0 > > > (208)Unknown Attribute 0x002a 252 252 > > 000 > > > 0 > > > (209)Unknown Attribute 0x0024 253 253 > > 000 > > > 0 > > > ( 99)Unknown Attribute 0x0004 253 253 > > 000 > > > 0 > > > (100)Unknown Attribute 0x0004 253 253 > > 000 > > > 0 > > > (101)Unknown Attribute 0x0004 253 253 > > 
000 > > > 0 > > > SMART Error Log: > > > SMART Error Logging Version: 1 > > > Error Log Data Structure Pointer: 05 > > > ATA Error Count: 8 > > > Non-Fatal Count: 0 > > > > > > Also, the SMART error log, > > > > > > Error Log Structure 5: > > > DCR FR SC SN CL SH D/H CR > > Timestamp > > > 08 00 80 aa 4f 8a e0 25 > > 458636 > > > 08 d0 01 00 4f c2 e0 b0 > > 459147 > > > 08 d1 01 01 4f c2 e0 b0 > > 459147 > > > 08 d0 01 00 4f c2 e0 b0 > > 459148 > > > 08 d1 01 01 4f c2 e0 b0 > > 459148 > > > 00 04 01 0b 4f c2 e0 51 > > 279972 > > > > > > You can retrieve the sector# ... > > > > > > Thanks > > > Manish > > > > > > --- Andre Hedrick <andre@linux-ide.org> wrote: > > > > On Sat, 25 Jan 2003, Manish Lachwani wrote: > > > > > > > > > The "Hardware ECC Recovered" indicates the > > number > > > > of > > > > > ECC errors corrected in the drive. Do one > > thing. > > > > Try > > > > > to swap the drive with the drive on another > > ATA > > > > cable. > > > > > So, swap /dev/hde with /dev/hda (or whatever) > > > > > physically and check if the error follows the > > > > drive or > > > > > the ATA cable. > > > > > > > > > > If it follows the drive, you may have to > > replace > > > > the > > > > > drive. Additionally, from the SMART error log > > #5: > > > === message truncated === Andre Hedrick LAD Storage Consulting Group
[parent not found: <233C89823A37714D95B1A891DE3BCE5202AB1D12@xch-b.win.zambeel.com>]
* RE: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 [not found] <233C89823A37714D95B1A891DE3BCE5202AB1D12@xch-b.win.zambeel.com> @ 2003-01-26 9:42 ` Andre Hedrick 2003-01-26 11:24 ` Bryan Andersen 0 siblings, 1 reply; 11+ messages in thread From: Andre Hedrick @ 2003-01-26 9:42 UTC To: Manish Lachwani; +Cc: Manish Lachwani, Bryan Andersen, linux-kernel Manish, So you come back now as "Zambeel"; is this to impress me? Or has Zambeel modified "MY" driver and is shipping the driver without the source code? You are using "smartsuite-2.1", which is not designed to submit 48-bit commands. Well, do you know what a "General Purpose Log" is? This is where the 48-bit logging features are stored. See, during one of the 6 annual meetings each year, the committee debated the issue of how to deal with 48-bit logs. Given that a new class of logs had been created for AV Streaming, we elected to use those and preserve as much of the expected behavior in the legacy logs as possible. So if the error happened with a 28-bit DMA Read, the full log would be present in the 28-bit logs; but the error happened with a 48-bit DMA Read, so the result is stuffed into a 48-bit general purpose log. You are using a 48-bit drive and issuing 28-bit SMART commands; therefore, you cannot extract the useful data being sought. Remind me not to deploy Zambeel products to any of my customers. Yeah "smartctl" does not work, but reading the correct 48-bit logs does. All I stated was you are using the wrong tool to extract the needed information. Regards, Andre Hedrick LAD Storage Consulting Group On Sun, 26 Jan 2003, Manish Lachwani wrote: > Yes, and I agree that SMART can be enabled by the BIOS. However, if SMART is > not enabled, then "smartctl -a /dev/hdX" wont return any values for the > SMART attributes ...
> > Thanks > Manish > > -----Original Message----- > From: Andre Hedrick [mailto:andre@linux-ide.org] > Sent: Sunday, January 26, 2003 12:49 AM > To: Manish Lachwani > Cc: Bryan Andersen; linux-kernel@vger.kernel.org > Subject: Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 > > > > Smart can be enabled by the BIOS, but the BIOS does not issue diagnostic > tests operations. > > > General Smart Values: > > Off-line data collection status: (0x00) Offline data collection activity > was > > never started > > was never started -- > > > Self-test execution status: ( 0) The previous self-test routine > completed > > without error or no self-test has > ever > > been run > > Was never executed, "after" the vendor cleared the state before shipping. > > They can clear the RO log space that can not be gotten to w/o VUO and > passcodes. > > So show me the sector form the logs. > You can't! > > WIN_READDMA_EXT == 0x25 > > > 08 00 80 aa 4f 8a e0 25 458636 > > You only have the lower 24-bits > > 0x??????8a4faa > > This requires another tool, as the original "smart from sff-8035" is > obsolete. > > > Cheers, > > Andre Hedrick > LAD Storage Consulting Group > > On Sun, 26 Jan 2003, Manish Lachwani wrote: > > > I dont think so. Without SMART data collection being > > enabled, it wont give out the any SMART data at all. 
> > How, did the SMART data show: > > > > Vendor Specific SMART Attributes with Thresholds: > > Revision Number: 16 > > Attribute Flag Value Worst > > Threshold Raw Value > > ( 3)Spin Up Time 0x0027 252 252 063 > > 0 > > ( 4)Start Stop Count 0x0032 253 253 000 > > 0 > > ( 5)Reallocated Sector Ct 0x0033 253 253 063 > > 0 > > ( 6)Read Channel Margin 0x0001 253 253 100 > > 0 > > ( 7)Seek Error Rate 0x000a 253 252 000 > > 0 > > ( 8)Seek Time Preformance 0x0027 244 244 187 > > 36736 > > ( 9)Power On Hours 0x0032 253 253 000 > > 4341 > > ( 10)Spin Retry Count 0x002b 252 252 223 > > 0 > > ( 11)Calibration Retry Count 0x002b 252 252 223 > > 0 > > ( 12)Power Cycle Count 0x0032 253 253 000 > > 43 > > (192)Power-Off Retract Count 0x0032 253 253 000 > > 0 > > (193)Load Cycle Count 0x0032 253 253 000 > > 0 > > (194)Temperature 0x0032 253 253 000 > > 0 > > (195)Hardware ECC Recovered 0x000a 253 252 000 > > 221 > > (196)Reallocated Event Count 0x0008 253 253 000 > > 0 > > (197)Current Pending Sector 0x0008 253 253 000 > > 0 > > (198)Offline Uncorrectable 0x0008 253 253 000 > > 0 > > (199)UDMA CRC Error Count 0x0008 199 199 000 > > 0 > > (200)Unknown Attribute 0x000a 253 252 000 > > 0 > > (201)Unknown Attribute 0x000a 253 252 000 > > 0 > > (202)Unknown Attribute 0x000a 253 252 000 > > 0 > > (203)Unknown Attribute 0x000b 253 252 180 > > 0 > > (204)Unknown Attribute 0x000a 253 252 000 > > 0 > > (205)Unknown Attribute 0x000a 253 252 000 > > 0 > > (207)Unknown Attribute 0x002a 252 252 000 > > 0 > > (208)Unknown Attribute 0x002a 252 252 000 > > 0 > > (209)Unknown Attribute 0x0024 253 253 000 > > 0 > > ( 99)Unknown Attribute 0x0004 253 253 000 > > 0 > > (100)Unknown Attribute 0x0004 253 253 000 > > 0 > > (101)Unknown Attribute 0x0004 253 253 000 > > 0 > > SMART Error Log: > > SMART Error Logging Version: 1 > > Error Log Data Structure Pointer: 05 > > ATA Error Count: 8 > > Non-Fatal Count: 0 > > > > Also, the SMART error log, > > > > Error Log Structure 5: > > DCR FR SC SN CL SH D/H CR 
Timestamp > > 08 00 80 aa 4f 8a e0 25 458636 > > 08 d0 01 00 4f c2 e0 b0 459147 > > 08 d1 01 01 4f c2 e0 b0 459147 > > 08 d0 01 00 4f c2 e0 b0 459148 > > 08 d1 01 01 4f c2 e0 b0 459148 > > 00 04 01 0b 4f c2 e0 51 279972 > > > > You can retrieve the sector# ... > > > > Thanks > > Manish > > > > --- Andre Hedrick <andre@linux-ide.org> wrote: > > > On Sat, 25 Jan 2003, Manish Lachwani wrote: > > > > > > > The "Hardware ECC Recovered" indicates the number > > > of > > > > ECC errors corrected in the drive. Do one thing. > > > Try > > > > to swap the drive with the drive on another ATA > > > cable. > > > > So, swap /dev/hde with /dev/hda (or whatever) > > > > physically and check if the error follows the > > > drive or > > > > the ATA cable. > > > > > > > > If it follows the drive, you may have to replace > > > the > > > > drive. Additionally, from the SMART error log #5: > > > > > > > > 00 04 01 0b 4f c2 e0 51 279972 > > > > > > NO! > > > command aborted > > > amount to transfer == 1 sector > > > have to dig through notes to decode > > > ... > > > lcyl smart passcode > > > hcyl smart passcode > > > primary device > > > > > > ready_seek_error > > > > > > > > > It barfed the command ... > > > > > > try -e first > > > > > > Cheers, > > > > > > Andre Hedrick > > > LAD Storage Consulting Group
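Andre's register-by-register reading of the last row of Error Log Structure 5 can be reproduced mechanically. That reading, quoted above, was: command aborted, one sector, SMART passcode in the cylinder registers, primary device, ready_seek_error. This is a minimal sketch, assuming the row holds a reserved byte followed by the Error, Sector Count, Sector Number, Cylinder Low, Cylinder High, Device/Head and Status registers. The mnemonics are the ATA specification bit names, and decode_error_row is a hypothetical helper. The Sector Number byte (0x0b) is left undecoded, as in the original reply.

```python
# Hypothetical decoder for the error-data row of a legacy SMART error log
# entry: reserved, Error, SC, SN, CL, CH, D/H, Status (all hex bytes).

STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TK0NF", 0x01: "AMNF"}
SMART_SIGNATURE = (0x4F, 0xC2)  # Cylinder Low/High values SMART commands use


def status_flags(status: int) -> list:
    if status & 0x80:
        # While BSY is set the other Status bits are not valid; this is
        # consistent with the kernel reporting status=0xd0 as just { Busy }.
        return ["BSY"]
    return [name for bit, name in STATUS_BITS.items() if status & bit]


def decode_error_row(row: str) -> dict:
    _res, err, sc, _sn, cl, ch, dh, status = (int(b, 16) for b in row.split())
    return {
        "error": [name for bit, name in ERROR_BITS.items() if err & bit],
        "sector_count": sc,
        "smart_signature": (cl, ch) == SMART_SIGNATURE,
        "device": (dh >> 4) & 1,  # DEV bit: 0 = master ("primary device")
        "status": status_flags(status),
    }


if __name__ == "__main__":
    print(decode_error_row("00 04 01 0b 4f c2 e0 51"))
```

The row decodes to ABRT in the Error register and DRDY, DSC, ERR in the Status register (0x51). The Linux IDE driver prints that combination as { DriveReady SeekComplete Error }. The values 0x4f/0xc2 are the signature that SMART commands place in the cylinder registers.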
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 2003-01-26 9:42 ` Andre Hedrick @ 2003-01-26 11:24 ` Bryan Andersen 2003-01-26 21:27 ` Andre Hedrick 0 siblings, 1 reply; 11+ messages in thread From: Bryan Andersen @ 2003-01-26 11:24 UTC To: Andre Hedrick; +Cc: Manish Lachwani, Manish Lachwani, linux-kernel Andre Hedrick wrote: > Yeah "smartctl" does not work, but reading the correct 48-bit logs does. > > All I stated was you are using the wrong tool to extract the needed > information. So which tool(s) do I need for getting at the 48-bit log data? If I need a spec doc to decode a binary block of data I can deal with that. Prefer not to, but... I did a full surface read scan of the disk and it turned up no errors. As for how I got the SMART data dump I enabled SMART on the drive then dumped the data a few minutes later. - Bryan
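The rule described in the 9:42 reply amounts to a lookup on the failed command's opcode. Under that rule, a failing 28-bit command is fully recorded in the legacy SMART error log, while a failing 48-bit command is recorded in a 48-bit General Purpose log. This is a minimal sketch: error_log_for is hypothetical, and the opcode table is an illustrative subset. The log addresses and read commands come from later ATA command-set documents rather than from this thread: the Summary SMART error log 01h is read via SMART READ LOG, and the Extended Comprehensive SMART error log 03h via READ LOG EXT. Check them against the specification revision the drive implements.

```python
# Hypothetical lookup: which error log should hold the full record for a
# failed command, following the 28-bit vs 48-bit split described above.

EXT_OPCODES = {  # illustrative subset of 48-bit commands
    0x24: "READ SECTORS EXT",
    0x25: "READ DMA EXT",  # WIN_READDMA_EXT
    0x29: "READ MULTIPLE EXT",
    0x34: "WRITE SECTORS EXT",
    0x35: "WRITE DMA EXT",
    0x39: "WRITE MULTIPLE EXT",
}


def error_log_for(opcode: int) -> tuple:
    """Return (log address, command used to read it); addresses are assumptions."""
    if opcode in EXT_OPCODES:
        # Assumed: Extended Comprehensive SMART error log, General Purpose log 03h.
        return (0x03, "READ LOG EXT (2Fh)")
    # Assumed: Summary SMART error log, read with the 28-bit SMART command set.
    return (0x01, "SMART READ LOG (B0h, feature D5h)")


if __name__ == "__main__":
    for op in (0xC8, 0x25):  # READ DMA (28-bit) vs READ DMA EXT (48-bit)
        addr, how = error_log_for(op)
        print("opcode 0x%02x -> log 0x%02x via %s" % (op, addr, how))
```

For the failing row in Bryan's log, the command byte is 0x25. Under these assumptions the full address would therefore sit in the 48-bit log, which smartsuite 2.1 does not read.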
* Re: FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 2003-01-26 11:24 ` Bryan Andersen @ 2003-01-26 21:27 ` Andre Hedrick 0 siblings, 0 replies; 11+ messages in thread From: Andre Hedrick @ 2003-01-26 21:27 UTC To: Bryan Andersen; +Cc: linux-kernel Bryan, The one nobody has written in closed form to date. Basically it is a DNE (does not exist). The infrastructure is just now being put into place, so it will take some time to code out the rest of the internals to allow standard UI access. I have my own suite of "private" tools, and they will not be released. There are possible patent issues and licensing problems. Cheers, Andre Hedrick LAD Storage Consulting Group On Sun, 26 Jan 2003, Bryan Andersen wrote: > > > Andre Hedrick wrote: > > Yeah "smartctl" does not work, but reading the correct 48-bit logs does. > > > > All I stated was you are using the wrong tool to extract the needed > > information. > > So which tool(s) do I need for getting at the 48-bit log data? If I > need a spec doc to decode a binary block of data I can deal with that. > Prefer not to, but... > > I did a full surface read scan of the disk and it turned up no errors. > > As for how I got the SMART data dump I enabled SMART on the drive then > dumped the data a few minutes later. > > - Bryan >
end of thread
Thread overview: 11+ messages
[not found] <233C89823A37714D95B1A891DE3BCE5202AB1D07@xch-b.win.zambeel.com>
2003-01-25 20:34 ` FW: PDC202XX DMA loss in 2.4.21-pre3-ac4 Manish Lachwani
2003-01-26 6:51 ` Bryan Andersen
2003-01-26 7:15 ` Manish Lachwani
2003-01-26 8:13 ` Andre Hedrick
2003-01-26 8:27 ` Manish Lachwani
2003-01-26 8:48 ` Andre Hedrick
2003-01-26 9:10 ` Manish Lachwani
2003-01-26 9:14 ` Andre Hedrick
[not found] <233C89823A37714D95B1A891DE3BCE5202AB1D12@xch-b.win.zambeel.com>
2003-01-26 9:42 ` Andre Hedrick
2003-01-26 11:24 ` Bryan Andersen
2003-01-26 21:27 ` Andre Hedrick