linux-ide.vger.kernel.org archive mirror
* libata oops 2.6.11-rc4 yesterdays BK
From: Brad Campbell @ 2005-02-16  4:28 UTC
  To: linux-ide

Woke up to this, this morning.
This is yesterday's 2.6.11-rc4 BK + libata BK + libata-dev BK.
It was in the middle of a RAID-6 rebuild (scheduled to take about 740 minutes).


Regards,
Brad

Feb 16 07:32:17 storage1 kernel: ata15: command timeout
Feb 16 07:32:17 storage1 kernel: Assertion failed! qc->flags & ATA_QCFLAG_ACTIVE,drivers/scsi/libata-core.c,ata_qc_complete,line=2703
Feb 16 07:32:17 storage1 kernel: ata15: status=0x51 { DriveReady SeekComplete Error }
Feb 16 07:32:17 storage1 kernel: ata15: called with no error (51)!
Feb 16 07:32:17 storage1 kernel: ------------[ cut here ]------------
Feb 16 07:32:17 storage1 kernel: kernel BUG at drivers/scsi/scsi.c:299!
Feb 16 07:32:17 storage1 kernel: invalid operand: 0000 [#1]
Feb 16 07:32:17 storage1 kernel: CPU:    0
Feb 16 07:32:17 storage1 kernel: EIP:    0060:[<c0257a0b>]    Not tainted VLI
Feb 16 07:32:17 storage1 kernel: EFLAGS: 00010046   (2.6.11-rc4)
Feb 16 07:32:17 storage1 kernel: EIP is at scsi_put_command+0x8b/0xa0
Feb 16 07:32:17 storage1 kernel: eax: df99cb80   ebx: df99cb80   ecx: df99cb90   edx: df99cb90
Feb 16 07:32:17 storage1 kernel: esi: dfc70000   edi: 00000292   ebp: dfc66400   esp: dfcabe7c
Feb 16 07:32:17 storage1 kernel: ds: 007b   es: 007b   ss: 0068
Feb 16 07:32:17 storage1 kernel: Process scsi_eh_14 (pid: 799, threadinfo=dfcaa000 task=dfc73a20)
Feb 16 07:32:17 storage1 kernel: Stack: dfc7d3c8 dfcc6200 dfc65a28 df99cb80 00000296 dfc65a28 c025c859 df99cb80
Feb 16 07:32:17 storage1 kernel:        dfc7d3c8 c025c97f df99cb80 00000001 00000000 df99cb80 df99cb80 00000004
Feb 16 07:32:17 storage1 kernel:        df99cc3c c025ccfe df99cb80 00000001 00000000 00000000 000057b3 000057b3
Feb 16 07:32:17 storage1 kernel: Call Trace:
Feb 16 07:32:17 storage1 kernel:  [<c025c859>] scsi_next_command+0x19/0x30
Feb 16 07:32:17 storage1 kernel:  [<c025c97f>] scsi_end_request+0xbf/0xe0
Feb 16 07:32:17 storage1 kernel:  [<c025ccfe>] scsi_io_completion+0x1ae/0x490
Feb 16 07:32:17 storage1 kernel:  [<c026927b>] sd_rw_intr+0xcb/0x280
Feb 16 07:32:17 storage1 kernel:  [<c02580ea>] scsi_finish_command+0x7a/0xc0
Feb 16 07:32:17 storage1 kernel:  [<c0266ad9>] ata_scsi_qc_complete+0x39/0x70
Feb 16 07:32:17 storage1 kernel:  [<c026460a>] ata_qc_complete+0x3a/0xc0
Feb 16 07:32:17 storage1 kernel:  [<c0267e30>] pdc_eng_timeout+0x90/0x120
Feb 16 07:32:17 storage1 kernel:  [<c026663a>] ata_scsi_error+0x1a/0x30
Feb 16 07:32:17 storage1 kernel:  [<c025bd4e>] scsi_error_handler+0x9e/0xe0
Feb 16 07:32:17 storage1 kernel:  [<c025bcb0>] scsi_error_handler+0x0/0xe0
Feb 16 07:32:17 storage1 kernel:  [<c0100831>] kernel_thread_helper+0x5/0x14
Feb 16 07:32:17 storage1 kernel: Code: 5c 24 08 8b 74 24 0c 89 44 24 1c 8b 7c 24 10 8b 6c 24 14 83 c4 18 e9 e5 15 fd ff 89 43 10 89 48 04 31 db 89 51 04 89 4e 0c eb b7 <0f> 0b 2b 01 e7 1a 32 c0 eb 95 8d 74 26 00 8d bc 27 00 00 00 00
Feb 16 07:32:19 storage1 kernel:  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
Feb 16 07:32:19 storage1 kernel:  printing eip:
Feb 16 07:32:19 storage1 kernel: c01333c3
Feb 16 07:32:19 storage1 kernel: *pde = 00000000
Feb 16 07:32:19 storage1 kernel: Oops: 0000 [#2]
Feb 16 07:32:19 storage1 kernel: CPU:    0
Feb 16 07:32:19 storage1 kernel: EIP:    0060:[<c01333c3>]    Not tainted VLI
Feb 16 07:32:19 storage1 kernel: EFLAGS: 00010016   (2.6.11-rc4)
Feb 16 07:32:19 storage1 kernel: EIP is at free_block+0x43/0xd0
Feb 16 07:32:19 storage1 kernel: eax: 00800000   ebx: 00000000   ecx: 00000000   edx: c1000000
Feb 16 07:32:19 storage1 kernel: esi: c1506d60   edi: 00000000   ebp: c1506d6c   esp: c14e9ef0
Feb 16 07:32:19 storage1 kernel: ds: 007b   es: 007b   ss: 0068
Feb 16 07:32:19 storage1 kernel: Process events/0 (pid: 3, threadinfo=c14e8000 task=c14da020)
Feb 16 07:32:19 storage1 kernel: Stack: c0132601 00000014 c1506d7c dff08b70 dff08b60 00000001 c1506d60 c0133a0a
Feb 16 07:32:19 storage1 kernel:        c1506d60 dff08b70 00000001 c1506d60 00000003 c1506dd0 c1506cfc c0133aaa
Feb 16 07:32:19 storage1 kernel:        c1506d60 dff08b60 00000000 c14da170 00000293 00000000 c051fca4 c0133a50
Feb 16 07:32:19 storage1 kernel: Call Trace:
Feb 16 07:32:19 storage1 kernel:  [<c0132601>] kmem_freepages+0x81/0xa0
Feb 16 07:32:19 storage1 kernel:  [<c0133a0a>] drain_array_locked+0x7a/0xc0
Feb 16 07:32:19 storage1 kernel:  [<c0133aaa>] cache_reap+0x5a/0x150
Feb 16 07:32:19 storage1 kernel:  [<c0133a50>] cache_reap+0x0/0x150
Feb 16 07:32:19 storage1 kernel:  [<c012326e>] worker_thread+0x19e/0x240
Feb 16 07:32:19 storage1 kernel:  [<c0111b50>] default_wake_function+0x0/0x20
Feb 16 07:32:19 storage1 kernel:  [<c0111b50>] default_wake_function+0x0/0x20
Feb 16 07:32:19 storage1 kernel:  [<c01230d0>] worker_thread+0x0/0x240
Feb 16 07:32:19 storage1 kernel:  [<c0126ae5>] kthread+0xa5/0xb0
Feb 16 07:32:19 storage1 kernel:  [<c0126a40>] kthread+0x0/0xb0
Feb 16 07:32:19 storage1 kernel:  [<c0100831>] kernel_thread_helper+0x5/0x14
Feb 16 07:32:19 storage1 kernel: Code: 83 00 00 00 8d 46 1c 8d 6e 0c 89 44 24 08 8b 44 24 24 8b 15 b0 fd 51 c0 8b 0c b8 8d 81 00 00 00 40 c1 e8 0c c1 e0 05 8b 5c 02 1c <8b> 53 04 8b 03 89 50 04 89 02 c7 43 04 00 02 20 00 2b 4b 0c c7
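
The "status=0x51 { DriveReady SeekComplete Error }" line in the log above is a bitwise decode of the ATA status register. The following is a minimal sketch of that decode, not the kernel's own code. The names for the three bits set in 0x51 come from the log itself; the names for the remaining bits are an assumption based on the usual ATA status register layout.

```python
# Bit masks for the ATA status register, highest bit first.
# DriveReady, SeekComplete and Error are confirmed by the log line above;
# the other names are assumed from the standard ATA register layout.
ATA_STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status: int) -> str:
    """Return a '{ ... }' string naming the status bits that are set."""
    if status & 0x80:
        # When the busy bit is set, the other bits are not meaningful.
        names = ["Busy"]
    else:
        names = [name for mask, name in ATA_STATUS_BITS if status & mask]
    return "{ " + " ".join(names) + " }"


print(decode_ata_status(0x51))
```

0x51 is 0x40 + 0x10 + 0x01, so the error bit is set even though the next log line says "called with no error".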

storage1:/home/brad# lspci -vv
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
         Subsystem: Asustek Computer, Inc. A7V8X motherboard
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
         Latency: 0
         Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M]
         Capabilities: [80] AGP version 3.5
                 Status: RQ=32 Iso- ArqSz=0 Cal=2 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3+ Rate=x4,x8,x@
                 Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>
         Capabilities: [c0] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge (prog-if 00 [Normal decode])
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
         Latency: 0
         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
         Memory behind bridge: e6000000-e7dfffff
         Prefetchable memory behind bridge: e7f00000-efffffff
         BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
         Capabilities: [80] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:09.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
         Subsystem: Asustek Computer, Inc. P4P800 Mainboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32 (5750ns min, 7750ns max), Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin A routed to IRQ 18
         Region 0: Memory at e5800000 (32-bit, non-prefetchable) [size=16K]
         Region 1: I/O ports at d800 [size=256]
         Capabilities: [48] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                 Status: D0 PME-Enable- DSel=0 DScale=1 PME-
         Capabilities: [50] Vital Product Data

0000:00:0b.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
         Subsystem: Promise Technology, Inc. PDC20318 (SATA150 TX4)
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 96 (1000ns min, 4500ns max), Cache Line Size: 0x90 (576 bytes)
         Interrupt: pin A routed to IRQ 19
         Region 0: I/O ports at d400 [size=64]
         Region 1: I/O ports at d000 [size=16]
         Region 2: I/O ports at b800 [size=128]
         Region 3: Memory at e5000000 (32-bit, non-prefetchable) [size=4K]
         Region 4: Memory at e4800000 (32-bit, non-prefetchable) [size=128K]
         Capabilities: [60] Power Management version 2
                 Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0d.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
         Subsystem: Promise Technology, Inc. PDC20318 (SATA150 TX4)
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 96 (1000ns min, 4500ns max), Cache Line Size: 0x90 (576 bytes)
         Interrupt: pin A routed to IRQ 16
         Region 0: I/O ports at b400 [size=64]
         Region 1: I/O ports at b000 [size=16]
         Region 2: I/O ports at a800 [size=128]
         Region 3: Memory at e4000000 (32-bit, non-prefetchable) [size=4K]
         Region 4: Memory at e3800000 (32-bit, non-prefetchable) [size=128K]
         Capabilities: [60] Power Management version 2
                 Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0e.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
         Subsystem: Promise Technology, Inc. PDC20318 (SATA150 TX4)
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32 (1000ns min, 4500ns max), Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin A routed to IRQ 17
         Region 0: I/O ports at a400 [size=64]
         Region 1: I/O ports at a000 [size=16]
         Region 2: I/O ports at 9800 [size=128]
         Region 3: Memory at e3000000 (32-bit, non-prefetchable) [size=4K]
         Region 4: Memory at e2800000 (32-bit, non-prefetchable) [size=128K]
         Capabilities: [60] Power Management version 2
                 Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32
         Interrupt: pin A routed to IRQ 0
         Region 0: I/O ports at 9400 [size=8]
         Region 1: I/O ports at 9000 [size=4]
         Region 2: I/O ports at 8800 [size=8]
         Region 3: I/O ports at 8400 [size=4]
         Region 4: I/O ports at 8000 [size=16]
         Region 5: I/O ports at 7800 [size=256]
         Capabilities: [c0] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Interrupt: pin A routed to IRQ 14
         Region 4: I/O ports at 7400 [disabled] [size=16]
         Capabilities: [c0] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32, Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin A routed to IRQ 0
         Region 4: I/O ports at 7000 [size=32]
         Capabilities: [80] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32, Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin A routed to IRQ 0
         Region 4: I/O ports at 6800 [size=32]
         Capabilities: [80] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32, Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin B routed to IRQ 0
         Region 4: I/O ports at 6400 [size=32]
         Capabilities: [80] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32, Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin B routed to IRQ 0
         Region 4: I/O ports at 6000 [size=32]
         Capabilities: [80] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86) (prog-if 20 [EHCI])
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32, Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin C routed to IRQ 0
         Region 0: Memory at e2000000 (32-bit, non-prefetchable) [size=256]
         Capabilities: [80] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South]
         Subsystem: Asustek Computer, Inc. A7V600 motherboard
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 0
         Capabilities: [c0] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:13.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
         Subsystem: Promise Technology, Inc. PDC20318 (SATA150 TX4)
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 32 (1000ns min, 4500ns max), Cache Line Size: 0x08 (32 bytes)
         Interrupt: pin A routed to IRQ 18
         Region 0: I/O ports at 5800 [size=64]
         Region 1: I/O ports at 5400 [size=16]
         Region 2: I/O ports at 5000 [size=128]
         Region 3: Memory at e1800000 (32-bit, non-prefetchable) [size=4K]
         Region 4: Memory at e1000000 (32-bit, non-prefetchable) [size=128K]
         Capabilities: [60] Power Management version 2
                 Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:01:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX 4000 AGP 8x] (rev c1) (prog-if 00 [VGA])
         Subsystem: Unknown device 1682:201a
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 64 (1250ns min, 250ns max)
         Interrupt: pin A routed to IRQ 11
         Region 0: Memory at e6000000 (32-bit, non-prefetchable) [size=16M]
         Region 1: Memory at e8000000 (32-bit, prefetchable) [size=128M]
         Expansion ROM at e7fe0000 [disabled] [size=128K]
         Capabilities: [60] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [44] AGP version 3.0
                 Status: RQ=32 Iso- ArqSz=0 Cal=3 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8
                 Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>

CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
CONFIG_X86_PC=y
CONFIG_MK7=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_PM=y
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_INTERPRETER=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_BUS=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SYSTEM=y
CONFIG_PCI=y
CONFIG_PCI_GOBIOS=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_LEGACY_PROC=y
CONFIG_PCI_NAMES=y
CONFIG_BINFMT_ELF=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_INITRAMFS_SOURCE=""
CONFIG_LBD=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_SATA=y
CONFIG_SCSI_SATA_PROMISE=y
CONFIG_SCSI_QLA2XXX=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID6=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_NETDEVICES=y
CONFIG_SK98LIN=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_LIBPS2=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_RTC=y
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ISA=y
CONFIG_I2C_SENSOR=y
CONFIG_SENSORS_IT87=y
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_DNOTIFY=y
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_DEVFS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V4=y
CONFIG_NFS_DIRECTIO=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_TCP=y
CONFIG_ROOT_NFS=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
CONFIG_RPCSEC_GSS_KRB5=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_EARLY_PRINTK=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_CRYPTO=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_DES=y
CONFIG_CRC32=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_PC=y

storage1:/home/brad# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi6 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi7 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi8 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi9 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi10 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi11 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi12 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi13 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi14 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
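
The listing above shows 15 hosts, scsi0 through scsi14, while the failing port in the log is ata15 and the oops is in the scsi_eh_14 thread. That suggests libata ports are numbered from 1 and SCSI hosts from 0. The sketch below parses a listing in this format and applies that mapping. The mapping is an inference from this log, not a documented rule, and the function names are hypothetical.

```python
import re

# Short excerpt in the same format as the listing above (first and last host).
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi14 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: Maxtor 7Y250M0   Rev: YAR5
   Type:   Direct-Access                    ANSI SCSI revision: 05
"""

HOST_RE = re.compile(r"^Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)")
MODEL_RE = re.compile(r"Vendor:\s*(\S+)\s+Model:\s*(.*?)\s+Rev:\s*(\S+)")


def parse_scsi_listing(text):
    """Return one dict per 'Host:' entry, with vendor/model/rev filled in."""
    devices = []
    for line in text.splitlines():
        m = HOST_RE.match(line)
        if m:
            devices.append({"host": int(m.group(1))})
            continue
        m = MODEL_RE.search(line)
        if m and devices:
            devices[-1].update(vendor=m.group(1), model=m.group(2), rev=m.group(3))
    return devices


def ata_port_for_host(host_no):
    # ASSUMPTION: ataN corresponds to scsi(N-1), inferred from "ata15"
    # appearing alongside "scsi_eh_14" in the oops above.
    return host_no + 1


for dev in parse_scsi_listing(SAMPLE):
    print(f"scsi{dev['host']} -> ata{ata_port_for_host(dev['host'])}: {dev['model']}")
```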

storage1:/home/brad# cat /proc/meminfo
MemTotal:       514824 kB
MemFree:        453212 kB
Buffers:           772 kB
Cached:          27820 kB
SwapCached:          0 kB
Active:          12788 kB
Inactive:        20320 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       514824 kB
LowFree:        453212 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:           9480 kB
Slab:            11148 kB
CommitLimit:    257412 kB
Committed_AS:    12824 kB
PageTables:        424 kB
VmallocTotal:   516024 kB
VmallocUsed:        96 kB
VmallocChunk:   515920 kB
storage1:/home/brad# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Sempron(TM) 2600+
stepping        : 1
cpu MHz         : 0.000
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse pni syscall mp mmxext 3dnowext 3dnow
bogomips        : 1828.45

storage1:/home/brad# cat /proc/devices
Character devices:
   1 mem
   2 pty
   3 ttyp
   4 /dev/vc/0
   4 tty
   4 ttyS
   5 /dev/tty
   5 /dev/console
   5 /dev/ptmx
   7 vcs
  10 misc
  13 input
  89 i2c
128 ptm
136 pts
254 devfs

Block devices:
   8 sd
   9 md
  65 sd
  66 sd
  67 sd
  68 sd
  69 sd
  70 sd
  71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
254 mdp
storage1:/home/brad# cat /proc/interrupts
            CPU0
   0:     234810    IO-APIC-edge  timer
   1:         74    IO-APIC-edge  i8042
   8:          4    IO-APIC-edge  rtc
   9:          0   IO-APIC-level  acpi
  16:     237783   IO-APIC-level  libata
  17:     234690   IO-APIC-level  libata
  18:     322433   IO-APIC-level  SysKonnect SK-98xx, libata
  19:     214411   IO-APIC-level  libata
NMI:          0
LOC:     235733
ERR:          0
MIS:          0
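
In the table above, IRQ 18 is shared between the SysKonnect network driver and one of the libata controllers, which matches the lspci output (00:09.0 and 00:13.0 are both routed to IRQ 18). The following sketch picks shared lines out of /proc/interrupts text. It assumes a single CPU column, as on this machine, and the function name is hypothetical.

```python
# The /proc/interrupts output posted above, copied verbatim.
SAMPLE = """\
            CPU0
   0:     234810    IO-APIC-edge  timer
   1:         74    IO-APIC-edge  i8042
   8:          4    IO-APIC-edge  rtc
   9:          0   IO-APIC-level  acpi
  16:     237783   IO-APIC-level  libata
  17:     234690   IO-APIC-level  libata
  18:     322433   IO-APIC-level  SysKonnect SK-98xx, libata
  19:     214411   IO-APIC-level  libata
NMI:          0
LOC:     235733
ERR:          0
MIS:          0
"""


def shared_irqs(text):
    """Map IRQ number -> handler names, for IRQs with more than one handler."""
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip the CPU header and the NMI/LOC/ERR/MIS counters
        # One CPU column assumed: count, controller type, then handler list.
        fields = rest.split(None, 2)
        if len(fields) < 3:
            continue
        handlers = [h.strip() for h in fields[2].split(",")]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


print(shared_irqs(SAMPLE))
```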

-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams


* Re: libata oops 2.6.11-rc4 yesterdays BK
From: Brad Campbell @ 2005-02-16 11:01 UTC
  To: linux-ide

Brad Campbell wrote:
> Woke up to this, this morning.
> This is yesterday's 2.6.11-rc4 BK + libata BK + libata-dev BK
> It was in the middle of a RAID-6 rebuild (scheduled to take about 740 
> minutes)
> 

Oh well, it appears to be reproducible anyway!

Regards,
Brad

ata15: status=0x51 { DriveReady SeekComplete Error }
ata15: called with no error (51)!
------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi.c:299!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[<c0257a0b>]    Not tainted VLI
EFLAGS: 00010046   (2.6.11-rc4)
EIP is at scsi_put_command+0x8b/0xa0
eax: ddaf85c0   ebx: ddaf85c0   ecx: ddaf85d0   edx: ddaf85d0
esi: dfc70000   edi: 00000292   ebp: dfc66400   esp: dfcabe7c
ds: 007b   es: 007b   ss: 0068
Process scsi_eh_14 (pid: 799, threadinfo=dfcaa000 task=dfc73a20)
Stack: df945f58 dfcc6200 dfc65a28 ddaf85c0 00000296 dfc65a28 c025c859 ddaf85c0
        df945f58 c025c97f ddaf85c0 00000001 00000000 ddaf85c0 ddaf85c0 00000004
        ddaf867c c025ccfe ddaf85c0 00000001 00000000 00000000 000057a8 000057a8
Call Trace:
  [<c025c859>] scsi_next_command+0x19/0x30
  [<c025c97f>] scsi_end_request+0xbf/0xe0
  [<c025ccfe>] scsi_io_completion+0x1ae/0x490
  [<c026927b>] sd_rw_intr+0xcb/0x280
  [<c02580ea>] scsi_finish_command+0x7a/0xc0
  [<c0266ad9>] ata_scsi_qc_complete+0x39/0x70
  [<c026460a>] ata_qc_complete+0x3a/0xc0
  [<c0267e30>] pdc_eng_timeout+0x90/0x120
  [<c026663a>] ata_scsi_error+0x1a/0x30
  [<c025bd4e>] scsi_error_handler+0x9e/0xe0
  [<c025bcb0>] scsi_error_handler+0x0/0xe0
  [<c0100831>] kernel_thread_helper+0x5/0x14
Code: 5c 24 08 8b 74 24 0c 89 44 24 1c 8b 7c 24 10 8b 6c 24 14 83 c4 18 e9 e5 15 fd ff 89 43 10 89 48 04 31 db 89 51 04 89 4e 0c eb b7 <0f> 0b 2b 01 e7 1a 32 c0 eb 95 8d 74 26 00 8d bc 27 00 00 00 00
  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
  printing eip:
c01333c3
*pde = 00000000
Oops: 0000 [#2]
CPU:    0
EIP:    0060:[<c01333c3>]    Not tainted VLI
EFLAGS: 00010016   (2.6.11-rc4)
EIP is at free_block+0x43/0xd0
eax: 00800000   ebx: 00000000   ecx: 00000000   edx: c1000000
esi: c1506d60   edi: 00000000   ebp: c1506d6c   esp: c14e9ef0
ds: 007b   es: 007b   ss: 0068
Process events/0 (pid: 3, threadinfo=c14e8000 task=c14da020)
Stack: c14da020 dffe69f0 c1506d7c dff08b70 dff08b60 00000001 c1506d60 c0133a0a
        c1506d60 dff08b70 00000001 c1506d60 00000001 c1506dd0 c1506a7c c0133aaa
        c1506d60 dff08b60 00000000 c14da170 00000293 00000000 c051fca4 c0133a50
Call Trace:
  [<c0133a0a>] drain_array_locked+0x7a/0xc0
  [<c0133aaa>] cache_reap+0x5a/0x150
  [<c0133a50>] cache_reap+0x0/0x150
  [<c012326e>] worker_thread+0x19e/0x240
  [<c0111b50>] default_wake_function+0x0/0x20
  [<c0111b50>] default_wake_function+0x0/0x20
  [<c01230d0>] worker_thread+0x0/0x240
  [<c0126ae5>] kthread+0xa5/0xb0
  [<c0126a40>] kthread+0x0/0xb0
  [<c0100831>] kernel_thread_helper+0x5/0x14
Code: 83 00 00 00 8d 46 1c 8d 6e 0c 89 44 24 08 8b 44 24 24 8b 15 b0 fd 51 c0 8b 0c b8 8d 81 00 00 00 40 c1 e8 0c c1 e0 05 8b 5c 02 1c <8b> 53 04 8b 03 89 50 04 89 02 c7 43 04 00 02 20 00 2b 4b 0c c7
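
The "kernel BUG at drivers/scsi/scsi.c:299!" line can be cross-checked against the Code: bytes of the first oops. The sketch below assumes that BUG() on this i386 kernel, built with CONFIG_DEBUG_BUGVERBOSE=y, expands to a ud2 instruction (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name. That layout is an assumption, not something stated in this thread, and the function name is hypothetical.

```python
# Code: bytes from the scsi_put_command oops above; <..> marks the byte at EIP.
CODE_LINE = (
    "Code: 5c 24 08 8b 74 24 0c 89 44 24 1c 8b 7c 24 10 8b 6c 24 14 83 c4 18 "
    "e9 e5 15 fd ff 89 43 10 89 48 04 31 db 89 51 04 89 4e 0c eb b7 <0f> 0b "
    "2b 01 e7 1a 32 c0 eb 95 8d 74 26 00 8d bc 27 00 00 00 00"
)


def decode_bug_marker(code_line):
    """Return (line_number, file_name_pointer) if EIP sits on a ud2, else None."""
    tokens = code_line.replace("Code:", "").split()
    # Find the byte wrapped in <..>, which is the instruction byte at EIP.
    idx = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
    if idx is None:
        return None
    data = bytes(int(t.strip("<>"), 16) for t in tokens[idx:])
    if len(data) < 8 or data[:2] != b"\x0f\x0b":
        return None  # not a ud2, or not enough trailing bytes
    # ASSUMED layout after ud2: .word line, .long pointer to file name.
    line = int.from_bytes(data[2:4], "little")
    file_ptr = int.from_bytes(data[4:8], "little")
    return line, file_ptr


line, file_ptr = decode_bug_marker(CODE_LINE)
print(line, hex(file_ptr))
```

Under that assumption, the two bytes after the ud2 (2b 01) read as 0x012b, which is 299 and agrees with the line number in the BUG message.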





* Re: libata oops 2.6.11-rc4 yesterdays BK
From: Jeff Garzik @ 2005-02-16 17:25 UTC
  To: Brad Campbell; +Cc: linux-ide

Brad Campbell wrote:
> Brad Campbell wrote:
> 
>> Woke up to this, this morning.
>> This is yesterday's 2.6.11-rc4 BK + libata BK + libata-dev BK
>> It was in the middle of a RAID-6 rebuild (scheduled to take about 740 
>> minutes)
>>
> 
> Oh well, it appears to be reproducible anyway!

Reproducible without the libata-dev patch?

Reproducible with the current libata driver... on an older kernel?  Say 
2.6.11-rc4 libata with 2.6.10.

	Jeff




* Re: libata oops 2.6.11-rc4 yesterdays BK
From: Brad Campbell @ 2005-02-16 20:54 UTC
  To: Jeff Garzik; +Cc: linux-ide

Jeff Garzik wrote:
> Brad Campbell wrote:
> 
>> Brad Campbell wrote:
>>
>>> Woke up to this, this morning.
>>> This is yesterdays 2.6.11-rc4 BK + libata BK + libata-dev BK
>>> It was in the middle of a RAID-6 rebuild (scheduled to take about 740 
>>> minutes)
>>>
>>
>> Oh well, it appears to be reproducible anyway!
> 
> 
> Reproducible without the libata-dev patch?
> 
> Reproducible with the current libata driver... on an older kernel?  Say 
> 2.6.11-rc4 libata with 2.6.10.

Both good questions. I'm running 2.6.10-bk10 with the libata and libata-dev trees of that time now 
and it has made it through 2 ata timeouts and is 50% rebuilt on a 750 minute rebuild. If that 
survives I'll try and dial in some of the other kernels. 11-12 hour test times make it a bit of a 
bear to debug!
Actually, I'm not sure without the libata dev patch as that removes SMART support, and I'm not 
convinced that my smartd polling every 20 minutes does not have something to do with it. All I know 
is the older kernel seems to cope. We'll see. 320 minutes left on this rebuild. I expect it will be 
done in the morning if all goes according to plan. (With the 2.6.11 kernel it never survived past 
about 25% rebuilt)

This 2.6.10-bk10 is the kernel I have been running on my server for a while now (Almost identical 
hardware, just 1 less controller and a couple less disks)

bklaptop:~>ssh srv
Linux srv 2.6.10 #2 Mon Jan 10 18:42:45 GST 2005 i686 GNU/Linux
No mail.

Last login: Wed Feb 16 15:03:13 2005 from bklaptop
brad@srv:~$ uptime
  00:50:16 up 37 days,  5:40,  5 users,  load average: 0.21, 0.27, 0.35


Lucky I have a ready source of failing drives! 29 Maxtors. 1 is dying after 5000 hours and the other 
has 119 reallocated blocks after the first 8 hours.. Looking good thus far! <Bletch>.

Regards,
Brad -- Certified (or is that certifiable) libata torture tester.
-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams


* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-16 20:54     ` Brad Campbell
@ 2005-02-16 21:40       ` Andy Warner
  2005-02-16 22:47         ` Jeff Garzik
  2005-02-18  6:13         ` libata oops 2.6.11-rc4 yesterdays BK Brad Campbell
  0 siblings, 2 replies; 27+ messages in thread
From: Andy Warner @ 2005-02-16 21:40 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Jeff Garzik, linux-ide

Brad Campbell wrote:
> [...]
> Actually, I'm not sure without the libata dev patch as that removes SMART support, and I'm not 
> convinced that my smartd polling every 20 minutes does not have something to do with it. All I know 
> is the older kernel seems to cope. We'll see. 320 minutes left on this rebuild. I expect it will be 
> done in the morning if all goes according to plan. (With the 2.6.11 kernel it never survived past 
> about 25% rebuilt)

Can you find time to try it without smartd active ?
You report running a uni-processor system, and I have only
seen PIO problems with (fast) SMP systems in my testing,
but I am forming the opinion that libata-PIO functions
are in need of a minor overhaul.

I have seen issues where port activity monopolised
data-paths/arbitration inside chipsets such that the
PIO operations would appear to time out. Since you're
doing a raid rebuild, perhaps the I/O load is causing
something similar to occur.
-- 
andyw@pobox.com

Andy Warner		Voice: (612) 801-8549	Fax: (208) 575-5634


* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-16 21:40       ` Andy Warner
@ 2005-02-16 22:47         ` Jeff Garzik
  2005-02-16 23:49           ` Andy Warner
  2005-02-18  6:13         ` libata oops 2.6.11-rc4 yesterdays BK Brad Campbell
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Garzik @ 2005-02-16 22:47 UTC (permalink / raw)
  To: Andy Warner; +Cc: Brad Campbell, linux-ide

Andy Warner wrote:
> Brad Campbell wrote:
> 
>>[...]
>>Actually, I'm not sure without the libata dev patch as that removes SMART support, and I'm not 
>>convinced that my smartd polling every 20 minutes does not have something to do with it. All I know 
>>is the older kernel seems to cope. We'll see. 320 minutes left on this rebuild. I expect it will be 
>>done in the morning if all goes according to plan. (With the 2.6.11 kernel it never survived past 
>>about 25% rebuilt)
> 
> 
> Can you find time to try it without smartd active ?
> You report running a uni-processor system, and I have only
> seen PIO problems with (fast) SMP systems in my testing,
> but I am forming the opinion that libata-PIO functions
> are in need of a minor overhaul.
> 
> I have seen issues where port activity monopolised
> data-paths/arbitration inside chipsets such that the
> PIO operations would appear to time out. Since you're
> doing a raid rebuild, perhaps the I/O load is causing
> something similar to occur.

Does the PIO code deviate from the ATA/ATAPI-[4567] host state machine 
somehow?

Or is it just that newer SATA-emulating-PATA chips have trouble with it?

	Jeff




* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-16 22:47         ` Jeff Garzik
@ 2005-02-16 23:49           ` Andy Warner
  2005-02-16 23:58             ` Jeff Garzik
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Warner @ 2005-02-16 23:49 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andy Warner, Brad Campbell, linux-ide

Jeff Garzik wrote:
> [...]
> Does the PIO code deviate from the ATA/ATAPI-[4567] host state machine 
> somehow?

That I can't say (the ata/atapi docs make me want to put my
head under the wheel of a bus), but: on SMP machines the
implementation would turn into busy-waiting for every sector;
I have my suspicions about the ata_busy_wait() calls in
ata_pio_block(); I also looked at implementing ATA_PROT_PIO_MULT
with interrupt support, but then  ran out of time on the
project - what's there doesn't (didn't) use interrupts.

> Or is it just that newer SATA-emulating-PATA chips have trouble with it?

Could be, I for sure saw arbitration/starvation issues that
resulted in geological-grade delays getting status at the end
of some PIO transfers. The result was timeout errors under
heavy load. I believe that the SMP-machine-becomes-busy-wait-
monster bug probably caused the majority of these errors (I
could generate them after a few minutes testing), because I had
4 (fast-ish) cores conspiring to beat the crap out of 1 register
on a PCI card.
-- 
andyw@pobox.com

Andy Warner		Voice: (612) 801-8549	Fax: (208) 575-5634


* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-16 23:49           ` Andy Warner
@ 2005-02-16 23:58             ` Jeff Garzik
  2005-02-17  0:20               ` Andy Warner
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Garzik @ 2005-02-16 23:58 UTC (permalink / raw)
  To: Andy Warner; +Cc: Brad Campbell, linux-ide

Andy Warner wrote:
> Jeff Garzik wrote:
> 
>>[...]
>>Does the PIO code deviate from the ATA/ATAPI-[4567] host state machine 
>>somehow?
> 
> 
> That I can't say (the ata/atapi docs make me want to put my
> head under the wheel of a bus), but: on SMP machines the
> implementation would turn into busy-waiting for every sector;
> I have my suspicions about the ata_busy_wait() calls in
> ata_pio_block(); I also looked at implementing ATA_PROT_PIO_MULT
> with interrupt support, but then  ran out of time on the
> project - what's there doesn't (didn't) use interrupts.
> 
> 
>>Or is it just that newer SATA-emulating-PATA chips have trouble with it?
> 
> 
> Could be, I for sure saw arbitration/starvation issues that
> resulted in geological-grade delays getting status at the end
> of some PIO transfers. The result was timeout errors under
> heavy load. I believe that the SMP-machine-becomes-busy-wait-
> monster bug probably caused the majority of these errors (I
> could generate them after a few minutes testing), because I had
> 4 (fast-ish) cores conspiring to beat the crap out of 1 register
> on a PCI card.

Unfortunately, that's what you're _supposed_ to do, busy-wait for every 
"block" (where block == 1 sector for PIO, and <n> sectors for PIO-Mult).
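To put numbers on that, here is a minimal userspace sketch of the "block" arithmetic. It is not libata code: drq_blocks() and its parameters are hypothetical names, and it only counts how many busy-wait rounds a polling driver would need for one transfer.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a libata function): number of DRQ data blocks,
 * and hence busy-wait rounds, needed to move nsect sectors.
 * sect_per_block is 1 for plain PIO, or the multiple count <n> for PIO-Mult.
 */
static unsigned int drq_blocks(unsigned int nsect, unsigned int sect_per_block)
{
	assert(sect_per_block > 0);
	/* Round up: a final short block still needs its own wait. */
	return (nsect + sect_per_block - 1) / sect_per_block;
}
```

Under these assumptions a 256-sector transfer needs 256 waits in plain PIO, and 16 waits with a multiple count of 16.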

Wasn't it you that had a patch that used ata_altstatus() to mitigate 
this somewhat?

It's entirely possible that I'm one of the first to punish SATA 
controllers with PIO polling data transfer, rather than interrupt-driven 
xfer.  The SMP aspect makes me suspicious that something else might be 
involved, as well.  Ever since the 2.6.10-bkN kernel updated ACPI, the 
one SMP machine I had that failed on libata started working.

Any chance you could debug this further?

	Jeff




* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-16 23:58             ` Jeff Garzik
@ 2005-02-17  0:20               ` Andy Warner
  2005-02-17  5:08                 ` Jeff Garzik
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Warner @ 2005-02-17  0:20 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andy Warner, Brad Campbell, linux-ide

Jeff Garzik wrote:
> [...]
> Unfortunately, that's what you're _supposed_ to do, busy-wait for every 
> "block" (where block == 1 sector for PIO, and <n> sectors for PIO-Mult).

The logic surrounding PIO-multi in PATA-land looked markedly different.

> Wasn't it you that had a patch that used ata_altstatus() to mitigate 
> this somewhat?

Yeah - and to not call queue_work() to accomplish the polling
(which could start the next poll immediately on an SMP machine).
I suppose that _could_ just as easily point to a locking problem
as to a state machine logic flaw. My proof-of-concept kludge was to
call queue_delayed_work() instead.
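
A toy model of that difference, as a userspace C sketch rather than kernel code. status_reads() and all the timing numbers are made up for illustration. A zero delay stands in for re-queuing the poll immediately, a positive delay for a delayed re-queue.

```c
#include <assert.h>

/*
 * Toy model, not kernel code: count Status register reads until a device
 * that stays busy for busy_us microseconds reports ready.
 * read_cost_us is the assumed cost of one register read;
 * delay_us is the pause between polls (0 models re-queuing the poll
 * immediately, a positive value models a delayed re-queue).
 */
static unsigned long status_reads(unsigned long busy_us,
				  unsigned long read_cost_us,
				  unsigned long delay_us)
{
	unsigned long now_us = 0, reads = 0;

	assert(read_cost_us > 0);
	for (;;) {
		now_us += read_cost_us;   /* perform one Status read */
		reads++;
		if (now_us >= busy_us)    /* device no longer busy */
			return reads;
		now_us += delay_us;       /* wait before polling again */
	}
}
```

With a device that is busy for 1000us and a 1us read, the immediate variant issues 1000 reads while a 99us pause between polls brings that down to 11, at the cost of noticing completion slightly late.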

> It's entirely possible that I'm one of the first to punish SATA 
> controllers with PIO polling data transfer, rather than interrupt-driven 
> xfer.  The SMP aspect makes me suspicious that something else might be 
> involved, as well.  Ever since the 2.6.10-bkN kernel updated ACPI, the 
> one SMP machine I had that failed on libata started working.

I saw errors on both SiI (3114) and Promise (20319) cards, so
I'm not convinced that (these) problems are at the chip-level
(not that there aren't plenty of those to go around.)

> Any chance you could debug this further?

I'll see what I can do.
-- 
andyw@pobox.com

Andy Warner		Voice: (612) 801-8549	Fax: (208) 575-5634


* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-17  0:20               ` Andy Warner
@ 2005-02-17  5:08                 ` Jeff Garzik
  2005-02-17 14:59                   ` Andy Warner
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Garzik @ 2005-02-17  5:08 UTC (permalink / raw)
  To: Andy Warner; +Cc: Bartlomiej Zolnierkiewicz, linux-ide

Andy Warner wrote:
> Jeff Garzik wrote:
> 
>>[...]
>>Unfortunately, that's what you're _supposed_ to do, busy-wait for every 
>>"block" (where block == 1 sector for PIO, and <n> sectors for PIO-Mult).
> 
> 
> The logic surrounding PIO-multi in PATA-land looked markedly different.

Not surprising, as the path when interrupts are enabled looks different.

I'm starting to wonder if polling isn't just a dismal failure on SATA, 
since the status register/etc. is all emulated.  Thinking further along 
those lines (how an ATA shadow register set is faked by the host 
controller using FIS data), I wonder if polling -- per ATA spec -- 
exposes a race between FIS reception and processing, and the update of 
the ATA shadow register block.


>>Wasn't it you that had a patch that used ata_altstatus() to mitigate 
>>this somewhat?
> 
> 
> Yeah - and to not call queue_work() to accomplish the polling
> (which could start the next poll immediately on an SMP machine).
> I suppose that _could_ just as easily point to a locking problem
> as to a state machine logic flaw. My proof-of-concept kludge was to
> call queue_delayed_work() instead.
> 
> 
>>It's entirely possible that I'm one of the first to punish SATA 
>>controllers with PIO polling data transfer, rather than interrupt-driven 
>>xfer.  The SMP aspect makes me suspicious that something else might be 
>>involved, as well.  Ever since the 2.6.10-bkN kernel updated ACPI, the 
>>one SMP machine I had that failed on libata started working.
> 
> 
> I saw errors on both SiI (3114) and Promise (20319) cards, so
> I'm not convinced that (these) problems are at the chip-level
> (not that there aren't plenty of those to go around.)
> 
> 
>>Any chance you could debug this further?
> 
> 
> I'll see what I can do.

Thanks,

	Jeff




* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-17  5:08                 ` Jeff Garzik
@ 2005-02-17 14:59                   ` Andy Warner
  2005-02-17 19:13                     ` Jeff Garzik
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Warner @ 2005-02-17 14:59 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andy Warner, Bartlomiej Zolnierkiewicz, linux-ide

Jeff Garzik wrote:
> [...]
> I'm starting to wonder if polling isn't just a dismal failure on SATA, 
> since the status register/etc. is all emulated.  Thinking further along 
> those lines (how an ATA shadow register set is faked by the host 
> controller using FIS data), I wonder if polling -- per ATA spec -- 
> exposes a race between FIS reception and processing, and the update of 
> the ATA shadow register block.

Quite possibly - though the register set has been fake one way or
another most of the time. This time it's a different fake, with two
vendors getting to put the bits back together in a different order
instead of one vendor's firmware team. The second generation
controllers mostly seem to support/require using DMA to accomplish
the operations named "PIO *" in the ATA/ATAPI spec. This will be
a good thing(tm). I'm still trying to wrap my brain around what
(if any) changes this will impose on libata - my hunch is that
we will require a set of pio_xxx methods in ata_port_operations
with suitable defaults that fall back to the tf_load/tf_read
methods currently specified.
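
Roughly what that hunch could look like, as a standalone C sketch. struct port_ops, pio_read and do_pio_read() are hypothetical stand-ins, not the real ata_port_operations layout. The point is only the "optional hook with a default" pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a port-operations table; not the libata struct. */
struct port_ops {
	int (*tf_read)(void);    /* always provided: taskfile register path */
	int (*pio_read)(void);   /* optional: controller-specific PIO path */
};

/* Use the specialised hook when a driver supplies one, else fall back. */
static int do_pio_read(const struct port_ops *ops)
{
	assert(ops != NULL && ops->tf_read != NULL);
	if (ops->pio_read)
		return ops->pio_read();
	return ops->tf_read();
}

/* Two dummy back ends so the choice is observable. */
static int generic_tf_read(void)     { return 1; }
static int dma_backed_pio_read(void) { return 2; }
```

A first generation driver would leave pio_read NULL and keep today's behaviour, while a controller that does PIO-via-DMA would fill it in.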

Obviously, there will still be millions of first generation
SATA controllers roaming the earth, and for stuff like SMART
we need to make it play nicely with the other children. Damn.
-- 
andyw@pobox.com

Andy Warner		Voice: (612) 801-8549	Fax: (208) 575-5634


* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-17 14:59                   ` Andy Warner
@ 2005-02-17 19:13                     ` Jeff Garzik
  2005-02-17 19:25                       ` Andy Warner
  2005-02-17 19:42                       ` Which SATA Combos To Consider? Danny Cox
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff Garzik @ 2005-02-17 19:13 UTC (permalink / raw)
  To: Andy Warner; +Cc: Bartlomiej Zolnierkiewicz, linux-ide

Andy Warner wrote:
> Jeff Garzik wrote:
> 
>>[...]
>>I'm starting to wonder if polling isn't just a dismal failure on SATA, 
>>since the status register/etc. is all emulated.  Thinking further along 
>>those lines (how an ATA shadow register set is faked by the host 
>>controller using FIS data), I wonder if polling -- per ATA spec -- 
>>exposes a race between FIS reception and processing, and the update of 
>>the ATA shadow register block.
> 
> 
> Quite possibly - though the register set has been fake one way or
> another most of the time. This time it's a different fake, with two
> vendors getting to put the bits back together in a different order

heh


> instead of one vendor's firmware team. The second generation
> controllers mostly seem to support/require using DMA to accomplish
> the operations named "PIO *" in the ATA/ATAPI spec. This will be
> a good thing(tm). I'm still trying to wrap my brain around what
> (if any) changes this will impose on libata - my hunch is that
> we will require a set of pio_xxx methods in ata_port_operations
> with suitable defaults that fall back to the tf_load/tf_read
> methods currently specified.

You are behind the times ;-)

You and a hardware vendor I just spoke with both seem to have 
missed that ahci.c already does a couple key things:

* all operations, including PIO data xfer and SRST, must be accomplished 
via DMA.  (ata_adma in libata-dev is similar)

* zero access to the taskfile registers.  100% FIS-based.

The SiI 311x supports "virtual DMA", which is PIO via DMA, but it's not 
useful as implemented:  311x requires a separate DMA transaction for 
_each_ DRQ block, AFAICS.

AHCI is the first scenario where PIO-via-DMA could be utilized in an 
efficient manner.  The upcoming SiI 3124 is another.  A few others 
(ADMA, Marvell) are PIO-via-DMA controllers as well.  I agree this is a 
good thing.


> Obviously, there will still be millions of first generation
> SATA controllers roaming the earth, and for stuff like SMART
> we need to make it play nicely with the other children. Damn.

hehe :)

Anyway, getting back to the thread of "problems with PIO polling", I am 
wondering if -- due to SATA's nature -- PIO polling should be avoided, 
and interrupt-driven methodology used instead.

One reason why PIO polling was chosen (for controllers that support it; 
AHCI does not) is that the entire command submission/processing code can 
be written inline:  just submit-command, wait-for-busy-clear, etc. 
Makes the code less complex.
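
As a rough illustration of that inline flow, here is a userspace C sketch against a simulated device. Nothing here is libata code: fake_dev, read_status() and polled_command() are invented names, and the poll budget stands in for a real timeout.

```c
#include <assert.h>
#include <stdbool.h>

#define ST_BSY 0x80   /* ATA Status register: device busy */
#define ST_ERR 0x01   /* ATA Status register: error */

/* Simulated device: reports BSY for a fixed number of Status reads. */
struct fake_dev {
	unsigned int busy_reads_left;
	bool fail;                      /* report ERR once BSY clears */
};

static unsigned char read_status(struct fake_dev *d)
{
	if (d->busy_reads_left) {
		d->busy_reads_left--;
		return ST_BSY;
	}
	/* 0x50 = DriveReady | SeekComplete, 0x51 adds Error */
	return d->fail ? 0x51 : 0x50;
}

/*
 * Straight-line polled flow: "submit", spin until BSY clears or the poll
 * budget runs out, then look at ERR. Returns 0 on success, -1 on error,
 * -2 on timeout.
 */
static int polled_command(struct fake_dev *d, unsigned int max_polls)
{
	unsigned char st = ST_BSY;

	/* submit-command would write the taskfile here */
	while (max_polls-- && ((st = read_status(d)) & ST_BSY))
		;
	if (st & ST_BSY)
		return -2;
	return (st & ST_ERR) ? -1 : 0;
}
```

Everything happens in one function with no saved state between steps, which is the appeal. An interrupt-driven version would have to split this across a submit path and a handler.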

	Jeff




* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-17 19:13                     ` Jeff Garzik
@ 2005-02-17 19:25                       ` Andy Warner
  2005-02-17 22:36                         ` Jeff Garzik
  2005-02-17 19:42                       ` Which SATA Combos To Consider? Danny Cox
  1 sibling, 1 reply; 27+ messages in thread
From: Andy Warner @ 2005-02-17 19:25 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andy Warner, Bartlomiej Zolnierkiewicz, linux-ide

Jeff Garzik wrote:
> [...]
> AHCI is the first scenario where PIO-via-DMA could be utilized in an 
> efficient manner.  The upcoming SiI 3124 is another.  A few others 
> (ADMA, Marvell) are PIO-via-DMA controllers as well.  I agree this is a 
> good thing.

I _think_ the SATA-II stuff from Promise (20579) does this too.

> Anyway, getting back to the thread of "problems with PIO polling", I am 
> wondering if -- due to SATA's nature -- PIO polling should be avoided, 
> and interrupt-driven methodology used instead.
> 
> One reason why PIO polling was chosen (for controllers that support it; 
> AHCI does not) is that the entire command submission/processing code can 
> be written inline:  just submit-command, wait-for-busy-clear, etc. 
> Makes the code less complex.

I think going interrupt driven would be a good idea. Of course
when I tried it one chip didn't serve up the interrupt as expected
(can't remember if it was the 3114 or the 20319 - would have to
check my notes.) I don't think it is massively more complex than
what we currently have, and quite possibly might be simpler.
-- 
andyw@pobox.com

Andy Warner		Voice: (612) 801-8549	Fax: (208) 575-5634


* Which SATA Combos To Consider?
  2005-02-17 19:13                     ` Jeff Garzik
  2005-02-17 19:25                       ` Andy Warner
@ 2005-02-17 19:42                       ` Danny Cox
  2005-02-17 20:55                         ` Jeff Garzik
  2005-02-18  0:25                         ` Ryan Bourgeois
  1 sibling, 2 replies; 27+ messages in thread
From: Danny Cox @ 2005-02-17 19:42 UTC (permalink / raw)
  To: Jeff Garzik, Bartlomiej Zolnierkiewicz, Linux IDE List

	I've been mostly lurking here for awhile now, just seeing how things
are going.  I've seen various drives on a blacklist, and various
controllers that do this or that well, but have problems doing foo.
There also seems to have been a Strange Interaction as well, but that's a
fuzzy memory at best.

	So, my question is: if YOU were to purchase an SATA setup brand new,
what would you specify?  Which drives, motherboards, and PCI cards would
you recommend that just work?

	I don't even mean "work like a Mercedes", by which I mean almost
perfection.  I mean like a Chevy.  I don't mind a little tinkering to
get it right, but I want my disk subsystem to be solid thereafter!  I've
got important stuff here!  Like my wife's backup; NEVER lose your wife's
backup (shudder)!

	If SATA isn't ready for consumerdom, I'd like to know that too.  This
isn't just for Jeff and Bart either.  I'd like to hear success stories
from those whose systems just hum along all the time!

	Thanks in advance!

-- 
Daniel S. Cox
Internet Commerce Corporation



* Re: Which SATA Combos To Consider?
  2005-02-17 19:42                       ` Which SATA Combos To Consider? Danny Cox
@ 2005-02-17 20:55                         ` Jeff Garzik
  2005-02-18  0:25                         ` Ryan Bourgeois
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff Garzik @ 2005-02-17 20:55 UTC (permalink / raw)
  To: Danny Cox; +Cc: Bartlomiej Zolnierkiewicz, Linux IDE List

Danny Cox wrote:
> 	I've been mostly lurking here for awhile now, just seeing how things
> are going.  I've seen various drives on a blacklist, and various
> controllers that do this or that well, but have problems doing foo.
> There also seems to have been a Strange Interaction as well, but that's a
> fuzzy memory at best.
> 
> 	So, my question is: if YOU were to purchase an SATA setup brand new,
> what would you specify?  Which drives, motherboards, and PCI cards would
> you recommend that just work?
> 
> 	I don't even mean "work like a Mercedes", by which I mean almost
> perfection.  I mean like a Chevy.  I don't mind a little tinkering to
> get it right, but I want my disk subsystem to be solid thereafter!  I've
> got important stuff here!  Like my wife's backup; NEVER lose your wife's
> backup (shudder)!
> 
> 	If SATA isn't ready for consumerdom, I'd like to know that too.  This
> isn't just for Jeff and Bart either.  I'd like to hear success stories
> from those whose systems just hum along all the time!

In terms of hardware, AHCI (from Intel/SiS/ULi/others) and Silicon Image 
3124 are the best of the current generation of "FIS-based" SATA-II 
controllers.  With these controllers, ATA controllers are __finally__ as 
efficient as SCSI controllers have been for years.

For SATA-I controllers, I tend to feel that the Promise SATA cards 
driven by the sata_promise driver are decent.

The rest of the SATA-I controllers all pretty much look the same, 
hardware-wise:  Decade-old PATA controller interface with PCI 
extensions, with further SATA extensions (SATA phy registers).

Of these, the only thing that really distinguishes the controllers are 
(a) MMIO register access, or not, and (b) SATA phy access.  Silicon 
Image 311x, nVidia, ServerWorks non-QDMA, and Vitesse chips do MMIO and 
offer sata phy access.

The only real combination to avoid is Silicon Image 311x + Seagate. 
311x is fine with other drives, and Seagate drives are fine with other 
controllers.

	Jeff




* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-17 19:25                       ` Andy Warner
@ 2005-02-17 22:36                         ` Jeff Garzik
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Garzik @ 2005-02-17 22:36 UTC (permalink / raw)
  To: Andy Warner; +Cc: Bartlomiej Zolnierkiewicz, linux-ide

Andy Warner wrote:
> Jeff Garzik wrote:
> 
>>[...]
>>AHCI is the first scenario where PIO-via-DMA could be utilized in an 
>>efficient manner.  The upcoming SiI 3124 is another.  A few others 
>>(ADMA, Marvell) are PIO-via-DMA controllers as well.  I agree this is a 
>>good thing.
> 
> 
> I _think_ the SATA-II stuff from Promise (20579) does this too.

I don't see it in the docs.  The only possibility is that the "none 
data" bit in the Packet format now implies that ATA (rather than just 
ATAPI) PIO packets are accepted.

It's probably a side effect of Promise having avoided doing a strictly 
FIS-based engine like AHCI.

More likely you're still stuck with the error-prone method of submitting 
an outrageously long packet containing 256 writes to the Data register, 
followed by a microcoded check for BSY and DRQ.


>>Anyway, getting back to the thread of "problems with PIO polling", I am 
>>wondering if -- due to SATA's nature -- PIO polling should be avoided, 
>>and interrupt-driven methodology used instead.
>>
>>One reason why PIO polling was chosen (for controllers that support it; 
>>AHCI does not) is that the entire command submission/processing code can 
>>be written inline:  just submit-command, wait-for-busy-clear, etc. 
>>Makes the code less complex.
> 
> 
> I think going interrupt driven would be a good idea. Of course
> when I tried it one chip didn't serve up the interrupt as expected
> (can't remember if it was the 3114 or the 20319 - would have to
> check my notes.) I don't think it is massively more complex than
> what we currently have, and quite possibly might be simpler.

Yeah, I'm leaning towards ditching polling in favor of interrupts.  If 
we did that, it would probably be easier to do PIO-Mult.

But... no time right now.  Anyone reading this message is welcome to 
take up the challenge.  libata API supports this change seamlessly, it's 
just a matter of changing the internals.

	Jeff




* Re: Which SATA Combos To Consider?
  2005-02-17 19:42                       ` Which SATA Combos To Consider? Danny Cox
  2005-02-17 20:55                         ` Jeff Garzik
@ 2005-02-18  0:25                         ` Ryan Bourgeois
  2005-02-18  0:44                           ` Johny Ågotnes
  1 sibling, 1 reply; 27+ messages in thread
From: Ryan Bourgeois @ 2005-02-18  0:25 UTC (permalink / raw)
  To: Danny Cox; +Cc: Linux IDE List

Danny Cox wrote:

>	I've been mostly lurking here for awhile now, just seeing how things
>are going.  I've seen various drives on a blacklist, and various
>controllers that do this or that well, but have problems doing foo.
>There also seems to have been a Strange Interaction as well, but that's a
>fuzzy memory at best.
>
>	So, my question is: if YOU were to purchase an SATA setup brand new,
>what would you specify?  Which drives, motherboards, and PCI cards would
>you recommend that just work?
>
>	I don't even mean "work like a Mercedes", by which I mean almost
>perfection.  I mean like a Chevy.  I don't mind a little tinkering to
>get it right, but I want my disk subsystem to be solid thereafter!  I've
>got important stuff here!  Like my wife's backup; NEVER lose your wife's
>backup (shudder)!
>
>	If SATA isn't ready for consumerdom, I'd like to know that too.  This
>isn't just for Jeff and Bart either.  I'd like to hear success stories
>from those whose systems just hum along all the time!
>
>	Thanks in advance!
>
>  
>
On my file server I run the Highpoint RocketRAID 1640.  It's a software 
RAID five card.  Basically it's just a PCI card with two HPT374 chips 
each with two SATA plugs (a total of four plugs).  So it's a no frills 
PATA card with SATA converters onboard.  I run three Western Digital 
120gb SATA drives on it with a Linux Software RAID 5 array on them.  
Aside from the fact that it's a PATA card at heart (so it doesn't use 
libata), my only complaint is that it's slow.  It's stable as a rock, 
though, I've had no problems with the card or drives.

I have a Promise SX4 that I tried.  I had those same three drives 
connected to it, but I was having some problems with the array when I 
was using it.  It had a tendency to corrupt the filesystem, for some 
reason.  I was having other problems at the time, though, so it could be 
unrelated to the card I was using.  If I get time and money I may try 
and set up an array on the card for testing.

Anyways, out of personal preference, I go with Western Digital hard 
drives.  For SATA cards, the Promise TX cards seem pretty reliable.  I 
have an onboard TX2 on my main machine running a WD Raptor.  I've had 
absolutely no problems with it and libata.  Or in Windows, for that matter.

I haven't used any other cards or drives, so I cannot offer anything in 
that direction.

-Ryan Bourgeois




* Re: Which SATA Combos To Consider?
  2005-02-18  0:25                         ` Ryan Bourgeois
@ 2005-02-18  0:44                           ` Johny Ågotnes
  2005-02-18  0:52                             ` Jeff Garzik
  0 siblings, 1 reply; 27+ messages in thread
From: Johny Ågotnes @ 2005-02-18  0:44 UTC (permalink / raw)
  To: Ryan Bourgeois; +Cc: Danny Cox, Linux IDE List

Just a quick note on the SX4 Card - I have seen data corruption on 
hard-drives too in the simplest possible setup, so I'd stay clear of 
that card until further notice.

Mine is sitting in a cupboard right now, awaiting 'someone' (me, if I 
get some time to learn low-level kernel drivers) debugging the issue.

Basically, accessing two drives via this card causes corruption, I had 
it working ok when I only accessed 1 drive, which is kinda pointless...

:)J



Ryan Bourgeois wrote:
> Danny Cox wrote:
> 
>>     I've been mostly lurking here for awhile now, just seeing how things
>> are going.  I've seen various drives on a blacklist, and various
>> controllers that do this or that well, but have problems doing foo.
>> There also seems to have been a Strange Interaction as well, but that's a
>> fuzzy memory at best.
>>
>>     So, my question is: if YOU were to purchase an SATA setup brand new,
>> what would you specify?  Which drives, motherboards, and PCI cards would
>> you recommend that just work?
>>
>>     I don't even mean "work like a Mercedes", by which I mean almost
>> perfection.  I mean like a Chevy.  I don't mind a little tinkering to
>> get it right, but I want my disk subsystem to be solid thereafter!  I've
>> got important stuff here!  Like my wife's backup; NEVER lose your wife's
>> backup (shudder)!
>>
>>     If SATA isn't ready for consumerdom, I'd like to know that too.  This
>> isn't just for Jeff and Bart either.  I'd like to hear success stories
>> from those whose systems just hum along all the time!
>>
>>     Thanks in advance!
>>
>>  
>>
> On my file server I run the Highpoint RocketRAID 1640.  It's a software 
> RAID five card.  Basically it's just a PCI card with two HPT374 chips 
> each with two SATA plugs (a total of four plugs).  So it's a no frills 
> PATA card with SATA converters onboard.  I run three Western Digital 
> 120gb SATA drives on it with a Linux Software RAID 5 array on them.  
> Aside from the fact that it's a PATA card at heart (so it doesn't use 
> libata), my only complaint is that it's slow.  It's stable as a rock, 
> though, I've had no problems with the card or drives.
> 
> I have a Promise SX4 that I tried.  I had those same three drives 
> connected to it, but I was having some problems with the array when I 
> was using it.  It had a tendency to corrupt the filesystem, for some 
> reason.  I was having other problems at the time, though, so it could be 
> unrelated to the card I was using.  If I get time and money I may try 
> and set up an array on the card for testing.
> 
> Anyways, out of personal preference, I go with Western Digital hard 
> drives.  For SATA cards, the Promise TX cards seem pretty reliable.  I 
> have an onboard TX2 on my main machine running a WD Raptor.  I've had 
> absolutely no problems with it and libata.  Or in Windows, for that matter.
> 
> I haven't used any other cards or drives, so I cannot offer anything in 
> that direction.
> 
> -Ryan Bourgeois
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Which SATA Combos To Consider?
  2005-02-18  0:44                           ` Johny Ågotnes
@ 2005-02-18  0:52                             ` Jeff Garzik
  2005-02-21 23:50                               ` Johny Ågotnes
  2005-02-21 23:50                               ` Johny Ågotnes
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff Garzik @ 2005-02-18  0:52 UTC (permalink / raw)
  To: Johny Ågotnes; +Cc: Ryan Bourgeois, Danny Cox, Linux IDE List

On Fri, Feb 18, 2005 at 10:44:57AM +1000, Johny Ågotnes wrote:
> Just a quick note on the SX4 Card - I have seen data corruption on 
> hard-drives too in the simplest possible setup, so I'd stay clear of 
> that card until further notice.
> 
> Mine is sitting in a cupboard right now, awaiting 'someone' (me, if I 
> get some time to learn low-level kernel drivers) debugging the issue.
> 
> Basically, accessing two drives via this card causes corruption, I had 
> it working ok when I only accessed 1 drive, which is kinda pointless...

I've not been able to reproduce this in the lab.

Have you tried switching out RAM and cables?

	Jeff




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-16 21:40       ` Andy Warner
  2005-02-16 22:47         ` Jeff Garzik
@ 2005-02-18  6:13         ` Brad Campbell
  2005-02-19  4:14           ` Brad Campbell
  1 sibling, 1 reply; 27+ messages in thread
From: Brad Campbell @ 2005-02-18  6:13 UTC (permalink / raw)
  To: Andy Warner; +Cc: Jeff Garzik, linux-ide

Andy Warner wrote:
> Brad Campbell wrote:
> 
>>[...]
>>Actually, I'm not sure without the libata dev patch as that removes SMART support, and I'm not 
>>convinced that my smartd polling every 20 minutes does not have something to do with it. All I know 
>>is the older kernel seems to cope. We'll see. 320 minutes left on this rebuild. I expect it will be 
>>done in the morning if all goes according to plan. (With the 2.6.11 kernel it never survived past 
>>about 25% rebuilt)
> 
> 
> Can you find time to try it without smartd active ?
> You report running a uni-processor system, and I have only
> seen PIO problems with (fast) SMP systems in my testing,
> but I am forming the opinion that libata-PIO functions
> are in need of a minor overhaul.
> 
> I have seen issues where port activity monopolised
> data-paths/arbitration inside chipsets such that the
> PIO operations would appear to time out. Since you're
> doing a raid rebuild, perhaps the I/O load is causing
> something similar to occur.

I suspect that is exactly the problem I'm seeing. I can see the smart polling on the drive lights 
when it occurs. It slows the rebuild quite significantly for that period.

I hit the problem using 2.6.10-bk10 also. It's just much harder to hit with that kernel. I'm now 
trying 2.6.11-rc4 with all the libata patches (the kernel I was using before), but smartd disabled. 
I have a sneaking suspicion that SMART is the root cause here; however, I don't see it on the other 
machine because
A) I'm using RAID-5 and not 6, thus my CPU usage during a rebuild is a LOT lower
B) I have more cards/drives in this machine and a RAID-6 rebuild across 15 drives appears to be 
quite taxing on the hardware.

Anyway, rebuild started. We will see in 12 hours.

I did note that when I get an ATA timeout in 2.6.10, it is handled normally. In 2.6.11 it hard-locks 
the machine: no Alt-SysRq or anything else.
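
For comparing kernels, it can help to tally how often each port times out. The following is a minimal sketch, assuming syslog lines shaped like the "ata15: command timeout" entries in the original report; the function name count_ata_timeouts and the log path in the usage note are hypothetical.

```shell
#!/bin/sh
# Count libata "command timeout" messages per port in a syslog-style file.
# Assumes lines shaped like: "... kernel: ata15: command timeout"
# (count_ata_timeouts is a hypothetical helper name, not part of any tool.)
count_ata_timeouts() {
    logfile=$1
    # Extract the "ataN" token from each matching line, then tally per port.
    sed -n 's/.*kernel: \(ata[0-9][0-9]*\): command timeout.*/\1/p' "$logfile" |
        LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling it as `count_ata_timeouts /var/log/messages` (the path depends on the syslog setup) prints one "port count" pair per line, which makes it easier to see whether the timeouts cluster on one port or spread across the controller.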

Regards,
Brad
-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-18  6:13         ` libata oops 2.6.11-rc4 yesterdays BK Brad Campbell
@ 2005-02-19  4:14           ` Brad Campbell
  2005-02-21  4:27             ` Brad Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Brad Campbell @ 2005-02-19  4:14 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Andy Warner, Jeff Garzik, linux-ide

Brad Campbell wrote:
> Andy Warner wrote:
> 
>> Brad Campbell wrote:
>>
>>> [...]
>>> Actually, I'm not sure without the libata dev patch as that removes 
>>> SMART support, and I'm not convinced that my smartd polling every 20 
>>> minutes does not have something to do with it. All I know is the 
>>> older kernel seems to cope. We'll see. 320 minutes left on this 
>>> rebuild. I expect it will be done in the morning if all goes 
>>> according to plan. (With the 2.6.11 kernel it never survived past 
>>> about 25% rebuilt)
>>
>>
>>
>> Can you find time to try it without smartd active ?
>> You report running a uni-processor system, and I have only
>> seen PIO problems with (fast) SMP systems in my testing,
>> but I am forming the opinion that libata-PIO functions
>> are in need of a minor overhaul.

I have been unable to hit the bug with SMART disabled (kernel unchanged). So pass-through (or 
SMART in particular) is not really safe on UP either; you just have to work really hard at it to hit 
the corner cases.

I can now reproduce it quite reliably with a nasty "while ...; do" loop in bash, if anyone has 
anything that wants testing.

I'm planning on burning in this box for a while before I put it into production, so I'm not averse 
to blowing it up a little in the meantime.

Regards,
Brad
-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-19  4:14           ` Brad Campbell
@ 2005-02-21  4:27             ` Brad Campbell
  2005-02-22 10:09               ` Brad Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Brad Campbell @ 2005-02-21  4:27 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Andy Warner, Jeff Garzik, linux-ide

Brad Campbell wrote:

> 
> Have been unable to hit the bug with SMART disabled (Kernel unchanged). 
> So...... Pass through (or SMART in particular) is not really safe on UP 
> either, you just have to work really hard at it to hit the corner cases.

I tried it last night with the latest and greatest Vanilla BK + libata + libata-dev and really 
pounded the poor machine hard with very rude bash scripts doing SMART polling from two separate 
tasks (so randomly I would poll the same drive with two different queries simultaneously) and I 
could certainly fail drives out of the array, but no oops.
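
A minimal sketch of that kind of stress test, not the actual scripts used here: two background pollers issue different smartctl queries (-A for the attribute table, -H for overall health) against randomly chosen drives, so they occasionally hit the same drive at once. The function names, the SMART_CMD override and the device names in the usage note are assumptions for illustration; random selection relies on GNU shuf.

```shell
#!/bin/sh
# SMART_CMD can be overridden (e.g. SMART_CMD=echo) to dry-run the loops.
SMART_CMD=${SMART_CMD:-smartctl}

# Issue <rounds> SMART queries, each against a randomly chosen drive.
# usage: poll_smart <smartctl-option> <rounds> <device>...
poll_smart() {
    opt=$1
    rounds=$2
    shift 2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Random pick per round, so two pollers sometimes collide on a drive.
        dev=$(shuf -n 1 -e "$@")
        "$SMART_CMD" "$opt" "$dev" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
}

# Run two pollers with different queries concurrently and wait for both.
# usage: stress_smart <rounds> <device>...
stress_smart() {
    rounds=$1
    shift
    poll_smart -A "$rounds" "$@" &   # attribute table
    poll_smart -H "$rounds" "$@" &   # overall health status
    wait
}
```

Something like `stress_smart 1000 /dev/sda /dev/sdb`, run as root while the array is rebuilding, would approximate the load described; only try it on a box you are prepared to have drop drives from the array.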

So yes, SMART is causing timeouts but no longer causing an oops here.

Now I have also replaced the PSU with a much better specced unit, so that may have also been a 
contributing factor to the oops. (The 12v rail was running on the edge).

If this rebuild this morning gets through then I'll go back to the old kernel that was oopsing and 
try it again to compare.

Regards,
Brad
-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Which SATA Combos To Consider?
@ 2005-02-21  9:25 linux
  0 siblings, 0 replies; 27+ messages in thread
From: linux @ 2005-02-21  9:25 UTC (permalink / raw)
  To: linux-ide; +Cc: jgarzik, linux

> In terms of hardware, AHCI (from Intel/SiS/ULi/others) and Silicon Image 
> 3124 are the best of the current generation of "FIS-based" SATA-II 
> controllers.  With these controllers, ATA controllers are __finally__ as 
> efficient as SCSI controllers have been for years.
> 
> For SATA-I controllers, I tend to feel that the Promise SATA cards 
> driven by the sata_promise driver are decent.
> 
> The rest of the SATA-I controllers all pretty much look the same, 
> hardware-wise:  Decade-old PATA controller interface with PCI 
> extensions, with further SATA extensions (SATA phy registers).

I had been looking hard at the Abit SU-2S, with the Marvell 88SX6081
on board.  Is this to say that the Marvell chip is also in the latter
category of "slow annoying PATA-like crap"?

Does anyone know an AHCI controller that can be attached to a dual opteron
motherboard?  All of the ones mentioned seem to be built into Intel
chipsets.

Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Which SATA Combos To Consider?
  2005-02-18  0:52                             ` Jeff Garzik
@ 2005-02-21 23:50                               ` Johny Ågotnes
  2005-02-21 23:50                               ` Johny Ågotnes
  1 sibling, 0 replies; 27+ messages in thread
From: Johny Ågotnes @ 2005-02-21 23:50 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Ryan Bourgeois, Danny Cox, Linux IDE List

Jeff,

I swapped cables and tried 3 different RAM chips, so yep, did that :) 
Question on that - does the RAM 'matter' as the driver doesn't use it? 
I.e. is it used by the onboard control logic as cache?

Also tried moving the card to different PCI slots, still no luck :(

Now that server is my home server running off PATA drives, the SX4 card 
is sitting gathering dust (as I said) until I get my old PC operational 
again and can put the card in to re-test.

A few people seem to have this problem, so it is definitely worthwhile 
having a look at; I'd love to have a go just out of my own curiosity. 
The closest I've been to this kind of programming is interfacing with 
telephone switches for softphone and ACD functionality; I'm hoping 
that'll help me get a grasp of the interrupt-driven side of hardware! :)

All I need is a bit of time ;)

:)J

Jeff Garzik wrote:
> On Fri, Feb 18, 2005 at 10:44:57AM +1000, Johny Ågotnes wrote:
> 
>>Just a quick note on the SX4 Card - I have seen data corruption on 
>>hard-drives too in the simplest possible setup, so I'd stay clear of 
>>that card until further notice.
>>
>>Mine is sitting in a cupboard right now, awaiting 'someone' (me, if I 
>>get some time to learn low-level kernel drivers) debugging the issue.
>>
>>Basically, accessing two drives via this card causes corruption, I had 
>>it working ok when I only accessed 1 drive, which is kinda pointless...
> 
> 
> I've not been able to reproduce this in the lab.
> 
> Have you tried switching out RAM and cables?
> 
> 	Jeff
> 
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Which SATA Combos To Consider?
  2005-02-21 23:50                               ` Johny Ågotnes
@ 2005-02-22  1:55                                 ` Johny Ågotnes
  0 siblings, 0 replies; 27+ messages in thread
From: Johny Ågotnes @ 2005-02-22  1:55 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linux IDE List

I had a look at the libata docs and the driver, so don't worry about 
answering my question on the onboard RAM. I see that data gets copied 
as follows:

DMA -> DIMM -> Drive
Drive -> DIMM -> DMA

So the memory is rather crucial in this matter.

I'll need to go through all 1468 lines of code in sata_sx4.c in more 
detail to figure out what is going on. If nothing else, I'll write up 
some documentation on the structure of the driver as I figure it out 
and submit it as a patch to the libata docs.

:)J

Johny Ågotnes wrote:
> Jeff,
> 
> I swapped cables and tried 3 different RAM chips, so yep, did that :) 
> Question on that - does the RAM 'matter' as the driver doesn't use it? 
> I.e. is it used by the onboard control logic as cache?
> 
> Also tried moving the card to different PCI slots, still no luck :(
> 
> Now that server is my home server running off PATA drives, the SX4 card 
> is sitting gathering dust (as I said) until I get my old PC operational 
> again and can put the card in to re-test.
> 
> A few people seem to have this problem, so it is definitely worthwhile 
> having a look at; I'd love to have a go just out of my own curiosity.
> Closest I've been to this kinda programming is interfacing with 
> telephone switches for softphone and ACD functionality, I'm hoping 
> that'll help me get a grasp in the interrupt driven space of hardware! :)
> 
> All I need is a bit of time ;)
> 
> :)J
> 
> Jeff Garzik wrote:
> 
>> On Fri, Feb 18, 2005 at 10:44:57AM +1000, Johny Ågotnes wrote:
>>
>>> Just a quick note on the SX4 Card - I have seen data corruption on 
>>> hard-drives too in the simplest possible setup, so I'd stay clear of 
>>> that card until further notice.
>>>
>>> Mine is sitting in a cupboard right now, awaiting 'someone' (me, if I 
>>> get some time to learn low-level kernel drivers) debugging the issue.
>>>
>>> Basically, accessing two drives via this card causes corruption, I 
>>> had it working ok when I only accessed 1 drive, which is kinda 
>>> pointless...
>>
>>
>>
>> I've not been able to reproduce this in the lab.
>>
>> Have you tried switching out RAM and cables?
>>
>>     Jeff
>>
>>
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: libata oops 2.6.11-rc4 yesterdays BK
  2005-02-21  4:27             ` Brad Campbell
@ 2005-02-22 10:09               ` Brad Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Brad Campbell @ 2005-02-22 10:09 UTC (permalink / raw)
  To: linux-ide; +Cc: Andy Warner, Jeff Garzik

Brad Campbell wrote:
> Now I have also replaced the PSU with a much better specced unit, so 
> that may have also been a contributing factor to the oops. (The 12v rail 
> was running on the edge).
> 
> If this rebuild this morning gets through then I'll go back to the old 
> kernel that was oopsing and try it again to compare.
> 

OK, I have beaten on all the kernels I have available: 2.6.10-bk10, 2.6.11-rc4-something, 
2.6.11-rc4-bk6 and 2.6.11-bk8, all with the libata-dev patches of the time, and none of them oops 
now. It *must* have been a flaky PSU.
SMART polling does cause the occasional drive timeout, which gets the drive kicked out of the 
array, but nothing fatal, and I have to be seriously hammering the disks with both I/O and SMART 
requests to get it to occur.
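
To spot a drive that has been kicked out after one of these timeouts, a small sketch along these lines can help. It assumes the usual /proc/mdstat layout, where a failed member carries an "(F)" marker after its slot number; failed_md_members is a hypothetical helper name.

```shell
#!/bin/sh
# List md member devices flagged as failed, i.e. marked "(F)" in /proc/mdstat.
# (failed_md_members is a hypothetical helper name.)
failed_md_members() {
    mdstat=${1:-/proc/mdstat}
    # Array lines look like: "md0 : active raid6 sdc1[2] sdb1[1](F) sda1[0]"
    awk '$2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[.*/, "", dev)   # strip the "[1](F)" suffix
                print $1, dev
            }
    }' "$mdstat"
}
```

Run with no argument it reads /proc/mdstat and prints one "array device" pair per failed member, for example "md0 sdb1"; no output means no member is currently marked failed.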

Sorry for the noise.

Regards,
Brad
-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2005-02-22 10:09 UTC | newest]

Thread overview: 27+ messages
2005-02-16  4:28 libata oops 2.6.11-rc4 yesterdays BK Brad Campbell
2005-02-16 11:01 ` Brad Campbell
2005-02-16 17:25   ` Jeff Garzik
2005-02-16 20:54     ` Brad Campbell
2005-02-16 21:40       ` Andy Warner
2005-02-16 22:47         ` Jeff Garzik
2005-02-16 23:49           ` Andy Warner
2005-02-16 23:58             ` Jeff Garzik
2005-02-17  0:20               ` Andy Warner
2005-02-17  5:08                 ` Jeff Garzik
2005-02-17 14:59                   ` Andy Warner
2005-02-17 19:13                     ` Jeff Garzik
2005-02-17 19:25                       ` Andy Warner
2005-02-17 22:36                         ` Jeff Garzik
2005-02-17 19:42                       ` Which SATA Combos To Consider? Danny Cox
2005-02-17 20:55                         ` Jeff Garzik
2005-02-18  0:25                         ` Ryan Bourgeois
2005-02-18  0:44                           ` Johny Ågotnes
2005-02-18  0:52                             ` Jeff Garzik
2005-02-21 23:50                               ` Johny Ågotnes
2005-02-21 23:50                               ` Johny Ågotnes
2005-02-22  1:55                                 ` Johny Ågotnes
2005-02-18  6:13         ` libata oops 2.6.11-rc4 yesterdays BK Brad Campbell
2005-02-19  4:14           ` Brad Campbell
2005-02-21  4:27             ` Brad Campbell
2005-02-22 10:09               ` Brad Campbell
  -- strict thread matches above, loose matches on Subject: below --
2005-02-21  9:25 Which SATA Combos To Consider? linux
