* aic79xx blowups in 2.6.8-1.521smp (RHAT)
@ 2004-10-04 12:24 Bryce as root
2004-10-04 14:00 ` James Bottomley
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Bryce as root @ 2004-10-04 12:24 UTC (permalink / raw)
To: linux-scsi
<sigh> where to start,..
I have a system which has an onboard aic79xx controller
lspci -vv :
03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
Subsystem: Adaptec: Unknown device ffff
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (10000ns min, 6250ns max), Cache Line Size 08
Interrupt: pin A routed to IRQ 217
Region 0: I/O ports at 7c00 [disabled]
Region 1: Memory at f3008000 (64-bit, non-prefetchable) [size=8K]
Region 3: I/O ports at 8000 [disabled] [size=256]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [94] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=4
Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=4, DMCRS=1, RSCEM-
This is connected by certified U320 cable to a 15K U320 Seagate 37Gb drive
SCSI subsystem initialized
ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
Vendor: SEAGATE Model: ST336753LW Rev: 0006
Type: Direct-Access ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled. Depth 4
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
Ok, to the problem itself. I've found that the system will choke badly after
anything from a few days to a couple of weeks, always over a LQI Phase error.
I've asked around to help work out what the LQI Phase error is about and
I still don't understand it.
kernel dmesg dump :
Reseting Channel for LQI Phase error
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x8 Mode 0x33
Card was paused
HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1]
SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
qinstart = 40786 qinfifonext = 40786
QINFIFO:
WAITING_TID_QUEUES:
Pending list:
0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Total 2
Kernel Free SCB list: 2 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x66] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x88]
scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0]
SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0]
SOFFCNT[0x66] MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc
HADDR = 0x06c420ce0, HCNT = 0x320 CCSGCTL[0x98]
LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0 0
x2 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
SIMODE0[0xc]
CCSCBCTL[0x4]
scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
CDB 28 0 1 93 e9 d6
STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
DevQ(0:0:0): 0 waiting
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470870
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470878
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470886
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470894
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470766
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470774
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 51448252
Buffer I/O error on device sda7, logical block 11048
lost page write due to I/O error on sda7
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470782
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 68501204
Buffer I/O error on device sda7, logical block 2142667
lost page write due to I/O error on sda7
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
When the system is up and running, the disk is stable and I
get a quite respectable 72MB/s max throughput.
I have other card dumps however since this isn't my area of expertise I've
not included them as they're roughly the same problem always starting off
with a "Reseting Channel for LQI Phase error" error.
I should mention that I have a sister machine with the exact same HW
that exhibits the issues and though not a 100% indication, it
would suggest the issue is more to do with the driver.
I did also try to rebuild the kernel with aic79xx-linux-2.6-20040522-tar.gz
however I've not run it long enough on the box to say that it fixes the issue
Ideas anyone? or shall I just keep banging on 2.0.12 and hope it fixes/masks
the problem?
Phil
=--=
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
@ 2004-10-04 14:00 ` James Bottomley
2004-10-07 10:48 ` Bryce as root
2004-10-05 14:04 ` Luben Tuikov
2004-10-05 21:51 ` Luben Tuikov
2 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2004-10-04 14:00 UTC (permalink / raw)
To: Bryce as root; +Cc: SCSI Mailing List
On Mon, 2004-10-04 at 07:24, Bryce as root wrote:
> kernel dmesg dump :
> Reseting Channel for LQI Phase error
This is clearly the cause of all the trouble. The driver is a bit
opaque at this point, but it looks like an LQI phase error occurs
because of a phase mismatch during L_Q Information Units (These are a
requirement for fast-160 data transfers).
I think this is a strong indicator of bus instability ... could you try
falling back to fast-80 and turning off IU transfers (You'll probably
either have to use the card bios or dig around in the aic79xx driver for
the options).
James
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
2004-10-04 14:00 ` James Bottomley
@ 2004-10-05 14:04 ` Luben Tuikov
2004-10-05 21:51 ` Luben Tuikov
2 siblings, 0 replies; 6+ messages in thread
From: Luben Tuikov @ 2004-10-05 14:04 UTC (permalink / raw)
To: Bryce as root; +Cc: linux-scsi
Is it possible that you try Revision 007 or later?
Luben
Bryce as root wrote:
> <sigh> where to start,..
>
> I have a system which has an onboard aic79xx controller
>
> lspci -vv :
> 03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
> Subsystem: Adaptec: Unknown device ffff
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (10000ns min, 6250ns max), Cache Line Size 08
> Interrupt: pin A routed to IRQ 217
> Region 0: I/O ports at 7c00 [disabled]
> Region 1: Memory at f3008000 (64-bit, non-prefetchable) [size=8K]
> Region 3: I/O ports at 8000 [disabled] [size=256]
> Capabilities: [dc] Power Management version 1
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
> Address: 0000000000000000 Data: 0000
> Capabilities: [94] PCI-X non-bridge device.
> Command: DPERE- ERO+ RBC=0 OST=4
> Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=4, DMCRS=1, RSCEM-
>
> This is connected by certified U320 cable to a 15K U320 Seagate 37Gb drive
>
> SCSI subsystem initialized
> ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
> ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
> scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
> <Adaptec AIC7902 Ultra320 SCSI adapter>
> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
>
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
> Vendor: SEAGATE Model: ST336753LW Rev: 0006
> Type: Direct-Access ANSI SCSI revision: 03
> scsi0:A:0:0: Tagged Queuing enabled. Depth 4
> SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
> SCSI device sda: drive cache: write back
> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
> <Adaptec AIC7902 Ultra320 SCSI adapter>
> aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
>
> Ok, to the problem itself. I've found that the system will choke badly after
> anything from a few days to a couple of weeks, always over a LQI Phase error.
> I've asked around to help work out what the LQI Phase error is about and
> I still don't understand it.
>
> kernel dmesg dump :
> Reseting Channel for LQI Phase error
>
>>>>>>>>>>>>>>>>>>>Dump Card State Begins <<<<<<<<<<<<<<<<<
>
> scsi0: Dumping Card State at program address 0x8 Mode 0x33
> Card was paused
> HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
> DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
> LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
> SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
> SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
> SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
> LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1]
>
> SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
> qinstart = 40786 qinfifonext = 40786
> QINFIFO:
> WAITING_TID_QUEUES:
> Pending list:
> 0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
> 1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
> Total 2
> Kernel Free SCB list: 2 3
> Sequencer Complete DMA-inprog list:
> Sequencer Complete list:
> Sequencer DMA-Up and Complete list:
>
> scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
> SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
> SOFFCNT[0x66] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
> HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x88]
> scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0]
> SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0]
> SOFFCNT[0x66] MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc
> HADDR = 0x06c420ce0, HCNT = 0x320 CCSGCTL[0x98]
> LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0 0
> x2 0x0
> scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
> SIMODE0[0xc]
> CCSCBCTL[0x4]
> scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
> scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
> CDB 28 0 1 93 e9 d6
> STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
> <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> DevQ(0:0:0): 0 waiting
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470870
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470878
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470886
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470894
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470766
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470774
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 51448252
> Buffer I/O error on device sda7, logical block 11048
> lost page write due to I/O error on sda7
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470782
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 68501204
> Buffer I/O error on device sda7, logical block 2142667
> lost page write due to I/O error on sda7
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>
>
> When the system is up and running, the disk is stable and I
> get a quite respectable 72MB/s max throughput.
>
> I have other card dumps however since this isn't my area of expertise I've
> not included them as they're roughly the same problem always starting off
> with a "Reseting Channel for LQI Phase error" error.
>
> I should mention that I have a sister machine with the exact same HW
> that exhibits the issues and though not a 100% indication, it
> would suggest the issue is more to do with the driver.
>
> I did also try to rebuild the kernel with aic79xx-linux-2.6-20040522-tar.gz
> however I've not run it long enough on the box to say that it fixes the issue
>
> Ideas anyone? or shall I just keep banging on 2.0.12 and hope it fixes/masks
> the problem?
>
> Phil
> =--=
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
2004-10-04 14:00 ` James Bottomley
2004-10-05 14:04 ` Luben Tuikov
@ 2004-10-05 21:51 ` Luben Tuikov
2 siblings, 0 replies; 6+ messages in thread
From: Luben Tuikov @ 2004-10-05 21:51 UTC (permalink / raw)
To: Bryce as root; +Cc: linux-scsi
Bryce as root wrote:
> <sigh> where to start,..
>
> I have a system which has an onboard aic79xx controller
>
> lspci -vv :
> 03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
> Subsystem: Adaptec: Unknown device ffff
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B-
> Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (10000ns min, 6250ns max), Cache Line Size 08
> Interrupt: pin A routed to IRQ 217
> Region 0: I/O ports at 7c00 [disabled]
> Region 1: Memory at f3008000 (64-bit, non-prefetchable) [size=8K]
> Region 3: I/O ports at 8000 [disabled] [size=256]
> Capabilities: [dc] Power Management version 1
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [a0] Message Signalled Interrupts: 64bit+
> Queue=0/1 Enable-
> Address: 0000000000000000 Data: 0000
> Capabilities: [94] PCI-X non-bridge device.
> Command: DPERE- ERO+ RBC=0 OST=4
> Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-,
> DC=simple, DMMRBC=0, DMOST=4, DMCRS=1, RSCEM-
>
> This is connected by certified U320 cable to a 15K U320 Seagate 37Gb drive
>
> SCSI subsystem initialized
> ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
> ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
> scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
> <Adaptec AIC7902 Ultra320 SCSI adapter>
> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz,
> 512 SCBs
>
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
> Vendor: SEAGATE Model: ST336753LW Rev: 0006
> Type: Direct-Access ANSI SCSI revision: 03
> scsi0:A:0:0: Tagged Queuing enabled. Depth 4
> SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
> SCSI device sda: drive cache: write back
> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
> <Adaptec AIC7902 Ultra320 SCSI adapter>
> aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz,
> 512 SCBs
>
> Ok, to the problem itself. I've found that the system will choke badly
> after
> anything from a few days to a couple of weeks, always over a LQI Phase
> error.
> I've asked around to help work out what the LQI Phase error is about and
> I still don't understand it.
There was an "illegal" phase change while receiving an LQI packet,
since you're at 320, and have parallel information units on.
The card dump state isn't complete as some registers were cleared
when the card was paused and gotten out of critial region, but
it looks like there was a READ(10) and the chip was just about
to transfer some data to host memory.
If the failure can be reproduced consistently, a SCSI Bus trace
would narrow down where the problem is.
Luben
> kernel dmesg dump :
> Reseting Channel for LQI Phase error
> >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> scsi0: Dumping Card State at program address 0x8 Mode 0x33
> Card was paused
> HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
> DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
> LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
> SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
> SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
> SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
> LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1]
>
> SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
> qinstart = 40786 qinfifonext = 40786
> QINFIFO:
> WAITING_TID_QUEUES:
> Pending list:
> 0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
> 1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
> Total 2
> Kernel Free SCB list: 2 3
> Sequencer Complete DMA-inprog list:
> Sequencer Complete list:
> Sequencer DMA-Up and Complete list:
>
> scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
> SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
> SOFFCNT[0x66] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
> HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x88]
> scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0]
> SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0]
> SOFFCNT[0x66] MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc
> HADDR = 0x06c420ce0, HCNT = 0x320 CCSGCTL[0x98]
> LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0
> 0x0 0x0 0
> x2 0x0
> scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
> SIMODE0[0xc]
> CCSCBCTL[0x4]
> scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
> scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
> CDB 28 0 1 93 e9 d6
> STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
> <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> DevQ(0:0:0): 0 waiting
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470870
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470878
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470886
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470894
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470766
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470774
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 51448252
> Buffer I/O error on device sda7, logical block 11048
> lost page write due to I/O error on sda7
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470782
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 68501204
> Buffer I/O error on device sda7, logical block 2142667
> lost page write due to I/O error on sda7
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>
>
> When the system is up and running, the disk is stable and I
> get a quite respectable 72MB/s max throughput.
>
> I have other card dumps however since this isn't my area of expertise I've
> not included them as they're roughly the same problem always starting off
> with a "Reseting Channel for LQI Phase error" error.
>
> I should mention that I have a sister machine with the exact same HW
> that exhibits the issues and though not a 100% indication, it
> would suggest the issue is more to do with the driver.
>
> I did also try to rebuild the kernel with aic79xx-linux-2.6-20040522-tar.gz
> however I've not run it long enough on the box to say that it fixes the
> issue
>
> Ideas anyone? or shall I just keep banging on 2.0.12 and hope it
> fixes/masks
> the problem?
>
> Phil
> =--=
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
2004-10-04 14:00 ` James Bottomley
@ 2004-10-07 10:48 ` Bryce as root
2004-10-07 15:54 ` Luben Tuikov
0 siblings, 1 reply; 6+ messages in thread
From: Bryce as root @ 2004-10-07 10:48 UTC (permalink / raw)
To: linux-scsi
Detail preamble:
Linux ZenIV.linux.org.uk 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux
SCSI subsystem initialized
ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
(scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
Vendor: SEAGATE Model: ST336753LW Rev: 0006
Type: Direct-Access ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled. Depth 4
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Mutterings:
Well I set the BIOS down to 160 and turned off Packeting and QAS and let
that run for a day
( http://ftp.linux.org.uk/~bryce/scsi-bios.gif )
Unfortunately the driver has blown up in a different way now.
I'm at a bit of a loss as to whats going on as the disk verifies
fine from the adaptec bios utils
I've now set the speed to 80 so we'll see how this goes though it's a shame
to loose the performance as a result in the drop (was 71MB/s now 61MB/s)
Phil
=--=
Logfile:
04:04:18 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x2e
04:04:18 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
04:04:18 scsi0: Dumping Card State at program address 0x2c Mode 0x22
04:04:18 Card was paused
04:04:18 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
04:04:18 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
04:04:18 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
04:04:18 SEQINTCTL[0x0] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]
04:04:18 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]
04:04:18 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
04:04:19 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
04:04:19
04:04:19 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x3 CURRSCB 0x3 NEXTSCB 0x0
04:04:19 qinstart = 24363 qinfifonext = 24363
04:04:20 QINFIFO:
04:04:20 WAITING_TID_QUEUES:
04:04:20 Pending list:
04:04:20 Total 0
04:04:20 Kernel Free SCB list: 3 0 1 2
04:04:20 Sequencer Complete DMA-inprog list:
04:04:20 Sequencer Complete list:
04:04:20 Sequencer DMA-Up and Complete list:
04:04:20
04:04:20 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
04:04:20 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
04:04:21 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
04:04:21 SOFFCNT[0x21] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
04:04:21 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]
04:04:21 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x3
04:04:21 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x1]
04:04:21 SG_CACHE_SHADOW[0x28] SG_STATE[0x3] DFFSXFRCTL[0x0]
04:04:21 SOFFCNT[0x21] MDFFSTAT[0xc] SHADDR = 0x0318ebe5e, SHCNT = 0x1a2
04:04:21 HADDR = 0x0318ebec2, HCNT = 0x13e CCSGCTL[0x10]
04:04:21 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
04:04:21 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
04:04:21 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
04:04:21 SIMODE0[0xc]
04:04:21 CCSCBCTL[0x4]
04:04:21 scsi0: REG0 == 0x3, SINDEX = 0x122, DINDEX = 0xa9
04:04:21 scsi0: SCBPTR == 0xff03, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0
04:04:21 CDB 3 1 0 0 0 0
04:04:22 STACK: 0x206 0x0 0x0 0x0 0x0 0x0 0x0 0x29
04:04:22 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
04:04:22 DevQ(0:0:0): 0 waiting
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
04:04:22 end_request: I/O error, dev sda, sector 4239969
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
04:04:22 end_request: I/O error, dev sda, sector 4239977
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
04:04:22 end_request: I/O error, dev sda, sector 4239985
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
04:04:22 end_request: I/O error, dev sda, sector 4239993
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
04:04:22 end_request: I/O error, dev sda, sector 4240001
04:04:22 (scsi0:A:0:0): No or incomplete CDB sent to device.
04:04:23 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
04:04:23 end_request: I/O error, dev sda, sector 4240009
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
04:04:23 end_request: I/O error, dev sda, sector 4240017
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
04:04:23 end_request: I/O error, dev sda, sector 68860388
04:04:23 Buffer I/O error on device sda7, logical block 2187565
04:04:23 lost page write due to I/O error on sda7
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
04:04:23 end_request: I/O error, dev sda, sector 68860396
04:04:25 end_request: I/O error, dev sda, sector 4240041
04:04:25 (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit)
04:04:25 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x97
04:04:25 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
04:04:25 scsi0: Dumping Card State at program address 0x95 Mode 0x22
04:04:25 Card was paused
04:04:25 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
04:04:25 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
04:04:25 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
04:04:25 SEQINTCTL[0x80] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]
04:04:25 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]
04:04:25 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
04:04:26 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
04:04:26
04:04:26 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x1 CURRSCB 0x1 NEXTSCB 0x0
04:04:26 qinstart = 87 qinfifonext = 87
04:04:26 QINFIFO:
04:04:26 WAITING_TID_QUEUES:
04:04:26 Pending list:
04:04:26 Total 0
04:04:26 Kernel Free SCB list: 1 0 3 2
04:04:26 Sequencer Complete DMA-inprog list:
04:04:26 Sequencer Complete list:
04:04:26 Sequencer DMA-Up and Complete list:
04:04:26
04:04:26 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
04:04:26 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
04:04:26 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
04:04:26 SOFFCNT[0x1d] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
04:04:27 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]
04:04:27 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x1
04:04:27 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x81]
04:04:27 SG_CACHE_SHADOW[0x20] SG_STATE[0x3] DFFSXFRCTL[0x0]
04:04:27 SOFFCNT[0x1d] MDFFSTAT[0xc] SHADDR = 0x05a68f1ac, SHCNT = 0xe54
04:04:27 HADDR = 0x05a68f20a, HCNT = 0xdf6 CCSGCTL[0x10]
04:04:27 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
04:04:27 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
04:04:27 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
04:04:27 SIMODE0[0xc]
04:04:27 CCSCBCTL[0x4]
04:04:27 scsi0: REG0 == 0x1, SINDEX = 0x122, DINDEX = 0x1ba
04:04:27 scsi0: SCBPTR == 0xff01, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0
04:04:27 CDB 1 1 0 0 0 0
04:04:27 STACK: 0x29 0x206 0x0 0x0 0x0 0x0 0x0 0x0
04:04:27 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
04:04:28 DevQ(0:0:0): 0 waiting
04:04:28 SCSI error : <0 0 0 0> return code = 0x10000
04:04:28 end_request: I/O error, dev sda, sector 4239913
04:04:28 SCSI error : <0 0 0 0> return code = 0x10000
04:04:28 (scsi0:A:0:0): No or incomplete CDB sent to device.
04:04:28 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
04:04:28 (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
04:04:28 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
04:04:29 end_request: I/O error, dev sda, sector 4239953
04:04:29 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
04:04:29 end_request: I/O error, dev sda, sector 51398828
04:04:29 Buffer I/O error on device sda7, logical block 4870
04:04:29 lost page write due to I/O error on sda7
04:04:29 Aborting journal on device sda7.
04:04:29 journal commit I/O error
04:04:29 ext3_abort called.
04:04:29 EXT3-fs abort (device sda7): ext3_journal_start: Detected aborted journal
04:04:29 Remounting filesystem read-only
>
> On Mon, 2004-10-04 at 07:24, Bryce as root wrote:
> > kernel dmesg dump :
> > Reseting Channel for LQI Phase error
>
> This is clearly the cause of all the trouble. The driver is a bit
> opaque at this point, but it looks like an LQI phase error occurs
> because of a phase mismatch during L_Q Information Units (These are a
> requirement for fast-160 data transfers).
>
> I think this is a strong indicator of bus instability ... could you try
> falling back to fast-80 and turning off IU transfers (You'll probably
> either have to use the card bios or dig around in the aic79xx driver for
> the options).
>
> James
>
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
2004-10-07 10:48 ` Bryce as root
@ 2004-10-07 15:54 ` Luben Tuikov
0 siblings, 0 replies; 6+ messages in thread
From: Luben Tuikov @ 2004-10-07 15:54 UTC (permalink / raw)
To: Bryce as root; +Cc: linux-scsi
Bryce as root wrote:
> Detail preamble:
> Linux ZenIV.linux.org.uk 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux
> SCSI subsystem initialized
> ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
> ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
> scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
> <Adaptec AIC7902 Ultra320 SCSI adapter>
> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
>
> (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
> Vendor: SEAGATE Model: ST336753LW Rev: 0006
> Type: Direct-Access ANSI SCSI revision: 03
> scsi0:A:0:0: Tagged Queuing enabled. Depth 4
> SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
> SCSI device sda: drive cache: write back
> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
>
>
> Mutterings:
> Well I set the BIOS down to 160 and turned off Packeting and QAS and let
> that run for a day
>
> ( http://ftp.linux.org.uk/~bryce/scsi-bios.gif )
>
> Unfortunately the driver has blown up in a different way now.
> I'm at a bit of a loss as to whats going on as the disk verifies
> fine from the adaptec bios utils
>
> I've now set the speed to 80 so we'll see how this goes though it's a shame
> to loose the performance as a result in the drop (was 71MB/s now 61MB/s)
Yes, I agree. Given the intermittent nature of the problem, SCSI BIOS
is not insured from incurring this failure as well, it's just that it
interacts too little a time with the SCSI bus.
Both bugs display similar problem: unexpected bus phase change which
the driver reports as programmed. In this case, on REQUEST SENSE(desc)
on a Data-In phase. The hardware could possibly be flaky.
Luben
> Logfile:
> 04:04:18 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x2e
> 04:04:18 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> 04:04:18 scsi0: Dumping Card State at program address 0x2c Mode 0x22
> 04:04:18 Card was paused
> 04:04:18 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
> 04:04:18 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
> 04:04:18 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
> 04:04:18 SEQINTCTL[0x0] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]
> 04:04:18 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]
> 04:04:18 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
> 04:04:19 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
> 04:04:19
> 04:04:19 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x3 CURRSCB 0x3 NEXTSCB 0x0
> 04:04:19 qinstart = 24363 qinfifonext = 24363
> 04:04:20 QINFIFO:
> 04:04:20 WAITING_TID_QUEUES:
> 04:04:20 Pending list:
> 04:04:20 Total 0
> 04:04:20 Kernel Free SCB list: 3 0 1 2
> 04:04:20 Sequencer Complete DMA-inprog list:
> 04:04:20 Sequencer Complete list:
> 04:04:20 Sequencer DMA-Up and Complete list:
> 04:04:20
> 04:04:20 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
> 04:04:20 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
> 04:04:21 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
> 04:04:21 SOFFCNT[0x21] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
> 04:04:21 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]
> 04:04:21 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x3
> 04:04:21 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x1]
> 04:04:21 SG_CACHE_SHADOW[0x28] SG_STATE[0x3] DFFSXFRCTL[0x0]
> 04:04:21 SOFFCNT[0x21] MDFFSTAT[0xc] SHADDR = 0x0318ebe5e, SHCNT = 0x1a2
> 04:04:21 HADDR = 0x0318ebec2, HCNT = 0x13e CCSGCTL[0x10]
> 04:04:21 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 04:04:21 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> 04:04:21 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
> 04:04:21 SIMODE0[0xc]
> 04:04:21 CCSCBCTL[0x4]
> 04:04:21 scsi0: REG0 == 0x3, SINDEX = 0x122, DINDEX = 0xa9
> 04:04:21 scsi0: SCBPTR == 0xff03, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0
> 04:04:21 CDB 3 1 0 0 0 0
> 04:04:22 STACK: 0x206 0x0 0x0 0x0 0x0 0x0 0x0 0x29
> 04:04:22 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> 04:04:22 DevQ(0:0:0): 0 waiting
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239969
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239977
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239985
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239993
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4240001
> 04:04:22 (scsi0:A:0:0): No or incomplete CDB sent to device.
> 04:04:23 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 4240009
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 4240017
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 68860388
> 04:04:23 Buffer I/O error on device sda7, logical block 2187565
> 04:04:23 lost page write due to I/O error on sda7
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 68860396
> 04:04:25 end_request: I/O error, dev sda, sector 4240041
> 04:04:25 (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit)
> 04:04:25 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x97
> 04:04:25 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> 04:04:25 scsi0: Dumping Card State at program address 0x95 Mode 0x22
> 04:04:25 Card was paused
> 04:04:25 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
> 04:04:25 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
> 04:04:25 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
> 04:04:25 SEQINTCTL[0x80] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]
> 04:04:25 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]
> 04:04:25 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
> 04:04:26 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
> 04:04:26
> 04:04:26 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x1 CURRSCB 0x1 NEXTSCB 0x0
> 04:04:26 qinstart = 87 qinfifonext = 87
> 04:04:26 QINFIFO:
> 04:04:26 WAITING_TID_QUEUES:
> 04:04:26 Pending list:
> 04:04:26 Total 0
> 04:04:26 Kernel Free SCB list: 1 0 3 2
> 04:04:26 Sequencer Complete DMA-inprog list:
> 04:04:26 Sequencer Complete list:
> 04:04:26 Sequencer DMA-Up and Complete list:
> 04:04:26
> 04:04:26 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
> 04:04:26 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
> 04:04:26 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
> 04:04:26 SOFFCNT[0x1d] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
> 04:04:27 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]
> 04:04:27 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x1
> 04:04:27 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x81]
> 04:04:27 SG_CACHE_SHADOW[0x20] SG_STATE[0x3] DFFSXFRCTL[0x0]
> 04:04:27 SOFFCNT[0x1d] MDFFSTAT[0xc] SHADDR = 0x05a68f1ac, SHCNT = 0xe54
> 04:04:27 HADDR = 0x05a68f20a, HCNT = 0xdf6 CCSGCTL[0x10]
> 04:04:27 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 04:04:27 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> 04:04:27 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
> 04:04:27 SIMODE0[0xc]
> 04:04:27 CCSCBCTL[0x4]
> 04:04:27 scsi0: REG0 == 0x1, SINDEX = 0x122, DINDEX = 0x1ba
> 04:04:27 scsi0: SCBPTR == 0xff01, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0
> 04:04:27 CDB 1 1 0 0 0 0
> 04:04:27 STACK: 0x29 0x206 0x0 0x0 0x0 0x0 0x0 0x0
> 04:04:27 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> 04:04:28 DevQ(0:0:0): 0 waiting
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:28 end_request: I/O error, dev sda, sector 4239913
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:28 (scsi0:A:0:0): No or incomplete CDB sent to device.
> 04:04:28 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
> 04:04:28 (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:29 end_request: I/O error, dev sda, sector 4239953
> 04:04:29 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:29 end_request: I/O error, dev sda, sector 51398828
> 04:04:29 Buffer I/O error on device sda7, logical block 4870
> 04:04:29 lost page write due to I/O error on sda7
> 04:04:29 Aborting journal on device sda7.
> 04:04:29 journal commit I/O error
> 04:04:29 ext3_abort called.
> 04:04:29 EXT3-fs abort (device sda7): ext3_journal_start: Detected aborted journal
> 04:04:29 Remounting filesystem read-only
>
>
>
>>On Mon, 2004-10-04 at 07:24, Bryce as root wrote:
>>
>>>kernel dmesg dump :
>>>Reseting Channel for LQI Phase error
>>
>>This is clearly the cause of all the trouble. The driver is a bit
>>opaque at this point, but it looks like an LQI phase error occurs
>>because of a phase mismatch during L_Q Information Units (These are a
>>requirement for fast-160 data transfers).
>>
>>I think this is a strong indicator of bus instability ... could you try
>>falling back to fast-80 and turning off IU transfers (You'll probably
>>either have to use the card bios or dig around in the aic79xx driver for
>>the options).
>>
>>James
>>
>>
>>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2004-10-07 15:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
2004-10-04 14:00 ` James Bottomley
2004-10-07 10:48 ` Bryce as root
2004-10-07 15:54 ` Luben Tuikov
2004-10-05 14:04 ` Luben Tuikov
2004-10-05 21:51 ` Luben Tuikov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).