linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* aic79xx blowups in 2.6.8-1.521smp (RHAT)
@ 2004-10-04 12:24 Bryce as root
  2004-10-04 14:00 ` James Bottomley
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bryce as root @ 2004-10-04 12:24 UTC (permalink / raw)
  To: linux-scsi

<sigh> where to start,..

I have a system which has an onboard aic79xx controller

lspci -vv :
03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
        Subsystem: Adaptec: Unknown device ffff
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (10000ns min, 6250ns max), Cache Line Size 08
        Interrupt: pin A routed to IRQ 217
        Region 0: I/O ports at 7c00 [disabled]
        Region 1: Memory at f3008000 (64-bit, non-prefetchable) [size=8K]
        Region 3: I/O ports at 8000 [disabled] [size=256]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [94] PCI-X non-bridge device.
                Command: DPERE- ERO+ RBC=0 OST=4
                Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=4, DMCRS=1, RSCEM-

This is connected by certified U320 cable to a 15K U320 Seagate 37Gb drive

SCSI subsystem initialized
ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec AIC7902 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
  Vendor: SEAGATE   Model: ST336753LW        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled.  Depth 4
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec AIC7902 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

Ok, to the problem itself. I've found that the system will choke badly after
anything from a few days to a couple of weeks, always over a LQI Phase error.
I've asked around to help work out what the LQI Phase error is about and
I still don't understand it.

kernel dmesg dump :
Reseting Channel for LQI Phase error
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x8 Mode 0x33
Card was paused
HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] 
DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] 
LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10] 
SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0] 
SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] 
SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] 
LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1] 

SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
qinstart = 40786 qinfifonext = 40786
QINFIFO:
WAITING_TID_QUEUES:
Pending list:
  0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7] 
  1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7] 
Total 2
Kernel Free SCB list: 2 3 
Sequencer Complete DMA-inprog list: 
Sequencer Complete list: 
Sequencer DMA-Up and Complete list: 

scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89] 
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] 
SOFFCNT[0x66] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x88] 
scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0] 
SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0] 
SOFFCNT[0x66] MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc 
HADDR = 0x06c420ce0, HCNT = 0x320 CCSGCTL[0x98] 
LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0 0
x2 0x0 
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
SIMODE0[0xc] 
CCSCBCTL[0x4] 
scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
CDB 28 0 1 93 e9 d6
STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
DevQ(0:0:0): 0 waiting
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470870
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470878
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470886
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470894
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470766
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470774
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 51448252
Buffer I/O error on device sda7, logical block 11048
lost page write due to I/O error on sda7
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470782
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 68501204
Buffer I/O error on device sda7, logical block 2142667
lost page write due to I/O error on sda7
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)


When the system is up and running, the disk is stable and I
get a quite respectable 72MB/s max throughput.

I have other card dumps however since this isn't my area of expertise I've
not included them as they're roughly the same problem always starting off
with a "Reseting Channel for LQI Phase error" error.

I should mention that I have a sister machine with the exact same HW 
that exhibits the issues and though not a 100% indication, it
would suggest the issue is more to do with the driver.

I did also try to rebuild the kernel with aic79xx-linux-2.6-20040522-tar.gz
however I've not run it long enough on the box to say that it fixes the issue

Ideas anyone? or shall I just keep banging on 2.0.12 and hope it fixes/masks
the problem?

Phil
=--=

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
  2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
@ 2004-10-04 14:00 ` James Bottomley
  2004-10-07 10:48   ` Bryce as root
  2004-10-05 14:04 ` Luben Tuikov
  2004-10-05 21:51 ` Luben Tuikov
  2 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2004-10-04 14:00 UTC (permalink / raw)
  To: Bryce as root; +Cc: SCSI Mailing List

On Mon, 2004-10-04 at 07:24, Bryce as root wrote:
> kernel dmesg dump :
> Reseting Channel for LQI Phase error

This is clearly the cause of all the trouble.  The driver is a bit
opaque at this point, but it looks like an LQI phase error occurs
because of a phase mismatch during L_Q Information Units (These are a
requirement for fast-160 data transfers).

I think this is a strong indicator of bus instability ... could you try
falling back to fast-80 and turning off IU transfers (You'll probably
either have to use the card bios or dig around in the aic79xx driver for
the options).

James



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
  2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
  2004-10-04 14:00 ` James Bottomley
@ 2004-10-05 14:04 ` Luben Tuikov
  2004-10-05 21:51 ` Luben Tuikov
  2 siblings, 0 replies; 6+ messages in thread
From: Luben Tuikov @ 2004-10-05 14:04 UTC (permalink / raw)
  To: Bryce as root; +Cc: linux-scsi

Is it possible that you try Revision 007 or later?

	Luben


Bryce as root wrote:
> <sigh> where to start,..
> 
> I have a system which has an onboard aic79xx controller
> 
> lspci -vv :
> 03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
>         Subsystem: Adaptec: Unknown device ffff
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>         Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 32 (10000ns min, 6250ns max), Cache Line Size 08
>         Interrupt: pin A routed to IRQ 217
>         Region 0: I/O ports at 7c00 [disabled]
>         Region 1: Memory at f3008000 (64-bit, non-prefetchable) [size=8K]
>         Region 3: I/O ports at 8000 [disabled] [size=256]
>         Capabilities: [dc] Power Management version 1
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [94] PCI-X non-bridge device.
>                 Command: DPERE- ERO+ RBC=0 OST=4
>                 Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=4, DMCRS=1, RSCEM-
> 
> This is connected by certified U320 cable to a 15K U320 Seagate 37Gb drive
> 
> SCSI subsystem initialized
> ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
> ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
> scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
>         <Adaptec AIC7902 Ultra320 SCSI adapter>
>         aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
> 
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>   Vendor: SEAGATE   Model: ST336753LW        Rev: 0006
>   Type:   Direct-Access                      ANSI SCSI revision: 03
> scsi0:A:0:0: Tagged Queuing enabled.  Depth 4
> SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
> SCSI device sda: drive cache: write back
>  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
>         <Adaptec AIC7902 Ultra320 SCSI adapter>
>         aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
> 
> Ok, to the problem itself. I've found that the system will choke badly after
> anything from a few days to a couple of weeks, always over a LQI Phase error.
> I've asked around to help work out what the LQI Phase error is about and
> I still don't understand it.
> 
> kernel dmesg dump :
> Reseting Channel for LQI Phase error
> 
>>>>>>>>>>>>>>>>>>>Dump Card State Begins <<<<<<<<<<<<<<<<<
> 
> scsi0: Dumping Card State at program address 0x8 Mode 0x33
> Card was paused
> HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] 
> DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] 
> LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10] 
> SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0] 
> SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] 
> SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] 
> LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1] 
> 
> SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
> qinstart = 40786 qinfifonext = 40786
> QINFIFO:
> WAITING_TID_QUEUES:
> Pending list:
>   0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7] 
>   1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7] 
> Total 2
> Kernel Free SCB list: 2 3 
> Sequencer Complete DMA-inprog list: 
> Sequencer Complete list: 
> Sequencer DMA-Up and Complete list: 
> 
> scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89] 
> SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] 
> SOFFCNT[0x66] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 
> HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x88] 
> scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0] 
> SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0] 
> SOFFCNT[0x66] MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc 
> HADDR = 0x06c420ce0, HCNT = 0x320 CCSGCTL[0x98] 
> LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0 0
> x2 0x0 
> scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
> SIMODE0[0xc] 
> CCSCBCTL[0x4] 
> scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
> scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
> CDB 28 0 1 93 e9 d6
> STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
> <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> DevQ(0:0:0): 0 waiting
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470870
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470878
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470886
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470894
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470766
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470774
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 51448252
> Buffer I/O error on device sda7, logical block 11048
> lost page write due to I/O error on sda7
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470782
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 68501204
> Buffer I/O error on device sda7, logical block 2142667
> lost page write due to I/O error on sda7
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
> 
> 
> When the system is up and running, the disk is stable and I
> get a quite respectable 72MB/s max throughput.
> 
> I have other card dumps however since this isn't my area of expertise I've
> not included them as they're roughly the same problem always starting off
> with a "Reseting Channel for LQI Phase error" error.
> 
> I should mention that I have a sister machine with the exact same HW 
> that exhibits the issues and though not a 100% indication, it
> would suggest the issue is more to do with the driver.
> 
> I did also try to rebuild the kernel with aic79xx-linux-2.6-20040522-tar.gz
> however I've not run it long enough on the box to say that it fixes the issue
> 
> Ideas anyone? or shall I just keep banging on 2.0.12 and hope it fixes/masks
> the problem?
> 
> Phil
> =--=
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
  2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
  2004-10-04 14:00 ` James Bottomley
  2004-10-05 14:04 ` Luben Tuikov
@ 2004-10-05 21:51 ` Luben Tuikov
  2 siblings, 0 replies; 6+ messages in thread
From: Luben Tuikov @ 2004-10-05 21:51 UTC (permalink / raw)
  To: Bryce as root; +Cc: linux-scsi

Bryce as root wrote:
> <sigh> where to start,..
> 
> I have a system which has an onboard aic79xx controller
> 
> lspci -vv :
> 03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
>         Subsystem: Adaptec: Unknown device ffff
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
>         Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 32 (10000ns min, 6250ns max), Cache Line Size 08
>         Interrupt: pin A routed to IRQ 217
>         Region 0: I/O ports at 7c00 [disabled]
>         Region 1: Memory at f3008000 (64-bit, non-prefetchable) [size=8K]
>         Region 3: I/O ports at 8000 [disabled] [size=256]
>         Capabilities: [dc] Power Management version 1
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [a0] Message Signalled Interrupts: 64bit+ 
> Queue=0/1 Enable-
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [94] PCI-X non-bridge device.
>                 Command: DPERE- ERO+ RBC=0 OST=4
>                 Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-, 
> DC=simple, DMMRBC=0, DMOST=4, DMCRS=1, RSCEM-
> 
> This is connected by certified U320 cable to a 15K U320 Seagate 37Gb drive
> 
> SCSI subsystem initialized
> ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
> ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
> scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
>         <Adaptec AIC7902 Ultra320 SCSI adapter>
>         aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 
> 512 SCBs
> 
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>   Vendor: SEAGATE   Model: ST336753LW        Rev: 0006
>   Type:   Direct-Access                      ANSI SCSI revision: 03
> scsi0:A:0:0: Tagged Queuing enabled.  Depth 4
> SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
> SCSI device sda: drive cache: write back
>  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
>         <Adaptec AIC7902 Ultra320 SCSI adapter>
>         aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 
> 512 SCBs
> 
> Ok, to the problem itself. I've found that the system will choke badly 
> after
> anything from a few days to a couple of weeks, always over a LQI Phase 
> error.
> I've asked around to help work out what the LQI Phase error is about and
> I still don't understand it.

There was an "illegal" phase change while receiving an LQI packet,
since you're at 320, and have parallel information units on.

The card dump state isn't complete as some registers were cleared
when the card was paused and gotten out of critial region, but
it looks like there was a READ(10) and the chip was just about
to transfer some data to host memory.

If the failure can be reproduced consistently, a SCSI Bus trace
would narrow down where the problem is.

	Luben

> kernel dmesg dump :
> Reseting Channel for LQI Phase error
>  >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> scsi0: Dumping Card State at program address 0x8 Mode 0x33
> Card was paused
> HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
> DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
> LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
> SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
> SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
> SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
> LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1]
> 
> SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
> qinstart = 40786 qinfifonext = 40786
> QINFIFO:
> WAITING_TID_QUEUES:
> Pending list:
>   0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
>   1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
> Total 2
> Kernel Free SCB list: 2 3
> Sequencer Complete DMA-inprog list:
> Sequencer Complete list:
> Sequencer DMA-Up and Complete list:
> 
> scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
> SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
> SOFFCNT[0x66] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
> HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x88]
> scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
> SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0]
> SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0]
> SOFFCNT[0x66] MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc
> HADDR = 0x06c420ce0, HCNT = 0x320 CCSGCTL[0x98]
> LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 
> 0x0 0x0 0
> x2 0x0
> scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
> SIMODE0[0xc]
> CCSCBCTL[0x4]
> scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
> scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
> CDB 28 0 1 93 e9 d6
> STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
> <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> DevQ(0:0:0): 0 waiting
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470870
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470878
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470886
> SCSI error : <0 0 0 0> return code = 0x10000
> end_request: I/O error, dev sda, sector 26470894
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470766
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470774
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 51448252
> Buffer I/O error on device sda7, logical block 11048
> lost page write due to I/O error on sda7
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 26470782
> SCSI error : <0 0 0 0> return code = 0x8000002
> Info fld=0x0, Current sda: sense key Aborted Command
> end_request: I/O error, dev sda, sector 68501204
> Buffer I/O error on device sda7, logical block 2142667
> lost page write due to I/O error on sda7
> (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
> 
> 
> When the system is up and running, the disk is stable and I
> get a quite respectable 72MB/s max throughput.
> 
> I have other card dumps however since this isn't my area of expertise I've
> not included them as they're roughly the same problem always starting off
> with a "Reseting Channel for LQI Phase error" error.
> 
> I should mention that I have a sister machine with the exact same HW
> that exhibits the issues and though not a 100% indication, it
> would suggest the issue is more to do with the driver.
> 
> I did also try to rebuild the kernel with aic79xx-linux-2.6-20040522-tar.gz
> however I've not run it long enough on the box to say that it fixes the 
> issue
> 
> Ideas anyone? or shall I just keep banging on 2.0.12 and hope it 
> fixes/masks
> the problem?
> 
> Phil
> =--=
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
  2004-10-04 14:00 ` James Bottomley
@ 2004-10-07 10:48   ` Bryce as root
  2004-10-07 15:54     ` Luben Tuikov
  0 siblings, 1 reply; 6+ messages in thread
From: Bryce as root @ 2004-10-07 10:48 UTC (permalink / raw)
  To: linux-scsi

Detail preamble:
Linux ZenIV.linux.org.uk 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux
SCSI subsystem initialized
ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec AIC7902 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

(scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
  Vendor: SEAGATE   Model: ST336753LW        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled.  Depth 4
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0


Mutterings:
Well I set the BIOS down to 160 and turned off Packeting and QAS and let
that run for a day

( http://ftp.linux.org.uk/~bryce/scsi-bios.gif )

Unfortunately the driver has blown up in a different way now.
I'm at a bit of a loss as to whats going on as the disk verifies
fine from the adaptec bios utils

I've now set the speed to 80 so we'll see how this goes though it's a shame
to loose the performance as a result in the drop (was 71MB/s now 61MB/s)

Phil
=--=

Logfile:
04:04:18 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x2e 
04:04:18 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< 
04:04:18 scsi0: Dumping Card State at program address 0x2c Mode 0x22 
04:04:18 Card was paused 
04:04:18 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]  
04:04:18 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]  
04:04:18 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]  
04:04:18 SEQINTCTL[0x0] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]  
04:04:18 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]  
04:04:18 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]  
04:04:19 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]  
04:04:19  
04:04:19 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x3 CURRSCB 0x3 NEXTSCB 0x0 
04:04:19 qinstart = 24363 qinfifonext = 24363 
04:04:20 QINFIFO: 
04:04:20 WAITING_TID_QUEUES: 
04:04:20 Pending list: 
04:04:20 Total 0 
04:04:20 Kernel Free SCB list: 3 0 1 2  
04:04:20 Sequencer Complete DMA-inprog list:  
04:04:20 Sequencer Complete list:  
04:04:20 Sequencer DMA-Up and Complete list:  
04:04:20  
04:04:20 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 
04:04:20 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]  
04:04:21 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]  
04:04:21 SOFFCNT[0x21] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0  
04:04:21 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]  
04:04:21 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x3 
04:04:21 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x1]  
04:04:21 SG_CACHE_SHADOW[0x28] SG_STATE[0x3] DFFSXFRCTL[0x0]  
04:04:21 SOFFCNT[0x21] MDFFSTAT[0xc] SHADDR = 0x0318ebe5e, SHCNT = 0x1a2  
04:04:21 HADDR = 0x0318ebec2, HCNT = 0x13e CCSGCTL[0x10]  
04:04:21 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0  
04:04:21 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 
04:04:21 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 
04:04:21 SIMODE0[0xc]  
04:04:21 CCSCBCTL[0x4]  
04:04:21 scsi0: REG0 == 0x3, SINDEX = 0x122, DINDEX = 0xa9 
04:04:21 scsi0: SCBPTR == 0xff03, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 
04:04:21 CDB 3 1 0 0 0 0 
04:04:22 STACK: 0x206 0x0 0x0 0x0 0x0 0x0 0x0 0x29 
04:04:22 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> 
04:04:22 DevQ(0:0:0): 0 waiting 
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
04:04:22 end_request: I/O error, dev sda, sector 4239969 
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
04:04:22 end_request: I/O error, dev sda, sector 4239977 
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
04:04:22 end_request: I/O error, dev sda, sector 4239985 
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
04:04:22 end_request: I/O error, dev sda, sector 4239993 
04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
04:04:22 end_request: I/O error, dev sda, sector 4240001 
04:04:22 (scsi0:A:0:0): No or incomplete CDB sent to device. 
04:04:23 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted 
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
04:04:23 end_request: I/O error, dev sda, sector 4240009 
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
04:04:23 end_request: I/O error, dev sda, sector 4240017 
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
04:04:23 end_request: I/O error, dev sda, sector 68860388 
04:04:23 Buffer I/O error on device sda7, logical block 2187565 
04:04:23 lost page write due to I/O error on sda7 
04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
04:04:23 end_request: I/O error, dev sda, sector 68860396 
04:04:25 end_request: I/O error, dev sda, sector 4240041 
04:04:25 (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit) 
04:04:25 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x97 
04:04:25 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< 
04:04:25 scsi0: Dumping Card State at program address 0x95 Mode 0x22 
04:04:25 Card was paused 
04:04:25 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]  
04:04:25 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]  
04:04:25 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]  
04:04:25 SEQINTCTL[0x80] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]  
04:04:25 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]  
04:04:25 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]  
04:04:26 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]  
04:04:26  
04:04:26 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x1 CURRSCB 0x1 NEXTSCB 0x0 
04:04:26 qinstart = 87 qinfifonext = 87 
04:04:26 QINFIFO: 
04:04:26 WAITING_TID_QUEUES: 
04:04:26 Pending list: 
04:04:26 Total 0 
04:04:26 Kernel Free SCB list: 1 0 3 2  
04:04:26 Sequencer Complete DMA-inprog list:  
04:04:26 Sequencer Complete list:  
04:04:26 Sequencer DMA-Up and Complete list:  
04:04:26  
04:04:26 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 
04:04:26 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]  
04:04:26 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]  
04:04:26 SOFFCNT[0x1d] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0  
04:04:27 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]  
04:04:27 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x1 
04:04:27 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x81]  
04:04:27 SG_CACHE_SHADOW[0x20] SG_STATE[0x3] DFFSXFRCTL[0x0]  
04:04:27 SOFFCNT[0x1d] MDFFSTAT[0xc] SHADDR = 0x05a68f1ac, SHCNT = 0xe54  
04:04:27 HADDR = 0x05a68f20a, HCNT = 0xdf6 CCSGCTL[0x10]  
04:04:27 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0  
04:04:27 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 
04:04:27 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 
04:04:27 SIMODE0[0xc]  
04:04:27 CCSCBCTL[0x4]  
04:04:27 scsi0: REG0 == 0x1, SINDEX = 0x122, DINDEX = 0x1ba 
04:04:27 scsi0: SCBPTR == 0xff01, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 
04:04:27 CDB 1 1 0 0 0 0 
04:04:27 STACK: 0x29 0x206 0x0 0x0 0x0 0x0 0x0 0x0 
04:04:27 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> 
04:04:28 DevQ(0:0:0): 0 waiting 
04:04:28 SCSI error : <0 0 0 0> return code = 0x10000 
04:04:28 end_request: I/O error, dev sda, sector 4239913 
04:04:28 SCSI error : <0 0 0 0> return code = 0x10000 
04:04:28 (scsi0:A:0:0): No or incomplete CDB sent to device.
04:04:28 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
04:04:28 (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
04:04:28 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
04:04:29 end_request: I/O error, dev sda, sector 4239953
04:04:29 SCSI error : <0 0 0 0> return code = 0x8000002
04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
04:04:29 end_request: I/O error, dev sda, sector 51398828
04:04:29 Buffer I/O error on device sda7, logical block 4870
04:04:29 lost page write due to I/O error on sda7
04:04:29 Aborting journal on device sda7.
04:04:29 journal commit I/O error
04:04:29 ext3_abort called.
04:04:29 EXT3-fs abort (device sda7): ext3_journal_start: Detected aborted journal
04:04:29 Remounting filesystem read-only


> 
> On Mon, 2004-10-04 at 07:24, Bryce as root wrote:
> > kernel dmesg dump :
> > Reseting Channel for LQI Phase error
> 
> This is clearly the cause of all the trouble.  The driver is a bit
> opaque at this point, but it looks like an LQI phase error occurs
> because of a phase mismatch during L_Q Information Units (These are a
> requirement for fast-160 data transfers).
> 
> I think this is a strong indicator of bus instability ... could you try
> falling back to fast-80 and turning off IU transfers (You'll probably
> either have to use the card bios or dig around in the aic79xx driver for
> the options).
> 
> James
> 
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
  2004-10-07 10:48   ` Bryce as root
@ 2004-10-07 15:54     ` Luben Tuikov
  0 siblings, 0 replies; 6+ messages in thread
From: Luben Tuikov @ 2004-10-07 15:54 UTC (permalink / raw)
  To: Bryce as root; +Cc: linux-scsi

Bryce as root wrote:
> Detail preamble:
> Linux ZenIV.linux.org.uk 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux
> SCSI subsystem initialized
> ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
> ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
> scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
>         <Adaptec AIC7902 Ultra320 SCSI adapter>
>         aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
> 
> (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
>   Vendor: SEAGATE   Model: ST336753LW        Rev: 0006
>   Type:   Direct-Access                      ANSI SCSI revision: 03
> scsi0:A:0:0: Tagged Queuing enabled.  Depth 4
> SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
> SCSI device sda: drive cache: write back
>  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> 
> 
> Mutterings:
> Well I set the BIOS down to 160 and turned off Packeting and QAS and let
> that run for a day
> 
> ( http://ftp.linux.org.uk/~bryce/scsi-bios.gif )
> 
> Unfortunately the driver has blown up in a different way now.
> I'm at a bit of a loss as to whats going on as the disk verifies
> fine from the adaptec bios utils
> 
> I've now set the speed to 80 so we'll see how this goes though it's a shame
> to loose the performance as a result in the drop (was 71MB/s now 61MB/s)

Yes, I agree.  Given the intermittent nature of the problem, SCSI BIOS
is not insured from incurring this failure as well, it's just that it
interacts too little a time with the SCSI bus.

Both bugs display similar problem: unexpected bus phase change which
the driver reports as programmed.  In this case, on REQUEST SENSE(desc)
on a Data-In phase.  The hardware could possibly be flaky.

	Luben


> Logfile:
> 04:04:18 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x2e 
> 04:04:18 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< 
> 04:04:18 scsi0: Dumping Card State at program address 0x2c Mode 0x22 
> 04:04:18 Card was paused 
> 04:04:18 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]  
> 04:04:18 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]  
> 04:04:18 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]  
> 04:04:18 SEQINTCTL[0x0] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]  
> 04:04:18 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]  
> 04:04:18 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]  
> 04:04:19 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]  
> 04:04:19  
> 04:04:19 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x3 CURRSCB 0x3 NEXTSCB 0x0 
> 04:04:19 qinstart = 24363 qinfifonext = 24363 
> 04:04:20 QINFIFO: 
> 04:04:20 WAITING_TID_QUEUES: 
> 04:04:20 Pending list: 
> 04:04:20 Total 0 
> 04:04:20 Kernel Free SCB list: 3 0 1 2  
> 04:04:20 Sequencer Complete DMA-inprog list:  
> 04:04:20 Sequencer Complete list:  
> 04:04:20 Sequencer DMA-Up and Complete list:  
> 04:04:20  
> 04:04:20 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 
> 04:04:20 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]  
> 04:04:21 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]  
> 04:04:21 SOFFCNT[0x21] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0  
> 04:04:21 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]  
> 04:04:21 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x3 
> 04:04:21 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x1]  
> 04:04:21 SG_CACHE_SHADOW[0x28] SG_STATE[0x3] DFFSXFRCTL[0x0]  
> 04:04:21 SOFFCNT[0x21] MDFFSTAT[0xc] SHADDR = 0x0318ebe5e, SHCNT = 0x1a2  
> 04:04:21 HADDR = 0x0318ebec2, HCNT = 0x13e CCSGCTL[0x10]  
> 04:04:21 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0  
> 04:04:21 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 
> 04:04:21 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 
> 04:04:21 SIMODE0[0xc]  
> 04:04:21 CCSCBCTL[0x4]  
> 04:04:21 scsi0: REG0 == 0x3, SINDEX = 0x122, DINDEX = 0xa9 
> 04:04:21 scsi0: SCBPTR == 0xff03, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 
> 04:04:21 CDB 3 1 0 0 0 0 
> 04:04:22 STACK: 0x206 0x0 0x0 0x0 0x0 0x0 0x0 0x29 
> 04:04:22 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> 
> 04:04:22 DevQ(0:0:0): 0 waiting 
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
> 04:04:22 end_request: I/O error, dev sda, sector 4239969 
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
> 04:04:22 end_request: I/O error, dev sda, sector 4239977 
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
> 04:04:22 end_request: I/O error, dev sda, sector 4239985 
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
> 04:04:22 end_request: I/O error, dev sda, sector 4239993 
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 
> 04:04:22 end_request: I/O error, dev sda, sector 4240001 
> 04:04:22 (scsi0:A:0:0): No or incomplete CDB sent to device. 
> 04:04:23 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted 
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
> 04:04:23 end_request: I/O error, dev sda, sector 4240009 
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
> 04:04:23 end_request: I/O error, dev sda, sector 4240017 
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
> 04:04:23 end_request: I/O error, dev sda, sector 68860388 
> 04:04:23 Buffer I/O error on device sda7, logical block 2187565 
> 04:04:23 lost page write due to I/O error on sda7 
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command 
> 04:04:23 end_request: I/O error, dev sda, sector 68860396 
> 04:04:25 end_request: I/O error, dev sda, sector 4240041 
> 04:04:25 (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit) 
> 04:04:25 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x97 
> 04:04:25 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< 
> 04:04:25 scsi0: Dumping Card State at program address 0x95 Mode 0x22 
> 04:04:25 Card was paused 
> 04:04:25 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]  
> 04:04:25 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]  
> 04:04:25 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]  
> 04:04:25 SEQINTCTL[0x80] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]  
> 04:04:25 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]  
> 04:04:25 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]  
> 04:04:26 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]  
> 04:04:26  
> 04:04:26 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x1 CURRSCB 0x1 NEXTSCB 0x0 
> 04:04:26 qinstart = 87 qinfifonext = 87 
> 04:04:26 QINFIFO: 
> 04:04:26 WAITING_TID_QUEUES: 
> 04:04:26 Pending list: 
> 04:04:26 Total 0 
> 04:04:26 Kernel Free SCB list: 1 0 3 2  
> 04:04:26 Sequencer Complete DMA-inprog list:  
> 04:04:26 Sequencer Complete list:  
> 04:04:26 Sequencer DMA-Up and Complete list:  
> 04:04:26  
> 04:04:26 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 
> 04:04:26 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]  
> 04:04:26 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]  
> 04:04:26 SOFFCNT[0x1d] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0  
> 04:04:27 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]  
> 04:04:27 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x1 
> 04:04:27 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x81]  
> 04:04:27 SG_CACHE_SHADOW[0x20] SG_STATE[0x3] DFFSXFRCTL[0x0]  
> 04:04:27 SOFFCNT[0x1d] MDFFSTAT[0xc] SHADDR = 0x05a68f1ac, SHCNT = 0xe54  
> 04:04:27 HADDR = 0x05a68f20a, HCNT = 0xdf6 CCSGCTL[0x10]  
> 04:04:27 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0  
> 04:04:27 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 
> 04:04:27 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 
> 04:04:27 SIMODE0[0xc]  
> 04:04:27 CCSCBCTL[0x4]  
> 04:04:27 scsi0: REG0 == 0x1, SINDEX = 0x122, DINDEX = 0x1ba 
> 04:04:27 scsi0: SCBPTR == 0xff01, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 
> 04:04:27 CDB 1 1 0 0 0 0 
> 04:04:27 STACK: 0x29 0x206 0x0 0x0 0x0 0x0 0x0 0x0 
> 04:04:27 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> 
> 04:04:28 DevQ(0:0:0): 0 waiting 
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000 
> 04:04:28 end_request: I/O error, dev sda, sector 4239913 
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000 
> 04:04:28 (scsi0:A:0:0): No or incomplete CDB sent to device.
> 04:04:28 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
> 04:04:28 (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:29 end_request: I/O error, dev sda, sector 4239953
> 04:04:29 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:29 end_request: I/O error, dev sda, sector 51398828
> 04:04:29 Buffer I/O error on device sda7, logical block 4870
> 04:04:29 lost page write due to I/O error on sda7
> 04:04:29 Aborting journal on device sda7.
> 04:04:29 journal commit I/O error
> 04:04:29 ext3_abort called.
> 04:04:29 EXT3-fs abort (device sda7): ext3_journal_start: Detected aborted journal
> 04:04:29 Remounting filesystem read-only
> 
> 
> 
>>On Mon, 2004-10-04 at 07:24, Bryce as root wrote:
>>
>>>kernel dmesg dump :
>>>Reseting Channel for LQI Phase error
>>
>>This is clearly the cause of all the trouble.  The driver is a bit
>>opaque at this point, but it looks like an LQI phase error occurs
>>because of a phase mismatch during L_Q Information Units (These are a
>>requirement for fast-160 data transfers).
>>
>>I think this is a strong indicator of bus instability ... could you try
>>falling back to fast-80 and turning off IU transfers (You'll probably
>>either have to use the card bios or dig around in the aic79xx driver for
>>the options).
>>
>>James
>>
>>
>>
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-10-07 15:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
2004-10-04 14:00 ` James Bottomley
2004-10-07 10:48   ` Bryce as root
2004-10-07 15:54     ` Luben Tuikov
2004-10-05 14:04 ` Luben Tuikov
2004-10-05 21:51 ` Luben Tuikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).