linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* aic79xx blowups in 2.6.8-1.521smp (RHAT)
@ 2004-10-04 12:24 Bryce as root
  2004-10-04 14:00 ` James Bottomley
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bryce as root @ 2004-10-04 12:24 UTC (permalink / raw)
  To: linux-scsi

<sigh> where to start,..

I have a system which has an onboard aic79xx controller

lspci -vv :
03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
        Subsystem: Adaptec: Unknown device ffff
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (10000ns min, 6250ns max), Cache Line Size 08
        Interrupt: pin A routed to IRQ 217
        Region 0: I/O ports at 7c00 [disabled]
        Region 1: Memory at f3008000 (64-bit, non-prefetchable) [size=8K]
        Region 3: I/O ports at 8000 [disabled] [size=256]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [94] PCI-X non-bridge device.
                Command: DPERE- ERO+ RBC=0 OST=4
                Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=4, DMCRS=1, RSCEM-

This is connected by certified U320 cable to a 15K U320 Seagate 37Gb drive

SCSI subsystem initialized
ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec AIC7902 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
  Vendor: SEAGATE   Model: ST336753LW        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled.  Depth 4
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec AIC7902 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

Ok, to the problem itself. I've found that the system will choke badly after
anything from a few days to a couple of weeks, always over a LQI Phase error.
I've asked around to help work out what the LQI Phase error is about and
I still don't understand it.

kernel dmesg dump :
Reseting Channel for LQI Phase error
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x8 Mode 0x33
Card was paused
HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] 
DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] 
LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10] 
SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0] 
SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] 
SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] 
LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1] 

SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
qinstart = 40786 qinfifonext = 40786
QINFIFO:
WAITING_TID_QUEUES:
Pending list:
  0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7] 
  1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7] 
Total 2
Kernel Free SCB list: 2 3 
Sequencer Complete DMA-inprog list: 
Sequencer Complete list: 
Sequencer DMA-Up and Complete list: 

scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89] 
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] 
SOFFCNT[0x66] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x88] 
scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0] 
SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0] 
SOFFCNT[0x66] MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc 
HADDR = 0x06c420ce0, HCNT = 0x320 CCSGCTL[0x98] 
LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0 0
x2 0x0 
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
SIMODE0[0xc] 
CCSCBCTL[0x4] 
scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
CDB 28 0 1 93 e9 d6
STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
DevQ(0:0:0): 0 waiting
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470870
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470878
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470886
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470894
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470766
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470774
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 51448252
Buffer I/O error on device sda7, logical block 11048
lost page write due to I/O error on sda7
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470782
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 68501204
Buffer I/O error on device sda7, logical block 2142667
lost page write due to I/O error on sda7
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)


When the system is up and running, the disk is stable and I
get a quite respectable 72MB/s max throughput.

I have other card dumps however since this isn't my area of expertise I've
not included them as they're roughly the same problem always starting off
with a "Reseting Channel for LQI Phase error" error.

I should mention that I have a sister machine with the exact same HW 
that exhibits the issues and though not a 100% indication, it
would suggest the issue is more to do with the driver.

I did also try to rebuild the kernel with aic79xx-linux-2.6-20040522-tar.gz
however I've not run it long enough on the box to say that it fixes the issue

Ideas anyone? or shall I just keep banging on 2.0.12 and hope it fixes/masks
the problem?

Phil
=--=

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-10-07 15:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
2004-10-04 14:00 ` James Bottomley
2004-10-07 10:48   ` Bryce as root
2004-10-07 15:54     ` Luben Tuikov
2004-10-05 14:04 ` Luben Tuikov
2004-10-05 21:51 ` Luben Tuikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).