From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bryce as root
Subject: aic79xx blowups in 2.6.8-1.521smp (RHAT)
Date: Mon, 4 Oct 2004 13:24:40 +0100 (BST)
Sender: linux-scsi-owner@vger.kernel.org
Message-ID:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Return-path:
Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:8329 "EHLO www.linux.org.uk") by vger.kernel.org with ESMTP id S267566AbUJDMYo (ORCPT ); Mon, 4 Oct 2004 08:24:44 -0400
List-Id: linux-scsi@vger.kernel.org
To: linux-scsi@vger.kernel.org

Where to start...

I have a system which has an onboard aic79xx controller.

lspci -vv:

03:0a.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
        Subsystem: Adaptec: Unknown device ffff
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- GSI 22 (level, low) -> IRQ 217
ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
  Vendor: SEAGATE   Model: ST336753LW   Rev: 0006
  Type:   Direct-Access                 ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled.  Depth 4
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

OK, to the problem itself. I've found that the system chokes badly after
anything from a few days to a couple of weeks, always over an LQI Phase
error. I've asked around to work out what the LQI Phase error is about,
and I still don't understand it.

kernel dmesg dump:

Reseting Channel for LQI Phase error
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x8 Mode 0x33
Card was paused
HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
SSTAT1[0x9] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x1]

SCB Count = 4 CMDS_PENDING = 2 LASTSCB 0x1 CURRSCB 0x0 NEXTSCB 0xff00
qinstart = 40786 qinfifonext = 40786
QINFIFO:
WAITING_TID_QUEUES:
Pending list:
  0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
  1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Total 2
Kernel Free SCB list: 2 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:

scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] SOFFCNT[0x66]
MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 HADDR = 0x00, HCNT = 0x0
CCSGCTL[0x88]
scsi0: FIFO1 Active, LONGJMP == 0x257, SCB 0x1
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x0]
SG_CACHE_SHADOW[0x30] SG_STATE[0x6] DFFSXFRCTL[0x0] SOFFCNT[0x66]
MDFFSTAT[0xa] SHADDR = 0x06c420c04, SHCNT = 0x3fc HADDR = 0x06c420ce0, HCNT = 0x320
CCSGCTL[0x98]
LQIN: 0x4 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0 0x2 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x2
SIMODE0[0xc] CCSCBCTL[0x4]
scsi0: REG0 == 0x3, SINDEX = 0x133, DINDEX = 0x102
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff11
CDB 28 0 1 93 e9 d6
STACK: 0x125 0x0 0x0 0x257 0x240 0x93 0x29 0x1
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
DevQ(0:0:0): 0 waiting
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470870
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470878
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470886
SCSI error : <0 0 0 0> return code = 0x10000
end_request: I/O error, dev sda, sector 26470894
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470766
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470774
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 51448252
Buffer I/O error on device sda7, logical block 11048
lost page write due to I/O error on sda7
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 26470782
SCSI error : <0 0 0 0> return code = 0x8000002
Info fld=0x0, Current sda: sense key Aborted Command
end_request: I/O error, dev sda, sector 68501204
Buffer I/O error on device sda7, logical block 2142667
lost page write due to I/O error on sda7
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)

When the system is up and running, the disk is stable and I get a quite
respectable 72MB/s max throughput.

I have other card dumps; however, since this isn't my area of expertise
I've not included them, as they show roughly the same problem, always
starting off with a "Reseting Channel for LQI Phase error" error.

I should mention that I have a sister machine with the exact same HW that
exhibits the same issues, and though that's not a 100% indication, it would
suggest the issue is more to do with the driver.

I did also try to rebuild the kernel with
aic79xx-linux-2.6-20040522-tar.gz; however, I've not run it long enough on
the box to say whether it fixes the issue.

Ideas anyone?
Or shall I just keep banging on 2.0.12 and hope it fixes/masks the problem?

Phil
=--=
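
For anyone reading the dump above, the two return codes and the CDB line can be unpacked by hand. The following is a minimal sketch, assuming the Linux 2.6 mid-layer packing of the result word (status byte in bits 0-7, message byte in bits 8-15, host byte in bits 16-23, driver byte in bits 24-31) and assuming the CDB is a READ(10) (opcode 0x28) with a big-endian LBA in bytes 2-5. The function names decode_result and read10_lba are hypothetical helpers for illustration, not part of any driver or tool.

```python
# Hypothetical helpers for reading the log by hand; not part of the
# aic79xx driver or any kernel tool.

# Small lookup tables, limited to the values that appear in this log.
HOST_BYTE_NAMES = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT"}
DRIVER_BYTE_NAMES = {0x00: "DRIVER_OK", 0x08: "DRIVER_SENSE"}
STATUS_BYTE_NAMES = {0x00: "GOOD", 0x02: "CHECK CONDITION"}


def decode_result(result):
    """Split a 32-bit SCSI result word into its four byte fields.

    Assumed layout: status (bits 0-7), message (8-15),
    host (16-23), driver (24-31).
    """
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def read10_lba(cdb):
    """Return the logical block address from the start of a READ(10) CDB.

    Only the first six bytes are needed: opcode, flags, 4-byte LBA.
    """
    if len(cdb) < 6 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB prefix")
    return int.from_bytes(bytes(cdb[2:6]), "big")


if __name__ == "__main__":
    for code in (0x10000, 0x8000002):
        f = decode_result(code)
        print(hex(code),
              HOST_BYTE_NAMES.get(f["host"], hex(f["host"])),
              DRIVER_BYTE_NAMES.get(f["driver"], hex(f["driver"])),
              STATUS_BYTE_NAMES.get(f["status"], hex(f["status"])))
    # Bytes taken from the "CDB 28 0 1 93 e9 d6" line of the dump.
    print(read10_lba([0x28, 0x00, 0x01, 0x93, 0xE9, 0xD6]))
```

Under those assumptions, 0x10000 carries host byte 0x01 (DID_NO_CONNECT) and 0x8000002 carries driver byte 0x08 (DRIVER_SENSE) with status 0x02 (CHECK CONDITION), which fits the "sense key Aborted Command" lines. The CDB decodes to LBA 26470870, the same sector as the first end_request error after the dump.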