From mboxrd@z Thu Jan 1 00:00:00 1970 From: Luben Tuikov Subject: Re: aic79xx blowups in 2.6.8-1.521smp (RHAT) Date: Thu, 07 Oct 2004 11:54:04 -0400 Sender: linux-scsi-owner@vger.kernel.org Message-ID: <4165669C.9060304@adaptec.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from magic.adaptec.com ([216.52.22.17]:16852 "EHLO magic.adaptec.com") by vger.kernel.org with ESMTP id S267411AbUJGPyP (ORCPT ); Thu, 7 Oct 2004 11:54:15 -0400 In-Reply-To: List-Id: linux-scsi@vger.kernel.org To: Bryce as root Cc: linux-scsi@vger.kernel.org Bryce as root wrote: > Detail preamble: > Linux ZenIV.linux.org.uk 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux > SCSI subsystem initialized > ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217 > ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177 > scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11 > > aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs > > (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit) > Vendor: SEAGATE Model: ST336753LW Rev: 0006 > Type: Direct-Access ANSI SCSI revision: 03 > scsi0:A:0:0: Tagged Queuing enabled. Depth 4 > SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB) > SCSI device sda: drive cache: write back > sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > > Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 > > > Mutterings: > Well I set the BIOS down to 160 and turned off Packeting and QAS and let > that run for a day > > ( http://ftp.linux.org.uk/~bryce/scsi-bios.gif ) > > Unfortunately the driver has blown up in a different way now. > I'm at a bit of a loss as to whats going on as the disk verifies > fine from the adaptec bios utils > > I've now set the speed to 80 so we'll see how this goes though it's a shame > to loose the performance as a result in the drop (was 71MB/s now 61MB/s) Yes, I agree. Given the intermittent nature of the problem, SCSI BIOS is not insured from incurring this failure as well, it's just that it interacts too little a time with the SCSI bus. Both bugs display similar problem: unexpected bus phase change which the driver reports as programmed. In this case, on REQUEST SENSE(desc) on a Data-In phase. The hardware could possibly be flaky. Luben > Logfile: > 04:04:18 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x2e > 04:04:18 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< > 04:04:18 scsi0: Dumping Card State at program address 0x2c Mode 0x22 > 04:04:18 Card was paused > 04:04:18 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] > 04:04:18 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] > 04:04:18 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10] > 04:04:18 SEQINTCTL[0x0] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0] > 04:04:18 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1] > 04:04:18 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] > 04:04:19 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0] > 04:04:19 > 04:04:19 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x3 CURRSCB 0x3 NEXTSCB 0x0 > 04:04:19 qinstart = 24363 qinfifonext = 24363 > 04:04:20 QINFIFO: > 04:04:20 WAITING_TID_QUEUES: > 04:04:20 Pending list: > 04:04:20 Total 0 > 04:04:20 Kernel Free SCB list: 3 0 1 2 > 04:04:20 Sequencer Complete DMA-inprog list: > 04:04:20 Sequencer Complete list: > 04:04:20 Sequencer DMA-Up and Complete list: > 04:04:20 > 04:04:20 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 > 04:04:20 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89] > 04:04:21 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] > 04:04:21 SOFFCNT[0x21] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 > 04:04:21 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0] > 04:04:21 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x3 > 04:04:21 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x1] > 04:04:21 SG_CACHE_SHADOW[0x28] SG_STATE[0x3] DFFSXFRCTL[0x0] > 04:04:21 SOFFCNT[0x21] MDFFSTAT[0xc] SHADDR = 0x0318ebe5e, SHCNT = 0x1a2 > 04:04:21 HADDR = 0x0318ebec2, HCNT = 0x13e CCSGCTL[0x10] > 04:04:21 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 > 04:04:21 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 > 04:04:21 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 > 04:04:21 SIMODE0[0xc] > 04:04:21 CCSCBCTL[0x4] > 04:04:21 scsi0: REG0 == 0x3, SINDEX = 0x122, DINDEX = 0xa9 > 04:04:21 scsi0: SCBPTR == 0xff03, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 > 04:04:21 CDB 3 1 0 0 0 0 > 04:04:22 STACK: 0x206 0x0 0x0 0x0 0x0 0x0 0x0 0x29 > 04:04:22 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> > 04:04:22 DevQ(0:0:0): 0 waiting > 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 > 04:04:22 end_request: I/O error, dev sda, sector 4239969 > 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 > 04:04:22 end_request: I/O error, dev sda, sector 4239977 > 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 > 04:04:22 end_request: I/O error, dev sda, sector 4239985 > 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 > 04:04:22 end_request: I/O error, dev sda, sector 4239993 > 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000 > 04:04:22 end_request: I/O error, dev sda, sector 4240001 > 04:04:22 (scsi0:A:0:0): No or incomplete CDB sent to device. > 04:04:23 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted > 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 > 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command > 04:04:23 end_request: I/O error, dev sda, sector 4240009 > 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 > 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command > 04:04:23 end_request: I/O error, dev sda, sector 4240017 > 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 > 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command > 04:04:23 end_request: I/O error, dev sda, sector 68860388 > 04:04:23 Buffer I/O error on device sda7, logical block 2187565 > 04:04:23 lost page write due to I/O error on sda7 > 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002 > 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command > 04:04:23 end_request: I/O error, dev sda, sector 68860396 > 04:04:25 end_request: I/O error, dev sda, sector 4240041 > 04:04:25 (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit) > 04:04:25 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x97 > 04:04:25 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< > 04:04:25 scsi0: Dumping Card State at program address 0x95 Mode 0x22 > 04:04:25 Card was paused > 04:04:25 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] > 04:04:25 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] > 04:04:25 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10] > 04:04:25 SEQINTCTL[0x80] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0] > 04:04:25 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1] > 04:04:25 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] > 04:04:26 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0] > 04:04:26 > 04:04:26 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x1 CURRSCB 0x1 NEXTSCB 0x0 > 04:04:26 qinstart = 87 qinfifonext = 87 > 04:04:26 QINFIFO: > 04:04:26 WAITING_TID_QUEUES: > 04:04:26 Pending list: > 04:04:26 Total 0 > 04:04:26 Kernel Free SCB list: 1 0 3 2 > 04:04:26 Sequencer Complete DMA-inprog list: > 04:04:26 Sequencer Complete list: > 04:04:26 Sequencer DMA-Up and Complete list: > 04:04:26 > 04:04:26 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 > 04:04:26 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89] > 04:04:26 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] > 04:04:26 SOFFCNT[0x1d] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 > 04:04:27 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0] > 04:04:27 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x1 > 04:04:27 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x81] > 04:04:27 SG_CACHE_SHADOW[0x20] SG_STATE[0x3] DFFSXFRCTL[0x0] > 04:04:27 SOFFCNT[0x1d] MDFFSTAT[0xc] SHADDR = 0x05a68f1ac, SHCNT = 0xe54 > 04:04:27 HADDR = 0x05a68f20a, HCNT = 0xdf6 CCSGCTL[0x10] > 04:04:27 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 > 04:04:27 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 > 04:04:27 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 > 04:04:27 SIMODE0[0xc] > 04:04:27 CCSCBCTL[0x4] > 04:04:27 scsi0: REG0 == 0x1, SINDEX = 0x122, DINDEX = 0x1ba > 04:04:27 scsi0: SCBPTR == 0xff01, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 > 04:04:27 CDB 1 1 0 0 0 0 > 04:04:27 STACK: 0x29 0x206 0x0 0x0 0x0 0x0 0x0 0x0 > 04:04:27 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> > 04:04:28 DevQ(0:0:0): 0 waiting > 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000 > 04:04:28 end_request: I/O error, dev sda, sector 4239913 > 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000 > 04:04:28 (scsi0:A:0:0): No or incomplete CDB sent to device. > 04:04:28 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted > 04:04:28 (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit) > 04:04:28 SCSI error : <0 0 0 0> return code = 0x8000002 > 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command > 04:04:29 end_request: I/O error, dev sda, sector 4239953 > 04:04:29 SCSI error : <0 0 0 0> return code = 0x8000002 > 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command > 04:04:29 end_request: I/O error, dev sda, sector 51398828 > 04:04:29 Buffer I/O error on device sda7, logical block 4870 > 04:04:29 lost page write due to I/O error on sda7 > 04:04:29 Aborting journal on device sda7. > 04:04:29 journal commit I/O error > 04:04:29 ext3_abort called. > 04:04:29 EXT3-fs abort (device sda7): ext3_journal_start: Detected aborted journal > 04:04:29 Remounting filesystem read-only > > > >>On Mon, 2004-10-04 at 07:24, Bryce as root wrote: >> >>>kernel dmesg dump : >>>Reseting Channel for LQI Phase error >> >>This is clearly the cause of all the trouble. The driver is a bit >>opaque at this point, but it looks like an LQI phase error occurs >>because of a phase mismatch during L_Q Information Units (These are a >>requirement for fast-160 data transfers). >> >>I think this is a strong indicator of bus instability ... could you try >>falling back to fast-80 and turning off IU transfers (You'll probably >>either have to use the card bios or dig around in the aic79xx driver for >>the options). >> >>James >> >> >> > > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html