From: Luben Tuikov <luben_tuikov@adaptec.com>
To: Bryce as root <root@www.linux.org.uk>
Cc: linux-scsi@vger.kernel.org
Subject: Re: aic79xx blowups in 2.6.8-1.521smp (RHAT)
Date: Thu, 07 Oct 2004 11:54:04 -0400 [thread overview]
Message-ID: <4165669C.9060304@adaptec.com> (raw)
In-Reply-To: <E1CFVoq-0004N7-JX@www.linux.org.uk>
Bryce as root wrote:
> Detail preamble:
> Linux ZenIV.linux.org.uk 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux
> SCSI subsystem initialized
> ACPI: PCI interrupt 0000:03:0a.0[A] -> GSI 22 (level, low) -> IRQ 217
> ACPI: PCI interrupt 0000:03:0a.1[B] -> GSI 19 (level, low) -> IRQ 177
> scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
> <Adaptec AIC7902 Ultra320 SCSI adapter>
> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
>
> (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
> Vendor: SEAGATE Model: ST336753LW Rev: 0006
> Type: Direct-Access ANSI SCSI revision: 03
> scsi0:A:0:0: Tagged Queuing enabled. Depth 4
> SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
> SCSI device sda: drive cache: write back
> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
>
>
> Mutterings:
> Well I set the BIOS down to 160 and turned off Packeting and QAS and let
> that run for a day
>
> ( http://ftp.linux.org.uk/~bryce/scsi-bios.gif )
>
> Unfortunately the driver has blown up in a different way now.
> I'm at a bit of a loss as to whats going on as the disk verifies
> fine from the adaptec bios utils
>
> I've now set the speed to 80 so we'll see how this goes though it's a shame
> to loose the performance as a result in the drop (was 71MB/s now 61MB/s)
Yes, I agree. Given the intermittent nature of the problem, SCSI BIOS
is not insured from incurring this failure as well, it's just that it
interacts too little a time with the SCSI bus.
Both bugs display similar problem: unexpected bus phase change which
the driver reports as programmed. In this case, on REQUEST SENSE(desc)
on a Data-In phase. The hardware could possibly be flaky.
Luben
> Logfile:
> 04:04:18 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x2e
> 04:04:18 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> 04:04:18 scsi0: Dumping Card State at program address 0x2c Mode 0x22
> 04:04:18 Card was paused
> 04:04:18 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
> 04:04:18 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
> 04:04:18 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
> 04:04:18 SEQINTCTL[0x0] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]
> 04:04:18 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]
> 04:04:18 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
> 04:04:19 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
> 04:04:19
> 04:04:19 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x3 CURRSCB 0x3 NEXTSCB 0x0
> 04:04:19 qinstart = 24363 qinfifonext = 24363
> 04:04:20 QINFIFO:
> 04:04:20 WAITING_TID_QUEUES:
> 04:04:20 Pending list:
> 04:04:20 Total 0
> 04:04:20 Kernel Free SCB list: 3 0 1 2
> 04:04:20 Sequencer Complete DMA-inprog list:
> 04:04:20 Sequencer Complete list:
> 04:04:20 Sequencer DMA-Up and Complete list:
> 04:04:20
> 04:04:20 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
> 04:04:20 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
> 04:04:21 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
> 04:04:21 SOFFCNT[0x21] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
> 04:04:21 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]
> 04:04:21 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x3
> 04:04:21 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x1]
> 04:04:21 SG_CACHE_SHADOW[0x28] SG_STATE[0x3] DFFSXFRCTL[0x0]
> 04:04:21 SOFFCNT[0x21] MDFFSTAT[0xc] SHADDR = 0x0318ebe5e, SHCNT = 0x1a2
> 04:04:21 HADDR = 0x0318ebec2, HCNT = 0x13e CCSGCTL[0x10]
> 04:04:21 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 04:04:21 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> 04:04:21 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
> 04:04:21 SIMODE0[0xc]
> 04:04:21 CCSCBCTL[0x4]
> 04:04:21 scsi0: REG0 == 0x3, SINDEX = 0x122, DINDEX = 0xa9
> 04:04:21 scsi0: SCBPTR == 0xff03, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0
> 04:04:21 CDB 3 1 0 0 0 0
> 04:04:22 STACK: 0x206 0x0 0x0 0x0 0x0 0x0 0x0 0x29
> 04:04:22 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> 04:04:22 DevQ(0:0:0): 0 waiting
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239969
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239977
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239985
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4239993
> 04:04:22 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:22 end_request: I/O error, dev sda, sector 4240001
> 04:04:22 (scsi0:A:0:0): No or incomplete CDB sent to device.
> 04:04:23 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 4240009
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 4240017
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 68860388
> 04:04:23 Buffer I/O error on device sda7, logical block 2187565
> 04:04:23 lost page write due to I/O error on sda7
> 04:04:23 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:23 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:23 end_request: I/O error, dev sda, sector 68860396
> 04:04:25 end_request: I/O error, dev sda, sector 4240041
> 04:04:25 (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit)
> 04:04:25 (scsi0:A:0:0): Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x97
> 04:04:25 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> 04:04:25 scsi0: Dumping Card State at program address 0x95 Mode 0x22
> 04:04:25 Card was paused
> 04:04:25 HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
> 04:04:25 DFFSTAT[0x11] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
> 04:04:25 LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10]
> 04:04:25 SEQINTCTL[0x80] SEQ_FLAGS[0x20] SEQ_FLAGS2[0x0] SSTAT0[0x0]
> 04:04:25 SSTAT1[0x9] SSTAT2[0xc0] SSTAT3[0x0] PERRDIAG[0x1]
> 04:04:25 SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
> 04:04:26 LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
> 04:04:26
> 04:04:26 SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0x1 CURRSCB 0x1 NEXTSCB 0x0
> 04:04:26 qinstart = 87 qinfifonext = 87
> 04:04:26 QINFIFO:
> 04:04:26 WAITING_TID_QUEUES:
> 04:04:26 Pending list:
> 04:04:26 Total 0
> 04:04:26 Kernel Free SCB list: 1 0 3 2
> 04:04:26 Sequencer Complete DMA-inprog list:
> 04:04:26 Sequencer Complete list:
> 04:04:26 Sequencer DMA-Up and Complete list:
> 04:04:26
> 04:04:26 scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
> 04:04:26 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
> 04:04:26 SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
> 04:04:26 SOFFCNT[0x1d] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
> 04:04:27 HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0]
> 04:04:27 scsi0: FIFO1 Active, LONGJMP == 0x1ec, SCB 0x1
> 04:04:27 SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x8] DFSTATUS[0x81]
> 04:04:27 SG_CACHE_SHADOW[0x20] SG_STATE[0x3] DFFSXFRCTL[0x0]
> 04:04:27 SOFFCNT[0x1d] MDFFSTAT[0xc] SHADDR = 0x05a68f1ac, SHCNT = 0xe54
> 04:04:27 HADDR = 0x05a68f20a, HCNT = 0xdf6 CCSGCTL[0x10]
> 04:04:27 LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 04:04:27 scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
> 04:04:27 scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
> 04:04:27 SIMODE0[0xc]
> 04:04:27 CCSCBCTL[0x4]
> 04:04:27 scsi0: REG0 == 0x1, SINDEX = 0x122, DINDEX = 0x1ba
> 04:04:27 scsi0: SCBPTR == 0xff01, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0
> 04:04:27 CDB 1 1 0 0 0 0
> 04:04:27 STACK: 0x29 0x206 0x0 0x0 0x0 0x0 0x0 0x0
> 04:04:27 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
> 04:04:28 DevQ(0:0:0): 0 waiting
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:28 end_request: I/O error, dev sda, sector 4239913
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x10000
> 04:04:28 (scsi0:A:0:0): No or incomplete CDB sent to device.
> 04:04:28 scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
> 04:04:28 (scsi0:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
> 04:04:28 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:29 end_request: I/O error, dev sda, sector 4239953
> 04:04:29 SCSI error : <0 0 0 0> return code = 0x8000002
> 04:04:29 Info fld=0x0, Current sda: sense key Aborted Command
> 04:04:29 end_request: I/O error, dev sda, sector 51398828
> 04:04:29 Buffer I/O error on device sda7, logical block 4870
> 04:04:29 lost page write due to I/O error on sda7
> 04:04:29 Aborting journal on device sda7.
> 04:04:29 journal commit I/O error
> 04:04:29 ext3_abort called.
> 04:04:29 EXT3-fs abort (device sda7): ext3_journal_start: Detected aborted journal
> 04:04:29 Remounting filesystem read-only
>
>
>
>>On Mon, 2004-10-04 at 07:24, Bryce as root wrote:
>>
>>>kernel dmesg dump :
>>>Reseting Channel for LQI Phase error
>>
>>This is clearly the cause of all the trouble. The driver is a bit
>>opaque at this point, but it looks like an LQI phase error occurs
>>because of a phase mismatch during L_Q Information Units (These are a
>>requirement for fast-160 data transfers).
>>
>>I think this is a strong indicator of bus instability ... could you try
>>falling back to fast-80 and turning off IU transfers (You'll probably
>>either have to use the card bios or dig around in the aic79xx driver for
>>the options).
>>
>>James
>>
>>
>>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2004-10-07 15:54 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-10-04 12:24 aic79xx blowups in 2.6.8-1.521smp (RHAT) Bryce as root
2004-10-04 14:00 ` James Bottomley
2004-10-07 10:48 ` Bryce as root
2004-10-07 15:54 ` Luben Tuikov [this message]
2004-10-05 14:04 ` Luben Tuikov
2004-10-05 21:51 ` Luben Tuikov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4165669C.9060304@adaptec.com \
--to=luben_tuikov@adaptec.com \
--cc=linux-scsi@vger.kernel.org \
--cc=root@www.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).