* Adaptec 29320 [aic79xx] fails on power cycle of LUN
From: Sean Bruno @ 2006-10-18 22:24 UTC
To: linux-scsi
I have had a tough time tracking this one down, however I can say for
certain that the 29320 is really having trouble if a LUN is power
cycled.
I don't have access to a BUS analyzer right now, but here is my
regression.
1. Hook an external SCSI array/disk to a 29320.
2. Power up SCSI array/disk
3. Power up PC with 29320.
4. When PC has booted, log in and test device by creating a file
system, e.g. mkfs /dev/sda (or whatever disk the array is called on
your machine).
5. Power cycle array/disk
6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
ensues.
This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
Any ideas?
Sean

* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
From: James Bottomley @ 2006-10-18 22:27 UTC
To: Sean Bruno; +Cc: linux-scsi

On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> Any ideas?

It would help if you could post the trace from the panic or crash.

James

* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
From: Sean Bruno @ 2006-10-18 22:32 UTC
To: linux-scsi

On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> I have had a tough time tracking this one down, however I can say for
> certain that the 29320 is really having trouble if a LUN is power
> cycled.
>
> I don't have access to a BUS analyzer right now, but here is my
> regression.
>
> 1. Hook an external SCSI array/disk to a 29320.
> 2. Power up SCSI array/disk
> 3. Power up PC with 29320.
> 4. When PC has booted, log in and test device by creating a file
> system, e.g. mkfs /dev/sda (or whatever disk the array is called on
> your machine).
> 5. Power cycle array/disk
> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> ensues.
>
> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.

From 2.6.19-rc2 I at least get something from a crash without the entire
box locking up on me.

The process tdg_2 is a 'test data generator'; basically it writes data to
the scsi disk in a testable pattern that is later validated.

------------[ cut here ]------------
kernel BUG at mm/slab.c:594!
invalid opcode: 0000 [#1]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
libiscsi scsi_transport_iscsi ipv6 video sbs i2c_ec i2c_core button
battery asus_acpi ac parport_pc lp parport snd_intel8x0 snd_ac97_codec
snd_ac97_bus sg snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer snd
soundcore snd_page_alloc serio_raw ide_cd skge cdrom pcspkr dm_snapshot
dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0060:[<c0169562>] Not tainted VLI
EFLAGS: 00010246 (2.6.19-rc2 #1)
EIP is at kmem_cache_free+0x29/0x6d
eax: 00000000 ebx: dffae300 ecx: dff91b80 edx: c1a00000
esi: dffaaf80 edi: 00000000 ebp: d3f324c0 esp: d3fb9dd0
ds: 007b es: 007b ss: 0068
Process tdg_2 (pid: 2362, ti=d3fb9000 task=dfd6cd50 task.ti=d3fb9000)
Stack: dffae300 dffaaf80 00000000 c0154448 00000000 d3e09a80 dffaaf80
d3e09a80
c018bafc 00001000 00000000 c018b822 e088efa0 00001000 00000000
0000000a
d3fb9ef0 d43f76c8 00003000 00000000 00000001 c130cac8 00008000
00000000
Call Trace:
[<c0154448>] mempool_free+0x66/0x6b
[<c018bafc>] bio_free+0x25/0x30
[<c018b822>] bio_put+0x28/0x29
[<e088efa0>] scsi_execute_async+0x15f/0x33d [scsi_mod]
[<e09c9913>] sg_common_write+0x704/0x772 [sg]
[<e09c9ba6>] sg_new_write+0x225/0x248 [sg]
[<e09cae45>] sg_write+0x106/0x33a [sg]
[<c016dae7>] vfs_write+0xa8/0x159
[<c016e114>] sys_write+0x41/0x67
[<c0103dc9>] sysenter_past_esp+0x56/0x79
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79

Leftover inexact backtrace:

[<c031007b>] sleep_on+0x1e/0x6c
=======================
Code: 5f c3 89 c1 8d 82 00 00 00 40 c1 e8 0c 57 89 d7 6b d0 28 03 15 00
d6 50 c0 56 53 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 <0f> 0b
52 02 e6 6b 33 c0 39 4a 20 74 08 0f 0b ca 0d e6 6b 33 c0
EIP: [<c0169562>] kmem_cache_free+0x29/0x6d SS:ESP 0068:d3fb9dd0

Sean

* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
From: Mike Christie @ 2006-10-19 5:52 UTC
To: Sean Bruno; +Cc: linux-scsi

On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> > I have had a tough time tracking this one down, however I can say for
> > certain that the 29320 is really having trouble if a LUN is power
> > cycled.
> >
> > I don't have access to a BUS analyzer right now, but here is my
> > regression.
> >
> > 1. Hook an external SCSI array/disk to a 29320.
> > 2. Power up SCSI array/disk
> > 3. Power up PC with 29320.
> > 4. When PC has booted, log in and test device by creating a file
> > system, e.g. mkfs /dev/sda (or whatever disk the array is called on
> > your machine).
> > 5. Power cycle array/disk
> > 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> > ensues.
> >
> > This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>
> From 2.6.19-rc2 I at least get something from a crash without the entire
> box locking up on me.
>
> The process tdg_2 is a 'test data generator'; basically it writes data to
> the scsi disk in a testable pattern that is later validated.
>
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:594!
> invalid opcode: 0000 [#1]
> SMP
> Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
> libiscsi scsi_transport_iscsi ipv6 video sbs i2c_ec i2c_core button
> battery asus_acpi ac parport_pc lp parport snd_intel8x0 snd_ac97_codec
> snd_ac97_bus sg snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer snd
> soundcore snd_page_alloc serio_raw ide_cd skge cdrom pcspkr dm_snapshot
> dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi sd_mod scsi_mod ext3
> jbd ehci_hcd ohci_hcd uhci_hcd
> CPU: 0
> EIP: 0060:[<c0169562>] Not tainted VLI
> EFLAGS: 00010246 (2.6.19-rc2 #1)
> EIP is at kmem_cache_free+0x29/0x6d
> eax: 00000000 ebx: dffae300 ecx: dff91b80 edx: c1a00000
> esi: dffaaf80 edi: 00000000 ebp: d3f324c0 esp: d3fb9dd0
> ds: 007b es: 007b ss: 0068
> Process tdg_2 (pid: 2362, ti=d3fb9000 task=dfd6cd50 task.ti=d3fb9000)
> Stack: dffae300 dffaaf80 00000000 c0154448 00000000 d3e09a80 dffaaf80
> d3e09a80
> c018bafc 00001000 00000000 c018b822 e088efa0 00001000 00000000
> 0000000a
> d3fb9ef0 d43f76c8 00003000 00000000 00000001 c130cac8 00008000
> 00000000
> Call Trace:
> [<c0154448>] mempool_free+0x66/0x6b
> [<c018bafc>] bio_free+0x25/0x30
> [<c018b822>] bio_put+0x28/0x29
> [<e088efa0>] scsi_execute_async+0x15f/0x33d [scsi_mod]
> [<e09c9913>] sg_common_write+0x704/0x772 [sg]
> [<e09c9ba6>] sg_new_write+0x225/0x248 [sg]
> [<e09cae45>] sg_write+0x106/0x33a [sg]
> [<c016dae7>] vfs_write+0xa8/0x159
> [<c016e114>] sys_write+0x41/0x67
> [<c0103dc9>] sysenter_past_esp+0x56/0x79
> DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79
>
> Leftover inexact backtrace:
>
> [<c031007b>] sleep_on+0x1e/0x6c
> =======================
> Code: 5f c3 89 c1 8d 82 00 00 00 40 c1 e8 0c 57 89 d7 6b d0 28 03 15 00
> d6 50 c0 56 53 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 <0f> 0b
> 52 02 e6 6b 33 c0 39 4a 20 74 08 0f 0b ca 0d e6 6b 33 c0
> EIP: [<c0169562>] kmem_cache_free+0x29/0x6d SS:ESP 0068:d3fb9dd0

Does this only occur with sg or is that the only way you got a trace? In
the original bug report you mentioned it occurring with mkfs, but the
bug oops is from a sg request. Is tdg_2 run while the mkfs is running?

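Mike's point that the oops comes from an sg request can be read straight off the call trace. The sketch below is only an illustration (the helper names parse_trace and modules_in_trace are made up for this note and are not part of any kernel tool): it pulls the function and module out of each frame of the trace quoted above. Frames with no [module] suffix are assumed to live in the core kernel image.

```python
import re

# Call-trace frames copied from the 2.6.19-rc2 oops in this thread.
TRACE = """\
[<c0154448>] mempool_free+0x66/0x6b
[<c018bafc>] bio_free+0x25/0x30
[<c018b822>] bio_put+0x28/0x29
[<e088efa0>] scsi_execute_async+0x15f/0x33d [scsi_mod]
[<e09c9913>] sg_common_write+0x704/0x772 [sg]
[<e09c9ba6>] sg_new_write+0x225/0x248 [sg]
[<e09cae45>] sg_write+0x106/0x33a [sg]
[<c016dae7>] vfs_write+0xa8/0x159
[<c016e114>] sys_write+0x41/0x67
"""

# address, symbol, offset/size, optional [module]
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_trace(text):
    """Return (function, module) pairs, innermost frame first.

    module is None when the frame carries no [module] suffix.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME.search(line)
        if match:
            frames.append((match.group(2), match.group(3)))
    return frames


def modules_in_trace(frames):
    """List the distinct modules on the call path, in order of appearance."""
    seen = []
    for _, module in frames:
        if module and module not in seen:
            seen.append(module)
    return seen


frames = parse_trace(TRACE)
print(" <- ".join(name for name, _ in frames))
print("modules on path:", modules_in_trace(frames))
```

For this trace the only modules on the path are scsi_mod and sg; there is no aic79xx frame, which is why the question of whether sg is required to trigger the problem matters.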
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN 2006-10-19 5:52 ` Mike Christie @ 2006-10-19 12:23 ` Sean Bruno 2006-10-19 12:25 ` Sean Bruno 1 sibling, 0 replies; 11+ messages in thread From: Sean Bruno @ 2006-10-19 12:23 UTC (permalink / raw) To: Mike Christie; +Cc: linux-scsi > Does this only occur with sg or is that the only way you got a trace? In > the original bug report you mentioned it occurring with mkfs, but the > bug oops is from a sg request. Is tdg_2 run while the mkfs is running? In my case, the issue exists with or without sg. In my example with mkfs, it may or may not dump a trace. The mkfs may just take 'forever' and not really do anything. Thankfully, the sg module was causing a dump. Hope this helps. I'll try to get a SCSI BUS trace this Morning. Sean ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
From: Sean Bruno @ 2006-10-19 12:25 UTC
To: linux-scsi

On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
> > On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> > > I have had a tough time tracking this one down, however I can say for
> > > certain that the 29320 is really having trouble if a LUN is power
> > > cycled.
> > >
> > > I don't have access to a BUS analyzer right now, but here is my
> > > regression.
> > >
> > > 1. Hook an external SCSI array/disk to a 29320.
> > > 2. Power up SCSI array/disk
> > > 3. Power up PC with 29320.
> > > 4. When PC has booted, login and test device by creating a file
> > > system, eg. mkfs /dev/sda (or whatever disk the array is called on
> > > ur machine).
> > > 5. Power cycle array/disk
> > > 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> > > ensues.
> > >
> > > This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>
> Does this only occur with sg or is that the only way you got a trace? In
> the original bug report you mentioned it occurring with mkfs, but the
> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?

Snippets from 'dmesg' during step 6:

scsi0: Someone reset channel A
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0 0x0 0x80 0x0 0x0 0x80 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x0 0x2
WAITING_TID_QUEUES:
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x2 0x0
WAITING_TID_QUEUES:
Pending list:
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11320
QINFIFO: 0x2
WAITING_TID_QUEUES:
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 1
Kernel Free SCB list: 0 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11320
QINFIFO: 0x2
WAITING_TID_QUEUES:
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 1
Kernel Free SCB list: 0 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue a TARGET RESET message:CDB: 0x28 0x0 0x0 0x0 0x0 0x80 0x0 0x0 0x80 0x0
scsi0: Device reset code sleeping
scsi0: Device reset timer expired (active 1)
scsi0: Device reset returning 0x2003
scsi0: bus reset still active
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x2 0x0
WAITING_TID_QUEUES:
Pending list:
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x2 0x0
WAITING_TID_QUEUES:
Pending list:
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: scsi: Device offlined - not ready after error recovery
sd 0:0:4:0: scsi: Device offlined - not ready after error recovery
sd 0:0:4:0: SCSI error: return code = 0x00050000
end_request: I/O error, dev sda, sector 128
Buffer I/O error on device sda, logical block 128
Buffer I/O error on device sda, logical block 129
Buffer I/O error on device sda, logical block 130
Buffer I/O error on device sda, logical block 131
Buffer I/O error on device sda, logical block 132
Buffer I/O error on device sda, logical block 133
Buffer I/O error on device sda, logical block 134
Buffer I/O error on device sda, logical block 135
Buffer I/O error on device sda, logical block 136
Buffer I/O error on device sda, logical block 137
sd 0:0:4:0: SCSI error: return code = 0x00050000
end_request: I/O error, dev sda, sector 0
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device

Sean

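The CDBs and the return code in the log above can be decoded by hand. The following is a rough sketch, assuming the standard 10-byte READ(10)/WRITE(10) layout (opcode 0x28 or 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and the usual Linux SCSI midlayer result layout (status, message, host and driver bytes from low to high). The function names are invented for this note, and the DID_ABORT value is quoted from memory of include/scsi/scsi.h, so treat it as an assumption.

```python
DID_ABORT = 0x05  # assumed host-byte code; check include/scsi/scsi.h

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def parse_cdb(text):
    """Turn a logged CDB such as '0x28 0x0 ...' into a list of ints."""
    return [int(token, 16) for token in text.split()]


def decode_rw10(cdb):
    """Decode a 10-byte READ(10)/WRITE(10) CDB into (name, lba, blocks)."""
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2..5
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7..8
    return OPCODES[cdb[0]], lba, blocks


def split_result(result):
    """Split a midlayer result word into its four component bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


first = decode_rw10(parse_cdb("0x28 0x0 0x0 0x0 0x0 0x80 0x0 0x0 0x80 0x0"))
later = decode_rw10(parse_cdb("0x28 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x0"))
result = split_result(0x00050000)

print(first)
print(later)
print(result, "DID_ABORT" if result["host"] == DID_ABORT else "other")
```

Under those assumptions the first aborted command is a 128-block read at LBA 128 and the later one a 128-block read at LBA 0, which lines up with the I/O errors at sectors 128 and 0 (assuming 512-byte blocks). A host byte of 0x05 would mean the command was aborted by the host side rather than failed by the target.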
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
From: Hannes Reinecke @ 2006-10-19 14:10 UTC
To: Sean Bruno; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1721 bytes --]

Sean Bruno wrote:
> On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
>> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
>>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
>>>> I have had a tough time tracking this one down, however I can say for
>>>> certain that the 29320 is really having trouble if a LUN is power
>>>> cycled.
>>>>
>>>> I don't have access to a BUS analyzer right now, but here is my
>>>> regression.
>>>>
>>>> 1. Hook an external SCSI array/disk to a 29320.
>>>> 2. Power up SCSI array/disk
>>>> 3. Power up PC with 29320.
>>>> 4. When PC has booted, login and test device by creating a file
>>>> system, eg. mkfs /dev/sda (or whatever disk the array is called on
>>>> ur machine).
>>>> 5. Power cycle array/disk
>>>> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
>>>> ensues.
>>>>
>>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>>>>
>
>> Does this only occur with sg or is that the only way you got a trace? In
>> the original bug report you mentioned it occurring with mkfs, but the
>> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
>
> Snippets from 'dmesg' during step 6:
>
> scsi0: Someone reset channel A
> sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
> 0x0 0x80 0x0 0x0 0x80 0x0
> Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
> paused

Ah. Hmm. Infinite SCSI interrupt.

Maybe someone forgot to clear the status ...

Can you try the attached patch?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke hare@suse.de
SuSE Linux Products GmbH S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de

[-- Attachment #2: aic79xx-reset-scsiint --]
[-- Type: text/plain, Size: 1324 bytes --]

diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c b/drivers/scsi/aic7xxx/aic79xx_core.c
index 653818d..78fa71d 100644
--- a/drivers/scsi/aic7xxx/aic79xx_core.c
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c
@@ -1519,8 +1519,10 @@ ahd_handle_scsiint(struct ahd_softc *ahd
 	/*
 	 * Ignore external resets after a bus reset.
 	 */
-	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE))
+	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE)) {
+		ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
 		return;
+	}
 
 	/*
 	 * Clear bus reset flag
@@ -7920,6 +7922,11 @@ #endif
 	ahd_clear_fifo(ahd, 1);
 
 	/*
+	 * Clear SCSI interrupt status
+	 */
+	ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
+
+	/*
 	 * Reenable selections
 	 */
 	ahd_outb(ahd, SIMODE1, ahd_inb(ahd, SIMODE1) | ENSCSIRST);
@@ -7952,10 +7959,6 @@ #ifdef AHD_TARGET_MODE
 		}
 	}
 #endif
-	/* Notify the XPT that a bus reset occurred */
-	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
-		       CAM_LUN_WILDCARD, AC_BUS_RESET);
-
 	/*
 	 * Revert to async/narrow transfers until we renegotiate.
 	 */
@@ -7977,6 +7980,10 @@ #endif
 		}
 	}
 
+	/* Notify the XPT that a bus reset occurred */
+	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
+		       CAM_LUN_WILDCARD, AC_BUS_RESET);
+
 	ahd_restart(ahd);
 
 	return (found);

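To illustrate the idea behind the first hunk of the patch (this is a toy model, not the aic79xx code and not a description of the real chip): if a reset status bit stays latched until software clears it, and the handler returns early without clearing it, the interrupt remains asserted and the handler is re-entered until some guard gives up. The class, method names, guard value and bit value below are all hypothetical and chosen only for this sketch.

```python
class ToyController:
    """Toy device with one latched status bit that must be cleared by software."""

    SCSIRSTI = 0x20  # hypothetical bit value, used only inside this model

    def __init__(self):
        self.status = 0

    def external_reset(self):
        # The event latches the bit; it does not go away by itself.
        self.status |= self.SCSIRSTI

    def clear(self, mask):
        # Stand-in for writing the matching "clear" register.
        self.status &= ~mask

    def irq_asserted(self):
        return self.status != 0


def run_handler(ctrl, clear_before_return, guard=50):
    """Re-run an 'ignore this reset' handler while the interrupt is asserted.

    Returns the number of handler passes, or None if the guard trips,
    which loosely mirrors the 'Infinite interrupt loop' message.
    """
    passes = 0
    while ctrl.irq_asserted():
        passes += 1
        if passes > guard:
            return None
        if clear_before_return:
            ctrl.clear(ToyController.SCSIRSTI)
        # Without the clear, the handler just returns and gets re-entered.
    return passes


for fixed in (False, True):
    ctrl = ToyController()
    ctrl.external_reset()
    print("clear before return:", fixed, "->", run_handler(ctrl, fixed))
```

In this model the handler that returns without clearing never makes progress, while the one that clears the latched bit finishes in a single pass. Whether the real hardware behaves this way is exactly the hypothesis the patch is meant to test.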
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-19 14:10 ` Hannes Reinecke
@ 2006-10-19 16:18 ` Sean Bruno
2006-10-20 7:01 ` Hannes Reinecke
0 siblings, 1 reply; 11+ messages in thread
From: Sean Bruno @ 2006-10-19 16:18 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi

On Thu, 2006-10-19 at 16:10 +0200, Hannes Reinecke wrote:
> Sean Bruno wrote:
> > On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
> >> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
> >>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> >>>> I have had a tough time tracking this one down, however I can say for
> >>>> certain that the 29320 is really having trouble if a LUN is power
> >>>> cycled.
> >>>>
> >>>> I don't have access to a BUS analyzer right now, but here is my
> >>>> regression.
> >>>>
> >>>> 1. Hook an external SCSI array/disk to a 29320.
> >>>> 2. Power up SCSI array/disk
> >>>> 3. Power up PC with 29320.
> >>>> 4. When PC has booted, login and test device by creating a file
> >>>> system, e.g. mkfs /dev/sda (or whatever disk the array is called on
> >>>> your machine).
> >>>> 5. Power cycle array/disk
> >>>> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> >>>> ensues.
> >>>>
> >>>>
> >>>>
> >>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
> >>>>
> >
> >> Does this only occur with sg or is that the only way you got a trace? In
> >> the original bug report you mentioned it occurring with mkfs, but the
> >> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
> >
> > Snippets from 'dmesg' during step 6:
> >
> > scsi0: Someone reset channel A
> > sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
> > 0x0 0x80 0x0 0x0 0x80 0x0
> > Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
> > paused
> Ah. Hmm. Infinite SCSI interrupt.
>
> Maybe someone forgot to clear the status ...
>
> Can you try the attached patch?
>
> Cheers,
>
> Hannes

Better. The patch allows me to cycle power on the array exactly once.
So the new regression is:

1. Hook an external SCSI array/disk to a 29320.
2. Power up SCSI array/disk
3. Power up PC with 29320.
4. When PC has booted, login and test device by creating a file
system, e.g. mkfs /dev/sda (or whatever disk the array is called on
your machine).
5. Power cycle array/disk
6. Retest device with another 'mkfs /dev/sda' <-- works just fine!
7. Power cycle array/disk
8. No need to do anything, card dump in dmesg/messages appears and
device is not usable:

Oct 19 09:12:26 testsrv kernel: scsi0: Someone reset channel A
Oct 19 09:16:33 testsrv kernel: scsi0: Unexpected PKT busfree condition
Oct 19 09:16:33 testsrv kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Oct 19 09:16:33 testsrv kernel: scsi0: Dumping Card State at program address 0x20 Mode 0x33
Oct 19 09:16:33 testsrv kernel: Card was paused
Oct 19 09:16:33 testsrv kernel: INTSTAT[0x0] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
Oct 19 09:16:33 testsrv kernel: INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
Oct 19 09:16:33 testsrv kernel: SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
Oct 19 09:16:33 testsrv kernel: SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
Oct 19 09:16:33 testsrv kernel: SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x0] QFREEZE_COUNT[0x7]
Oct 19 09:16:33 testsrv kernel: KERNEL_QFREEZE_COUNT[0x7] MK_MESSAGE_SCB[0x2] MK_MESSAGE_SCSIID[0x47]
Oct 19 09:16:33 testsrv kernel: SSTAT0[0x0] SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0]
Oct 19 09:16:33 testsrv kernel: PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
Oct 19 09:16:33 testsrv kernel: LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel: SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
Oct 19 09:16:33 testsrv kernel: qinstart = 52908 qinfifonext = 52908
Oct 19 09:16:33 testsrv kernel: QINFIFO:
Oct 19 09:16:33 testsrv kernel: WAITING_TID_QUEUES:
Oct 19 09:16:33 testsrv kernel: Pending list:
Oct 19 09:16:33 testsrv kernel: Total 0
Oct 19 09:16:33 testsrv kernel: Kernel Free SCB list: 0 1 2 3
Oct 19 09:16:33 testsrv kernel: Sequencer Complete DMA-inprog list:
Oct 19 09:16:33 testsrv kernel: Sequencer Complete list:
Oct 19 09:16:33 testsrv kernel: Sequencer DMA-Up and Complete list:
Oct 19 09:16:33 testsrv kernel: Sequencer On QFreeze and Complete list:
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel: scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
Oct 19 09:16:33 testsrv kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Oct 19 09:16:33 testsrv kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Oct 19 09:16:33 testsrv kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Oct 19 09:16:33 testsrv kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel: scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
Oct 19 09:16:33 testsrv kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
Oct 19 09:16:33 testsrv kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Oct 19 09:16:33 testsrv kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Oct 19 09:16:33 testsrv kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Oct 19 09:16:33 testsrv kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Oct 19 09:16:33 testsrv kernel: scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
Oct 19 09:16:33 testsrv kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
Oct 19 09:16:33 testsrv kernel: scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
Oct 19 09:16:33 testsrv kernel: SIMODE0[0xc]
Oct 19 09:16:33 testsrv kernel: CCSCBCTL[0x0]
Oct 19 09:16:33 testsrv kernel: scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
Oct 19 09:16:33 testsrv kernel: scsi0: SCBPTR == 0x0, SCB_NEXT == 0xffc0, SCB_NEXT2 == 0xff57
Oct 19 09:16:33 testsrv kernel: CDB 2a 0 0 80 9 d0
Oct 19 09:16:33 testsrv kernel: STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Oct 19 09:16:33 testsrv kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>

Sean

^ permalink raw reply [flat|nested] 11+ messages in thread
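For readers decoding the CDB bytes in the logs above: the aborted command in the quoted dmesg snippet (0x28 0x0 0x0 0x0 0x0 0x80 0x0 0x0 0x80 0x0) is a 10-byte READ(10), and the card dump shows the first six bytes of a command whose opcode 0x2a is WRITE(10). The following is a minimal sketch of pulling the LBA and block count out of such a CDB, assuming the standard 10-byte layout (opcode in byte 0, big-endian LBA in bytes 2..5, big-endian transfer length in bytes 7..8). The struct and function names are hypothetical and are not part of the aic79xx driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of a READ(10)/WRITE(10) CDB (illustrative struct name). */
struct rw10_fields {
	uint8_t  opcode;      /* 0x28 = READ(10), 0x2a = WRITE(10) */
	uint32_t lba;         /* logical block address, CDB bytes 2..5 */
	uint16_t num_blocks;  /* transfer length in blocks, CDB bytes 7..8 */
};

/*
 * Decode a 10-byte READ/WRITE CDB.
 * Returns 0 on success, -1 if the buffer is not exactly 10 bytes long
 * or the opcode is neither READ(10) nor WRITE(10).
 */
int decode_rw10(const uint8_t *cdb, size_t len, struct rw10_fields *out)
{
	assert(cdb != NULL && out != NULL);

	if (len != 10 || (cdb[0] != 0x28 && cdb[0] != 0x2a))
		return -1;

	out->opcode = cdb[0];
	/* Multi-byte CDB fields are big-endian. */
	out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		   ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	out->num_blocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
	return 0;
}
```

Passing the ten bytes from the ABORT message to decode_rw10() yields a READ(10) at LBA 128 for 128 blocks. The six bytes printed in the card dump are too short to decode this way, so the sketch rejects them rather than guessing at the missing bytes.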
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-19 16:18 ` Sean Bruno
@ 2006-10-20 7:01 ` Hannes Reinecke
2006-10-21 20:48 ` Sean Bruno
0 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2006-10-20 7:01 UTC (permalink / raw)
To: Sean Bruno; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2649 bytes --]

Sean Bruno wrote:
> On Thu, 2006-10-19 at 16:10 +0200, Hannes Reinecke wrote:
>> Sean Bruno wrote:
>>> On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
>>>> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
>>>>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
>>>>>> I have had a tough time tracking this one down, however I can say for
>>>>>> certain that the 29320 is really having trouble if a LUN is power
>>>>>> cycled.
>>>>>>
>>>>>> I don't have access to a BUS analyzer right now, but here is my
>>>>>> regression.
>>>>>>
>>>>>> 1. Hook an external SCSI array/disk to a 29320.
>>>>>> 2. Power up SCSI array/disk
>>>>>> 3. Power up PC with 29320.
>>>>>> 4. When PC has booted, login and test device by creating a file
>>>>>> system, e.g. mkfs /dev/sda (or whatever disk the array is called on
>>>>>> your machine).
>>>>>> 5. Power cycle array/disk
>>>>>> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
>>>>>> ensues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>>>>>>
>>>> Does this only occur with sg or is that the only way you got a trace? In
>>>> the original bug report you mentioned it occurring with mkfs, but the
>>>> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
>>> Snippets from 'dmesg' during step 6:
>>>
>>> scsi0: Someone reset channel A
>>> sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
>>> 0x0 0x80 0x0 0x0 0x80 0x0
>>> Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
>>> paused
>> Ah. Hmm. Infinite SCSI interrupt.
>>
>> Maybe someone forgot to clear the status ...
>>
>> Can you try the attached patch?
>>
>> Cheers,
>>
>> Hannes
>
> Better. The patch allows me to cycle power on the array exactly once.
> So the new regression is:
>
> 1. Hook an external SCSI array/disk to a 29320.
> 2. Power up SCSI array/disk
> 3. Power up PC with 29320.
> 4. When PC has booted, login and test device by creating a file
> system, e.g. mkfs /dev/sda (or whatever disk the array is called on
> your machine).
> 5. Power cycle array/disk
> 6. Retest device with another 'mkfs /dev/sda' <-- works just fine!
> 7. Power cycle array/disk
> 8. No need to do anything, card dump in dmesg/messages appears and
> device is not usable:
>
Ok. Not bad. So we have to switch to non-pkt commands after a reset.
Makes sense. Care to try the updated patch?

Thanks for all the testing!

Cheers,

Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux Products GmbH S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de

[-- Attachment #2: aic79xx-external-device-reset --]
[-- Type: text/plain, Size: 3984 bytes --]

diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c b/drivers/scsi/aic7xxx/aic79xx_core.c
index 653818d..555920a 100644
--- a/drivers/scsi/aic7xxx/aic79xx_core.c
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c
@@ -1053,10 +1053,12 @@ #endif
 			 * If a target takes us into the command phase
 			 * assume that it has been externally reset and
 			 * has thus lost our previous packetized negotiation
-			 * agreement.
-			 * Revert to async/narrow transfers until we
-			 * can renegotiate with the device and notify
-			 * the OSM about the reset.
+			 * agreement. Since we have not sent an identify
+			 * message and may not have fully qualified the
+			 * connection, we change our command to TUR, assert
+			 * ATN and ABORT the task when we go to message in
+			 * phase. The OSM will see the REQUEUE_REQUEST
+			 * status and retry the command.
 			 */
 			scbid = ahd_get_scbptr(ahd);
 			scb = ahd_lookup_scb(ahd, scbid);
@@ -1083,7 +1085,28 @@ #endif
 			ahd_set_syncrate(ahd, &devinfo, /*period*/0,
 					 /*offset*/0, /*ppr_options*/0,
 					 AHD_TRANS_ACTIVE, /*paused*/TRUE);
-			scb->flags |= SCB_EXTERNAL_RESET;
+			/* Hand-craft TUR command */
+			ahd_outb(ahd, SCB_CDB_STORE, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+1, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+2, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+3, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+4, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+5, 0);
+			ahd_outb(ahd, SCB_CDB_LEN, 6);
+			scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
+			scb->hscb->control |= MK_MESSAGE;
+			ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
+			ahd_outb(ahd, MSG_OUT, HOST_MSG);
+			ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
+			/*
+			 * The lun is 0, regardless of the SCB's lun
+			 * as we have not sent an identify message.
+			 */
+			ahd_outb(ahd, SAVED_LUN, 0);
+			ahd_outb(ahd, SEQ_FLAGS, 0);
+			ahd_assert_atn(ahd);
+			scb->flags &= ~SCB_PACKETIZED;
+			scb->flags |= SCB_ABORT|SCB_EXTERNAL_RESET;
 			ahd_freeze_devq(ahd, scb);
 			ahd_set_transaction_status(scb, CAM_REQUEUE_REQ);
 			ahd_freeze_scb(scb);
@@ -1519,8 +1542,10 @@ ahd_handle_scsiint(struct ahd_softc *ahd
 	/*
 	 * Ignore external resets after a bus reset.
 	 */
-	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE))
+	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE)) {
+		ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
 		return;
+	}
 
 	/*
 	 * Clear bus reset flag
@@ -2200,6 +2225,22 @@ ahd_handle_nonpkt_busfree(struct ahd_sof
 			if (sent_msg == MSG_ABORT_TAG)
 				tag = SCB_GET_TAG(scb);
 
+			if ((scb->flags & SCB_EXTERNAL_RESET) != 0) {
+				/*
+				 * This abort is in response to an
+				 * unexpected switch to command phase
+				 * for a packetized connection. Since
+				 * the identify message was never sent,
+				 * "saved lun" is 0. We really want to
+				 * abort only the SCB that encountered
+				 * this error, which could have a different
+				 * lun. The SCB will be retried so the OS
+				 * will see the UA after renegotiating to
+				 * packetized.
+				 */
+				tag = SCB_GET_TAG(scb);
+				saved_lun = scb->hscb->lun;
+			}
 			found = ahd_abort_scbs(ahd, target, 'A', saved_lun,
 					       tag, ROLE_INITIATOR,
 					       CAM_REQ_ABORTED);
@@ -7920,6 +7961,11 @@ #endif
 	ahd_clear_fifo(ahd, 1);
 
 	/*
+	 * Clear SCSI interrupt status
+	 */
+	ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
+
+	/*
 	 * Reenable selections
 	 */
 	ahd_outb(ahd, SIMODE1, ahd_inb(ahd, SIMODE1) | ENSCSIRST);
@@ -7952,10 +7998,6 @@ #ifdef AHD_TARGET_MODE
 		}
 	}
 #endif
-	/* Notify the XPT that a bus reset occurred */
-	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
-		       CAM_LUN_WILDCARD, AC_BUS_RESET);
-
 	/*
 	 * Revert to async/narrow transfers until we renegotiate.
 	 */
@@ -7977,6 +8019,10 @@ #endif
 		}
 	}
 
+	/* Notify the XPT that a bus reset occurred */
+	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
+		       CAM_LUN_WILDCARD, AC_BUS_RESET);
+
 	ahd_restart(ahd);
 
 	return (found);

^ permalink raw reply related [flat|nested] 11+ messages in thread
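One hunk in the patch above writes CLRSCSIRSTI to CLRSINT1 before the early return in ahd_handle_scsiint(). A minimal sketch of why that could matter, assuming a level-triggered interrupt line that stays asserted for as long as a status bit remains latched. Nothing below is driver code: `fake_hw`, `FAKE_SCSIRSTI`, `handle_scsiint` and `service_until_quiet` are hypothetical names for a toy model, and this is one plausible reading of the "Infinite interrupt loop" message rather than a confirmed diagnosis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit for the latched "SCSI reset seen" status. */
#define FAKE_SCSIRSTI 0x20u

/* Toy stand-in for the controller; not the real register layout. */
struct fake_hw {
	uint8_t sstat1;         /* latched status bits */
	bool bus_reset_active;  /* models the AHD_BUS_RESET_ACTIVE flag */
};

/* Level-triggered model: the line is asserted while any bit is latched. */
static bool irq_asserted(const struct fake_hw *hw)
{
	return hw->sstat1 != 0;
}

/*
 * Toy interrupt handler. With clear_on_ignore == false it returns early
 * and leaves the bit latched (old behaviour); with true it clears the
 * bit first, modelling the write to CLRSINT1 added by the patch.
 */
static void handle_scsiint(struct fake_hw *hw, bool clear_on_ignore)
{
	if ((hw->sstat1 & FAKE_SCSIRSTI) != 0 && hw->bus_reset_active) {
		if (clear_on_ignore)
			hw->sstat1 &= (uint8_t)~FAKE_SCSIRSTI;
		return;
	}
	hw->sstat1 = 0;  /* normal path: service and clear everything */
}

/*
 * Re-run the handler while the line is asserted, giving up after
 * `limit` calls. Returns the number of handler invocations made.
 */
int service_until_quiet(bool clear_on_ignore, int limit)
{
	struct fake_hw hw = { FAKE_SCSIRSTI, true };
	int calls = 0;

	assert(limit > 0);
	while (irq_asserted(&hw) && calls < limit) {
		handle_scsiint(&hw, clear_on_ignore);
		calls++;
	}
	return calls;
}
```

In this model service_until_quiet(true, 1000) stops after a single handler call, while service_until_quiet(false, 1000) runs until the limit is hit because the ignored reset bit never gets cleared.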
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-20 7:01 ` Hannes Reinecke
@ 2006-10-21 20:48 ` Sean Bruno
2006-10-22 4:45 ` Sean Bruno
0 siblings, 1 reply; 11+ messages in thread
From: Sean Bruno @ 2006-10-21 20:48 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi

> > Better. The patch allows me to cycle power on the array exactly once.
> > So the new regression is:
> >
> > 1. Hook an external SCSI array/disk to a 29320.
> > 2. Power up SCSI array/disk
> > 3. Power up PC with 29320.
> > 4. When PC has booted, login and test device by creating a file
> > system, e.g. mkfs /dev/sda (or whatever disk the array is called on
> > your machine).
> > 5. Power cycle array/disk
> > 6. Retest device with another 'mkfs /dev/sda' <-- works just fine!
> > 7. Power cycle array/disk
> > 8. No need to do anything, card dump in dmesg/messages appears and
> > device is not usable:
> >
> Ok. Not bad. So we have to switch to non-pkt commands after a reset.
> Makes sense. Care to try the updated patch?
>
> Thanks for all the testing!
>
> Cheers,

Looking much better with this patch.

I spent some time to whack together a SCSI target on a second machine
with a QLA1040 that I have lying around and connected my test machine
with the 29320 to it. Essentially, I'm just too lazy to sit at my desk
and continually disconnect and reconnect the power on my array! :)

After a couple of minutes of whacking on it, I was able to script
something to automatically cycle through 5 iterations of 'mkfs' then
reboot the new 'scsi disk', sleep for 120 seconds and repeat.

Seems to work fine with the patch applied to 2.6.19-rc2. I'm running a
longer 100 iteration loop and I'll get back with results.

Sean

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-21 20:48 ` Sean Bruno
@ 2006-10-22 4:45 ` Sean Bruno
0 siblings, 0 replies; 11+ messages in thread
From: Sean Bruno @ 2006-10-22 4:45 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi

> I spent some time to whack together a SCSI target on a second machine
> with a QLA1040 that I have lying around and connected my test machine
> with the 29320 to it. Essentially, I'm just too lazy to sit at my desk
> and continually disconnect and reconnect the power on my array! :)
>
> After a couple of minutes of whacking on it, I was able to script
> something to automatically cycle through 5 iterations of 'mkfs' then
> reboot the new 'scsi disk', sleep for 120 seconds and repeat.
>
> Seems to work fine with the patch applied to 2.6.19-rc2. I'm running a
> longer 100 iteration loop and I'll get back with results.
>

Nice. I ran a 100 loop test with zero errors.

Sean

^ permalink raw reply [flat|nested] 11+ messages in thread