* Adaptec 29320 [aic79xx] fails on power cycle of LUN
@ 2006-10-18 22:24 Sean Bruno
2006-10-18 22:27 ` James Bottomley
2006-10-18 22:32 ` Sean Bruno
0 siblings, 2 replies; 11+ messages in thread
From: Sean Bruno @ 2006-10-18 22:24 UTC (permalink / raw)
To: linux-scsi
I have had a tough time tracking this one down, however I can say for
certain that the 29320 is really having trouble if a LUN is power
cycled.
I don't have access to a BUS analyzer right now, but here is my
regression.
1. Hook an external SCSI array/disk to a 29320.
2. Power up SCSI array/disk
3. Power up PC with 29320.
4. When PC has booted, login and test device by creating a file
system, eg. mkfs /dev/sda (or whatever disk the array is called on
ur machine).
5. Power cycle array/disk
6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
ensues.
This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
Any ideas?
Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-18 22:24 Adaptec 29320 [aic79xx] fails on power cycle of LUN Sean Bruno
@ 2006-10-18 22:27 ` James Bottomley
2006-10-18 22:32 ` Sean Bruno
1 sibling, 0 replies; 11+ messages in thread
From: James Bottomley @ 2006-10-18 22:27 UTC (permalink / raw)
To: Sean Bruno; +Cc: linux-scsi
On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> Any ideas?
It would help if you could post the trace from the panic or crash.
James
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-18 22:24 Adaptec 29320 [aic79xx] fails on power cycle of LUN Sean Bruno
2006-10-18 22:27 ` James Bottomley
@ 2006-10-18 22:32 ` Sean Bruno
2006-10-19 5:52 ` Mike Christie
1 sibling, 1 reply; 11+ messages in thread
From: Sean Bruno @ 2006-10-18 22:32 UTC (permalink / raw)
To: linux-scsi
On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> I have had a tough time tracking this one down, however I can say for
> certain that the 29320 is really having trouble if a LUN is power
> cycled.
>
> I don't have access to a BUS analyzer right now, but here is my
> regression.
>
> 1. Hook an external SCSI array/disk to a 29320.
> 2. Power up SCSI array/disk
> 3. Power up PC with 29320.
> 4. When PC has booted, login and test device by creating a file
> system, eg. mkfs /dev/sda (or whatever disk the array is called on
> ur machine).
> 5. Power cycle array/disk
> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> ensues.
>
>
>
> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>
>From 2.6.19-rc2 I at least get something from a crash without the entire
box locking up on me.
The process tdg_2 is a 'test data generator' basically it writes data to
the scsi disk in a testable pattern that is later validated.
------------[ cut here ]------------
kernel BUG at mm/slab.c:594!
invalid opcode: 0000 [#1]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
libiscsi scsi_transport_iscsi ipv6 video sbs i2c_ec i2c_core button
battery asus_acpi ac parport_pc lp parport snd_intel8x0 snd_ac97_codec
snd_ac97_bus sg snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer snd
soundcore snd_page_alloc serio_raw ide_cd skge cdrom pcspkr dm_snapshot
dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0060:[<c0169562>] Not tainted VLI
EFLAGS: 00010246 (2.6.19-rc2 #1)
EIP is at kmem_cache_free+0x29/0x6d
eax: 00000000 ebx: dffae300 ecx: dff91b80 edx: c1a00000
esi: dffaaf80 edi: 00000000 ebp: d3f324c0 esp: d3fb9dd0
ds: 007b es: 007b ss: 0068
Process tdg_2 (pid: 2362, ti=d3fb9000 task=dfd6cd50 task.ti=d3fb9000)
Stack: dffae300 dffaaf80 00000000 c0154448 00000000 d3e09a80 dffaaf80
d3e09a80
c018bafc 00001000 00000000 c018b822 e088efa0 00001000 00000000
0000000a
d3fb9ef0 d43f76c8 00003000 00000000 00000001 c130cac8 00008000
00000000
Call Trace:
[<c0154448>] mempool_free+0x66/0x6b
[<c018bafc>] bio_free+0x25/0x30
[<c018b822>] bio_put+0x28/0x29
[<e088efa0>] scsi_execute_async+0x15f/0x33d [scsi_mod]
[<e09c9913>] sg_common_write+0x704/0x772 [sg]
[<e09c9ba6>] sg_new_write+0x225/0x248 [sg]
[<e09cae45>] sg_write+0x106/0x33a [sg]
[<c016dae7>] vfs_write+0xa8/0x159
[<c016e114>] sys_write+0x41/0x67
[<c0103dc9>] sysenter_past_esp+0x56/0x79
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79
Leftover inexact backtrace:
[<c031007b>] sleep_on+0x1e/0x6c
=======================
Code: 5f c3 89 c1 8d 82 00 00 00 40 c1 e8 0c 57 89 d7 6b d0 28 03 15 00
d6 50 c0 56 53 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 <0f> 0b
52 02 e6 6b 33 c0 39 4a 20 74 08 0f 0b ca 0d e6 6b 33 c0
EIP: [<c0169562>] kmem_cache_free+0x29/0x6d SS:ESP 0068:d3fb9dd0
Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-18 22:32 ` Sean Bruno
@ 2006-10-19 5:52 ` Mike Christie
2006-10-19 12:23 ` Sean Bruno
2006-10-19 12:25 ` Sean Bruno
0 siblings, 2 replies; 11+ messages in thread
From: Mike Christie @ 2006-10-19 5:52 UTC (permalink / raw)
To: Sean Bruno; +Cc: linux-scsi
On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> > I have had a tough time tracking this one down, however I can say for
> > certain that the 29320 is really having trouble if a LUN is power
> > cycled.
> >
> > I don't have access to a BUS analyzer right now, but here is my
> > regression.
> >
> > 1. Hook an external SCSI array/disk to a 29320.
> > 2. Power up SCSI array/disk
> > 3. Power up PC with 29320.
> > 4. When PC has booted, login and test device by creating a file
> > system, eg. mkfs /dev/sda (or whatever disk the array is called on
> > ur machine).
> > 5. Power cycle array/disk
> > 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> > ensues.
> >
> >
> >
> > This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
> >
> >From 2.6.19-rc2 I at least get something from a crash without the entire
> box locking up on me.
>
> The process tdg_2 is a 'test data generator' basically it writes data to
> the scsi disk in a testable pattern that is later validated.
>
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:594!
> invalid opcode: 0000 [#1]
> SMP
> Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
> libiscsi scsi_transport_iscsi ipv6 video sbs i2c_ec i2c_core button
> battery asus_acpi ac parport_pc lp parport snd_intel8x0 snd_ac97_codec
> snd_ac97_bus sg snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer snd
> soundcore snd_page_alloc serio_raw ide_cd skge cdrom pcspkr dm_snapshot
> dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi sd_mod scsi_mod ext3
> jbd ehci_hcd ohci_hcd uhci_hcd
> CPU: 0
> EIP: 0060:[<c0169562>] Not tainted VLI
> EFLAGS: 00010246 (2.6.19-rc2 #1)
> EIP is at kmem_cache_free+0x29/0x6d
> eax: 00000000 ebx: dffae300 ecx: dff91b80 edx: c1a00000
> esi: dffaaf80 edi: 00000000 ebp: d3f324c0 esp: d3fb9dd0
> ds: 007b es: 007b ss: 0068
> Process tdg_2 (pid: 2362, ti=d3fb9000 task=dfd6cd50 task.ti=d3fb9000)
> Stack: dffae300 dffaaf80 00000000 c0154448 00000000 d3e09a80 dffaaf80
> d3e09a80
> c018bafc 00001000 00000000 c018b822 e088efa0 00001000 00000000
> 0000000a
> d3fb9ef0 d43f76c8 00003000 00000000 00000001 c130cac8 00008000
> 00000000
> Call Trace:
> [<c0154448>] mempool_free+0x66/0x6b
> [<c018bafc>] bio_free+0x25/0x30
> [<c018b822>] bio_put+0x28/0x29
> [<e088efa0>] scsi_execute_async+0x15f/0x33d [scsi_mod]
> [<e09c9913>] sg_common_write+0x704/0x772 [sg]
> [<e09c9ba6>] sg_new_write+0x225/0x248 [sg]
> [<e09cae45>] sg_write+0x106/0x33a [sg]
> [<c016dae7>] vfs_write+0xa8/0x159
> [<c016e114>] sys_write+0x41/0x67
> [<c0103dc9>] sysenter_past_esp+0x56/0x79
> DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79
>
> Leftover inexact backtrace:
>
> [<c031007b>] sleep_on+0x1e/0x6c
> =======================
> Code: 5f c3 89 c1 8d 82 00 00 00 40 c1 e8 0c 57 89 d7 6b d0 28 03 15 00
> d6 50 c0 56 53 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 <0f> 0b
> 52 02 e6 6b 33 c0 39 4a 20 74 08 0f 0b ca 0d e6 6b 33 c0
> EIP: [<c0169562>] kmem_cache_free+0x29/0x6d SS:ESP 0068:d3fb9dd0
Does this only occur with sg or is that the only way you got a trace? In
the original bug report you mentioned it occurring with mkfs, but the
bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-19 5:52 ` Mike Christie
@ 2006-10-19 12:23 ` Sean Bruno
2006-10-19 12:25 ` Sean Bruno
1 sibling, 0 replies; 11+ messages in thread
From: Sean Bruno @ 2006-10-19 12:23 UTC (permalink / raw)
To: Mike Christie; +Cc: linux-scsi
> Does this only occur with sg or is that the only way you got a trace? In
> the original bug report you mentioned it occurring with mkfs, but the
> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
In my case, the issue exists with or without sg.
In my example with mkfs, it may or may not dump a trace. The mkfs may
just take 'forever' and not really do anything. Thankfully, the sg
module was causing a dump. Hope this helps.
I'll try to get a SCSI BUS trace this Morning.
Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-19 5:52 ` Mike Christie
2006-10-19 12:23 ` Sean Bruno
@ 2006-10-19 12:25 ` Sean Bruno
2006-10-19 14:10 ` Hannes Reinecke
1 sibling, 1 reply; 11+ messages in thread
From: Sean Bruno @ 2006-10-19 12:25 UTC (permalink / raw)
To: linux-scsi
On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
> > On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> > > I have had a tough time tracking this one down, however I can say for
> > > certain that the 29320 is really having trouble if a LUN is power
> > > cycled.
> > >
> > > I don't have access to a BUS analyzer right now, but here is my
> > > regression.
> > >
> > > 1. Hook an external SCSI array/disk to a 29320.
> > > 2. Power up SCSI array/disk
> > > 3. Power up PC with 29320.
> > > 4. When PC has booted, login and test device by creating a file
> > > system, eg. mkfs /dev/sda (or whatever disk the array is called on
> > > ur machine).
> > > 5. Power cycle array/disk
> > > 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> > > ensues.
> > >
> > >
> > >
> > > This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
> > >
> Does this only occur with sg or is that the only way you got a trace? In
> the original bug report you mentioned it occurring with mkfs, but the
> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
Snippets from 'dmesg' during step 6:
scsi0: Someone reset channel A
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
0x0 0x80 0x0 0x0 0x80 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x0 0x2
WAITING_TID_QUEUES:
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0
0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x2 0x0
WAITING_TID_QUEUES:
Pending list:
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
0x0 0x0 0x0 0x0 0x80 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11320
QINFIFO: 0x2
WAITING_TID_QUEUES:
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 1
Kernel Free SCB list: 0 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0
0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11320
QINFIFO: 0x2
WAITING_TID_QUEUES:
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 1
Kernel Free SCB list: 0 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue a TARGET RESET message:CDB: 0x28 0x0 0x0
0x0 0x0 0x80 0x0 0x0 0x80 0x0
scsi0: Device reset code sleeping
scsi0: Device reset timer expired (active 1)
scsi0: Device reset returning 0x2003
scsi0: bus reset still active
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0
0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x2 0x0
WAITING_TID_QUEUES:
Pending list:
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0
0x0 0x0
Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi0: Dumping Card State at program address 0x13 Mode 0x33
Card was paused
INTSTAT[0x8] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x9]
KERNEL_QFREEZE_COUNT[0x9] MK_MESSAGE_SCB[0x0] MK_MESSAGE_SCSIID[0x47]
SSTAT0[0x0] SSTAT1[0x28] SSTAT2[0x0] SSTAT3[0x0]
PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 11319 qinfifonext = 11321
QINFIFO: 0x2 0x0
WAITING_TID_QUEUES:
Pending list:
0 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
2 FIFO_USE[0x0] SCB_CONTROL[0x68] SCB_SCSIID[0x47]
Total 2
Kernel Free SCB list: 1 3
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0
scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x0]
scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi0: SCBPTR == 0x0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff57
CDB 2a 0 0 80 9 e0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi0:0:4:0: Cmd aborted from QINFIFO
sd 0:0:4:0: scsi: Device offlined - not ready after error recovery
sd 0:0:4:0: scsi: Device offlined - not ready after error recovery
sd 0:0:4:0: SCSI error: return code = 0x00050000
end_request: I/O error, dev sda, sector 128
Buffer I/O error on device sda, logical block 128
Buffer I/O error on device sda, logical block 129
Buffer I/O error on device sda, logical block 130
Buffer I/O error on device sda, logical block 131
Buffer I/O error on device sda, logical block 132
Buffer I/O error on device sda, logical block 133
Buffer I/O error on device sda, logical block 134
Buffer I/O error on device sda, logical block 135
Buffer I/O error on device sda, logical block 136
Buffer I/O error on device sda, logical block 137
sd 0:0:4:0: SCSI error: return code = 0x00050000
end_request: I/O error, dev sda, sector 0
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
sd 0:0:4:0: rejecting I/O to offline device
Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-19 12:25 ` Sean Bruno
@ 2006-10-19 14:10 ` Hannes Reinecke
2006-10-19 16:18 ` Sean Bruno
0 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2006-10-19 14:10 UTC (permalink / raw)
To: Sean Bruno; +Cc: linux-scsi
[-- Attachment #1: Type: text/plain, Size: 1721 bytes --]
Sean Bruno wrote:
> On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
>> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
>>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
>>>> I have had a tough time tracking this one down, however I can say for
>>>> certain that the 29320 is really having trouble if a LUN is power
>>>> cycled.
>>>>
>>>> I don't have access to a BUS analyzer right now, but here is my
>>>> regression.
>>>>
>>>> 1. Hook an external SCSI array/disk to a 29320.
>>>> 2. Power up SCSI array/disk
>>>> 3. Power up PC with 29320.
>>>> 4. When PC has booted, login and test device by creating a file
>>>> system, eg. mkfs /dev/sda (or whatever disk the array is called on
>>>> ur machine).
>>>> 5. Power cycle array/disk
>>>> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
>>>> ensues.
>>>>
>>>>
>>>>
>>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>>>>
>
>> Does this only occur with sg or is that the only way you got a trace? In
>> the original bug report you mentioned it occurring with mkfs, but the
>> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
>
> Snippets from 'dmesg' during step 6:
>
> scsi0: Someone reset channel A
> sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
> 0x0 0x80 0x0 0x0 0x80 0x0
> Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
> paused
Ah. Hmm. Infinite SCSI interrupt.
Maybe someone forgot to clear the status ...
Can you try the attached patch?
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux Products GmbH S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
[-- Attachment #2: aic79xx-reset-scsiint --]
[-- Type: text/plain, Size: 1324 bytes --]
diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c b/drivers/scsi/aic7xxx/aic79xx_core.c
index 653818d..78fa71d 100644
--- a/drivers/scsi/aic7xxx/aic79xx_core.c
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c
@@ -1519,8 +1519,10 @@ ahd_handle_scsiint(struct ahd_softc *ahd
/*
* Ignore external resets after a bus reset.
*/
- if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE))
+ if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE)) {
+ ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
return;
+ }
/*
* Clear bus reset flag
@@ -7920,6 +7922,11 @@ #endif
ahd_clear_fifo(ahd, 1);
/*
+ * Clear SCSI interrupt status
+ */
+ ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
+
+ /*
* Reenable selections
*/
ahd_outb(ahd, SIMODE1, ahd_inb(ahd, SIMODE1) | ENSCSIRST);
@@ -7952,10 +7959,6 @@ #ifdef AHD_TARGET_MODE
}
}
#endif
- /* Notify the XPT that a bus reset occurred */
- ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
- CAM_LUN_WILDCARD, AC_BUS_RESET);
-
/*
* Revert to async/narrow transfers until we renegotiate.
*/
@@ -7977,6 +7980,10 @@ #endif
}
}
+ /* Notify the XPT that a bus reset occurred */
+ ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
+ CAM_LUN_WILDCARD, AC_BUS_RESET);
+
ahd_restart(ahd);
return (found);
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-19 14:10 ` Hannes Reinecke
@ 2006-10-19 16:18 ` Sean Bruno
2006-10-20 7:01 ` Hannes Reinecke
0 siblings, 1 reply; 11+ messages in thread
From: Sean Bruno @ 2006-10-19 16:18 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi
On Thu, 2006-10-19 at 16:10 +0200, Hannes Reinecke wrote:
> Sean Bruno wrote:
> > On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
> >> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
> >>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
> >>>> I have had a tough time tracking this one down, however I can say for
> >>>> certain that the 29320 is really having trouble if a LUN is power
> >>>> cycled.
> >>>>
> >>>> I don't have access to a BUS analyzer right now, but here is my
> >>>> regression.
> >>>>
> >>>> 1. Hook an external SCSI array/disk to a 29320.
> >>>> 2. Power up SCSI array/disk
> >>>> 3. Power up PC with 29320.
> >>>> 4. When PC has booted, login and test device by creating a file
> >>>> system, eg. mkfs /dev/sda (or whatever disk the array is called on
> >>>> ur machine).
> >>>> 5. Power cycle array/disk
> >>>> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
> >>>> ensues.
> >>>>
> >>>>
> >>>>
> >>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
> >>>>
> >
> >> Does this only occur with sg or is that the only way you got a trace? In
> >> the original bug report you mentioned it occurring with mkfs, but the
> >> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
> >
> > Snippets from 'dmesg' during step 6:
> >
> > scsi0: Someone reset channel A
> > sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
> > 0x0 0x80 0x0 0x0 0x80 0x0
> > Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
> > paused
> Ah. Hmm. Infinite SCSI interrupt.
>
> Maybe someone forgot to clear the status ...
>
> Can you try the attached patch?
>
> Cheers,
>
> Hannes
Better. The patch allows me to cycle power on the array exactly once.
So the new regression is:
1. Hook an external SCSI array/disk to a 29320.
2. Power up SCSI array/disk
3. Power up PC with 29320.
4. When PC has booted, login and test device by creating a file
system, eg. mkfs /dev/sda (or whatever disk the array is called on
ur machine).
5. Power cycle array/disk
6. Retest device with another 'mkfs /dev/sda' <-- works just fine!
7. Power cycle array/disk
8. No need to do anything, card dump in dmesg/messages appears and
device in not useable:
Oct 19 09:12:26 testsrv kernel: scsi0: Someone reset channel A
Oct 19 09:16:33 testsrv kernel: scsi0: Unexpected PKT busfree condition
Oct 19 09:16:33 testsrv kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Oct 19 09:16:33 testsrv kernel: scsi0: Dumping Card State at program address 0x20 Mode 0x33
Oct 19 09:16:33 testsrv kernel: Card was paused
Oct 19 09:16:33 testsrv kernel: INTSTAT[0x0] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
Oct 19 09:16:33 testsrv kernel: INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
Oct 19 09:16:33 testsrv kernel: SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
Oct 19 09:16:33 testsrv kernel: SCSISEQ0[0x0] SCSISEQ1[0x2] SEQCTL0[0x0] SEQINTCTL[0x0]
Oct 19 09:16:33 testsrv kernel: SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x0] QFREEZE_COUNT[0x7]
Oct 19 09:16:33 testsrv kernel: KERNEL_QFREEZE_COUNT[0x7] MK_MESSAGE_SCB[0x2] MK_MESSAGE_SCSIID[0x47]
Oct 19 09:16:33 testsrv kernel: SSTAT0[0x0] SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0]
Oct 19 09:16:33 testsrv kernel: PERRDIAG[0xc0] SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0]
Oct 19 09:16:33 testsrv kernel: LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel: SCB Count = 4 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
Oct 19 09:16:33 testsrv kernel: qinstart = 52908 qinfifonext = 52908
Oct 19 09:16:33 testsrv kernel: QINFIFO:
Oct 19 09:16:33 testsrv kernel: WAITING_TID_QUEUES:
Oct 19 09:16:33 testsrv kernel: Pending list:
Oct 19 09:16:33 testsrv kernel: Total 0
Oct 19 09:16:33 testsrv kernel: Kernel Free SCB list: 0 1 2 3
Oct 19 09:16:33 testsrv kernel: Sequencer Complete DMA-inprog list:
Oct 19 09:16:33 testsrv kernel: Sequencer Complete list:
Oct 19 09:16:33 testsrv kernel: Sequencer DMA-Up and Complete list:
Oct 19 09:16:33 testsrv kernel: Sequencer On QFreeze and Complete list:
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel: scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
Oct 19 09:16:33 testsrv kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Oct 19 09:16:33 testsrv kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Oct 19 09:16:33 testsrv kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Oct 19 09:16:33 testsrv kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Oct 19 09:16:33 testsrv kernel:
Oct 19 09:16:33 testsrv kernel: scsi0: FIFO1 Free, LONGJMP == 0x81f1, SCB 0x0
Oct 19 09:16:33 testsrv kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
Oct 19 09:16:33 testsrv kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Oct 19 09:16:33 testsrv kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Oct 19 09:16:33 testsrv kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Oct 19 09:16:33 testsrv kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Oct 19 09:16:33 testsrv kernel: scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
Oct 19 09:16:33 testsrv kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
Oct 19 09:16:33 testsrv kernel: scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
Oct 19 09:16:33 testsrv kernel: SIMODE0[0xc]
Oct 19 09:16:33 testsrv kernel: CCSCBCTL[0x0]
Oct 19 09:16:33 testsrv kernel: scsi0: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
Oct 19 09:16:33 testsrv kernel: scsi0: SCBPTR == 0x0, SCB_NEXT == 0xffc0, SCB_NEXT2 == 0xff57
Oct 19 09:16:33 testsrv kernel: CDB 2a 0 0 80 9 d0
Oct 19 09:16:33 testsrv kernel: STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Oct 19 09:16:33 testsrv kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-19 16:18 ` Sean Bruno
@ 2006-10-20 7:01 ` Hannes Reinecke
2006-10-21 20:48 ` Sean Bruno
0 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2006-10-20 7:01 UTC (permalink / raw)
To: Sean Bruno; +Cc: linux-scsi
[-- Attachment #1: Type: text/plain, Size: 2649 bytes --]
Sean Bruno wrote:
> On Thu, 2006-10-19 at 16:10 +0200, Hannes Reinecke wrote:
>> Sean Bruno wrote:
>>> On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
>>>> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
>>>>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
>>>>>> I have had a tough time tracking this one down, however I can say for
>>>>>> certain that the 29320 is really having trouble if a LUN is power
>>>>>> cycled.
>>>>>>
>>>>>> I don't have access to a BUS analyzer right now, but here is my
>>>>>> regression.
>>>>>>
>>>>>> 1. Hook an external SCSI array/disk to a 29320.
>>>>>> 2. Power up SCSI array/disk
>>>>>> 3. Power up PC with 29320.
>>>>>> 4. When PC has booted, login and test device by creating a file
>>>>>> system, eg. mkfs /dev/sda (or whatever disk the array is called on
>>>>>> ur machine).
>>>>>> 5. Power cycle array/disk
>>>>>> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
>>>>>> ensues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>>>>>>
>>>> Does this only occur with sg or is that the only way you got a trace? In
>>>> the original bug report you mentioned it occurring with mkfs, but the
>>>> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
>>> Snippets from 'dmesg' during step 6:
>>>
>>> scsi0: Someone reset channel A
>>> sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
>>> 0x0 0x80 0x0 0x0 0x80 0x0
>>> Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
>>> paused
>> Ah. Hmm. Infinite SCSI interrupt.
>>
>> Maybe someone forgot to clear the status ...
>>
>> Can you try the attached patch?
>>
>> Cheers,
>>
>> Hannes
>
> Better. The patch allows me to cycle power on the array exactly once.
> So the new regression is:
>
> 1. Hook an external SCSI array/disk to a 29320.
> 2. Power up SCSI array/disk
> 3. Power up PC with 29320.
> 4. When PC has booted, login and test device by creating a file
> system, eg. mkfs /dev/sda (or whatever disk the array is called on
> ur machine).
> 5. Power cycle array/disk
> 6. Retest device with another 'mkfs /dev/sda' <-- works just fine!
> 7. Power cycle array/disk
> 8. No need to do anything, card dump in dmesg/messages appears and
> device in not useable:
>
Ok. Not bad. So we have to switch to non-pkt commands after a reset.
Make sense. Care to try the updated patch?
Thanks for all the testing!
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux Products GmbH S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
[-- Attachment #2: aic79xx-external-device-reset --]
[-- Type: text/plain, Size: 3984 bytes --]
diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c b/drivers/scsi/aic7xxx/aic79xx_core.c
index 653818d..555920a 100644
--- a/drivers/scsi/aic7xxx/aic79xx_core.c
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c
@@ -1053,10 +1053,12 @@ #endif
* If a target takes us into the command phase
* assume that it has been externally reset and
* has thus lost our previous packetized negotiation
- * agreement.
- * Revert to async/narrow transfers until we
- * can renegotiate with the device and notify
- * the OSM about the reset.
+ * agreement. Since we have not sent an identify
+ * message and may not have fully qualified the
+ * connection, we change our command to TUR, assert
+ * ATN and ABORT the task when we go to message in
+ * phase. The OSM will see the REQUEUE_REQUEST
+ * status and retry the command.
*/
scbid = ahd_get_scbptr(ahd);
scb = ahd_lookup_scb(ahd, scbid);
@@ -1083,7 +1085,28 @@ #endif
ahd_set_syncrate(ahd, &devinfo, /*period*/0,
/*offset*/0, /*ppr_options*/0,
AHD_TRANS_ACTIVE, /*paused*/TRUE);
- scb->flags |= SCB_EXTERNAL_RESET;
+ /* Hand-craft TUR command */
+ ahd_outb(ahd, SCB_CDB_STORE, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+1, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+2, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+3, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+4, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+5, 0);
+ ahd_outb(ahd, SCB_CDB_LEN, 6);
+ scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
+ scb->hscb->control |= MK_MESSAGE;
+ ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
+ ahd_outb(ahd, MSG_OUT, HOST_MSG);
+ ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
+ /*
+ * The lun is 0, regardless of the SCB's lun
+ * as we have not sent an identify message.
+ */
+ ahd_outb(ahd, SAVED_LUN, 0);
+ ahd_outb(ahd, SEQ_FLAGS, 0);
+ ahd_assert_atn(ahd);
+ scb->flags &= ~SCB_PACKETIZED;
+ scb->flags |= SCB_ABORT|SCB_EXTERNAL_RESET;
ahd_freeze_devq(ahd, scb);
ahd_set_transaction_status(scb, CAM_REQUEUE_REQ);
ahd_freeze_scb(scb);
@@ -1519,8 +1542,10 @@ ahd_handle_scsiint(struct ahd_softc *ahd
/*
* Ignore external resets after a bus reset.
*/
- if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE))
+ if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE)) {
+ ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
return;
+ }
/*
* Clear bus reset flag
@@ -2200,6 +2225,22 @@ ahd_handle_nonpkt_busfree(struct ahd_sof
if (sent_msg == MSG_ABORT_TAG)
tag = SCB_GET_TAG(scb);
+ if ((scb->flags & SCB_EXTERNAL_RESET) != 0) {
+ /*
+ * This abort is in response to an
+ * unexpected switch to command phase
+ * for a packetized connection. Since
+ * the identify message was never sent,
+ * "saved lun" is 0. We really want to
+ * abort only the SCB that encountered
+ * this error, which could have a different
+ * lun. The SCB will be retried so the OS
+ * will see the UA after renegotiating to
+ * packetized.
+ */
+ tag = SCB_GET_TAG(scb);
+ saved_lun = scb->hscb->lun;
+ }
found = ahd_abort_scbs(ahd, target, 'A', saved_lun,
tag, ROLE_INITIATOR,
CAM_REQ_ABORTED);
@@ -7920,6 +7961,11 @@ #endif
ahd_clear_fifo(ahd, 1);
/*
+ * Clear SCSI interrupt status
+ */
+ ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
+
+ /*
* Reenable selections
*/
ahd_outb(ahd, SIMODE1, ahd_inb(ahd, SIMODE1) | ENSCSIRST);
@@ -7952,10 +7998,6 @@ #ifdef AHD_TARGET_MODE
}
}
#endif
- /* Notify the XPT that a bus reset occurred */
- ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
- CAM_LUN_WILDCARD, AC_BUS_RESET);
-
/*
* Revert to async/narrow transfers until we renegotiate.
*/
@@ -7977,6 +8019,10 @@ #endif
}
}
+ /* Notify the XPT that a bus reset occurred */
+ ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
+ CAM_LUN_WILDCARD, AC_BUS_RESET);
+
ahd_restart(ahd);
return (found);
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-20 7:01 ` Hannes Reinecke
@ 2006-10-21 20:48 ` Sean Bruno
2006-10-22 4:45 ` Sean Bruno
0 siblings, 1 reply; 11+ messages in thread
From: Sean Bruno @ 2006-10-21 20:48 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi
> > Better. The patch allows me to cycle power on the array exactly once.
> > So the new regression is:
> >
> > 1. Hook an external SCSI array/disk to a 29320.
> > 2. Power up SCSI array/disk
> > 3. Power up PC with 29320.
> > 4. When PC has booted, login and test device by creating a file
> > system, eg. mkfs /dev/sda (or whatever disk the array is called on
> > ur machine).
> > 5. Power cycle array/disk
> > 6. Retest device with another 'mkfs /dev/sda' <-- works just fine!
> > 7. Power cycle array/disk
> > 8. No need to do anything, card dump in dmesg/messages appears and
> > device in not useable:
> >
> Ok. Not bad. So we have to switch to non-pkt commands after a reset.
> Make sense. Care to try the updated patch?
>
> Thanks for all the testing!
>
> Cheers,
Looking much better with this patch.
I spent some time to whack together a SCSI target on a second machine
with a QLA1040 that I have lying around and connected my test machine
with the 29320 to it. Essentially, I'm just too lazy to sit at my desk
an continually disconnect and reconnect the power on my array! :)
After a couple of minutes of whacking on it, I was able to script
something to automatically cycle through 5 iterations of 'mkfs' then
reboot the new 'scsi disk', sleep for 120 seconds and repeat.
Seems to work fine with the patch applied to 2.6.19-rc2. I'm running a
longer 100 iteration loop and I'll get back with results.
Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
2006-10-21 20:48 ` Sean Bruno
@ 2006-10-22 4:45 ` Sean Bruno
0 siblings, 0 replies; 11+ messages in thread
From: Sean Bruno @ 2006-10-22 4:45 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi
> I spent some time to whack together a SCSI target on a second machine
> with a QLA1040 that I have lying around and connected my test machine
> with the 29320 to it. Essentially, I'm just too lazy to sit at my desk
> an continually disconnect and reconnect the power on my array! :)
>
> After a couple of minutes of whacking on it, I was able to script
> something to automatically cycle through 5 iterations of 'mkfs' then
> reboot the new 'scsi disk', sleep for 120 seconds and repeat.
>
> Seems to work fine with the patch applied to 2.6.19-rc2. I'm running a
> longer 100 iteration loop and I'll get back with results.
>
Nice. I ran a 100 loop test with zero errors.
Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2006-10-22 4:45 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-18 22:24 Adaptec 29320 [aic79xx] fails on power cycle of LUN Sean Bruno
2006-10-18 22:27 ` James Bottomley
2006-10-18 22:32 ` Sean Bruno
2006-10-19 5:52 ` Mike Christie
2006-10-19 12:23 ` Sean Bruno
2006-10-19 12:25 ` Sean Bruno
2006-10-19 14:10 ` Hannes Reinecke
2006-10-19 16:18 ` Sean Bruno
2006-10-20 7:01 ` Hannes Reinecke
2006-10-21 20:48 ` Sean Bruno
2006-10-22 4:45 ` Sean Bruno
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox