Sean Bruno wrote:
> On Thu, 2006-10-19 at 16:10 +0200, Hannes Reinecke wrote:
>> Sean Bruno wrote:
>>> On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
>>>> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
>>>>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
>>>>>> I have had a tough time tracking this one down, however I can say for
>>>>>> certain that the 29320 is really having trouble if a LUN is power
>>>>>> cycled.
>>>>>>
>>>>>> I don't have access to a BUS analyzer right now, but here is my
>>>>>> regression.
>>>>>>
>>>>>> 1. Hook an external SCSI array/disk to a 29320.
>>>>>> 2. Power up SCSI array/disk
>>>>>> 3. Power up PC with 29320.
>>>>>> 4. When PC has booted, login and test device by creating a file
>>>>>>    system, e.g. mkfs /dev/sda (or whatever disk the array is called on
>>>>>>    your machine).
>>>>>> 5. Power cycle array/disk
>>>>>> 6. Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
>>>>>>    ensues.
>>>>>>
>>>>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>>>>>>
>>>> Does this only occur with sg or is that the only way you got a trace? In
>>>> the original bug report you mentioned it occurring with mkfs, but the
>>>> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
>>> Snippets from 'dmesg' during step 6:
>>>
>>> scsi0: Someone reset channel A
>>> sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
>>> 0x0 0x80 0x0 0x0 0x80 0x0
>>> Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
>>> paused
>> Ah. Hmm. Infinite SCSI interrupt.
>>
>> Maybe someone forgot to clear the status ...
>>
>> Can you try the attached patch?
>>
>> Cheers,
>>
>> Hannes
>
> Better. The patch allows me to cycle power on the array exactly once.
> So the new regression is:
>
> 1. Hook an external SCSI array/disk to a 29320.
> 2. Power up SCSI array/disk
> 3. Power up PC with 29320.
> 4. When PC has booted, login and test device by creating a file
>    system, e.g. mkfs /dev/sda (or whatever disk the array is called on
>    your machine).
> 5. Power cycle array/disk
> 6. Retest device with another 'mkfs /dev/sda' <-- works just fine!
> 7. Power cycle array/disk
> 8. No need to do anything, card dump in dmesg/messages appears and
>    device is not usable:
>
Ok. Not bad.
So we have to switch to non-pkt commands after a reset. Makes sense.

Care to try the updated patch?

Thanks for all the testing!

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                     hare@suse.de
SuSE Linux Products GmbH                S390 & zSeries
Maxfeldstraße 5                         +49 911 74053 688
90409 Nürnberg                          http://www.suse.de