Sean Bruno wrote:
> On Thu, 2006-10-19 at 16:10 +0200, Hannes Reinecke wrote:
>> Sean Bruno wrote:
>>> On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
>>>> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
>>>>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
>>>>>> I have had a tough time tracking this one down, however I can say for
>>>>>> certain that the 29320 is really having trouble if a LUN is power
>>>>>> cycled.
>>>>>>
>>>>>> I don't have access to a bus analyzer right now, but here is my
>>>>>> regression.
>>>>>>
>>>>>> 1.  Hook an external SCSI array/disk to a 29320.
>>>>>> 2.  Power up SCSI array/disk
>>>>>> 3.  Power up PC with 29320.
>>>>>> 4.  When PC has booted, log in and test device by creating a file
>>>>>>     system, e.g. mkfs /dev/sda (or whatever disk the array is called on
>>>>>>     your machine).
>>>>>> 5.  Power cycle array/disk
>>>>>> 6.  Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
>>>>>> ensues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>>>>>>
>>>> Does this only occur with sg or is that the only way you got a trace? In
>>>> the original bug report you mentioned it occurring with mkfs, but the
>>>> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
>>> Snippets from 'dmesg' during step 6:
>>>
>>> scsi0: Someone reset channel A
>>> sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
>>> 0x0 0x80 0x0 0x0 0x80 0x0
>>> Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
>>> paused
>> Ah. Hmm. Infinite SCSI interrupt.
>>
>> Maybe someone forgot to clear the status ...
>>
>> Can you try the attached patch?
>>
>> Cheers,
>>
>> Hannes
> 
> Better.  The patch allows me to cycle power on the array exactly once.
> So the new regression is:
> 
> 1.  Hook an external SCSI array/disk to a 29320.
> 2.  Power up SCSI array/disk
> 3.  Power up PC with 29320.
> 4.  When PC has booted, log in and test device by creating a file
>     system, e.g. mkfs /dev/sda (or whatever disk the array is called on
>     your machine).
> 5.  Power cycle array/disk
> 6.  Retest device with another 'mkfs /dev/sda'  <-- works just fine!
> 7.  Power cycle array/disk
> 8.  No need to do anything, card dump in dmesg/messages appears and
> device is not usable:
> 
Ok. Not bad. So we have to switch to non-pkt commands after a reset.
Makes sense. Care to try the updated patch?

Thanks for all the testing!

Cheers,

Hannes
-- 
Dr. Hannes Reinecke			hare@suse.de
SuSE Linux Products GmbH		S390 & zSeries
Maxfeldstraße 5				+49 911 74053 688
90409 Nürnberg				http://www.suse.de