Randy.Dunlap wrote:
> On Tue, 06 May 2003 18:38:39 +1000 Douglas Gilbert wrote:
>
> | > New kernel log file is now at
> | > http://www.xenotime.net/linux/capture2.txt
> | >
> | > Are there any docs on SCSI logging?
> |
> | Randy,
> | The failure looks very similar. This time it timed
> | out and corrupted on a Mode Sense (10) for page 5
> | (alloc length=2) while last time it failed on a
> | Mode Sense (10) for page 0x2a (alloc length=2).
> | The Mode Sense is not translated (i.e. the app sent a
> | 10 byte Mode Sense). Perhaps there is a weakness
> | when the allocation length is that short.
> |
> | This time 141 commands were sent (last time 91)
> | with lots of Test Unit Readys. Without timestamps
> | it is hard to say whether a "magicdev" type program
> | is at work or libscg (cdrecord transport layer) is
> | solely responsible for that sequence of SCSI commands.
> | [cdrecord may be accessing the cdwriter via the cdrom
> | driver and the sg driver.]
>
> Well, I saw no scsi logging messages except when I ran
> cdrecord. Does that help any?

Randy,
Now I have a generic ATAPI cdwriter. With it cdrecord
(the version in RH9) burnt several small dummy ISO images,
some with PIO (the default) and others with DMA (after
'hdparm -d 1 /dev/hdb'). Everything worked. Here is
a link to a tarball of the cdrecord output and the log
from "ide-scsi debug=2":
http://www.torque.net/sg/p/ide_scsi_capture.tgz

The initial sequence from cdrecord looks close to your
previous two captures, so maybe cdrecord was producing
those sequences.
> | Here is the point of failure from Randy's log:
> |
> | ide-scsi: hdd: que 141, cmd = [ 5a 0 5 0 0 0 0 0 2 0 ]
> | hdd: lost interrupt
> | ide-scsi: Reached idescsi_pc_intr interrupt handler
> | ide-scsi: hdd: DMA complete
> | ide-scsi: CoD != 0 in idescsi_pc_intr
> | hdd: DMA disabled
> | Error handler scsi_eh_2 waking up
> | scsi_eh_t_hfdadi: lA_TstAaPIts r: e2se:t0: 0co:m0 plcmetdse failiedde:- sc0s,i :c anRecaeclh: ed1 idTeosctsali_ pofc_ 1in ctro mminantdersr oupnt 1 hdanedvliceres rePaqcuikrete ceho mmwoanrdk
> | ....
> |
> | Ouch, after the error handler starts the dump looks like it
> | is in Klingon :-) Lots of data but no information. It could
> | be a result of the corruption or another problem.
>
> I think (but I'm not sure) that this is because it's a dual-proc
> system and both procs are garbling the message log. Later on during
> a panic(), all procs except the current one are disabled, but for
> early messages, all procs can lead to this. I've seen this on 2
> other occasions (non-scsi).

Do your failures only occur in an SMP environment?

Attached is a patch for ide-scsi.c against lk 2.5.69
  - fixes multiple answering problem I reported yesterday
    (and should quieten ide-scsi when CONFIG_SCSI_MULTI_LUN
    is set)
  - sets the DMA direction for all data transfers in
    idescsi_set_direction() [not just read + write]
  - more debug message cleanups
  - changes reported earlier in this thread

So bugs are being fixed and the driver cleaned up but
still no insight into Randy's problem.
Now for the strange observation of the day:

# modprobe ide-scsi
# lsscsi
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr0
[3:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr1
# rmmod ide-scsi
# modprobe ide-scsi
# lsscsi
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr0
[3:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr1
# rmmod ide-scsi
# modprobe ide-scsi
# lsscsi
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr0
[3:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr1

and the pattern continues with the two ATAPI devices changing
positions. So much for device naming stability.

Doug Gilbert