Randy.Dunlap wrote:
> On Tue, 06 May 2003 18:38:39 +1000 Douglas Gilbert <dougg@torque.net> wrote:
> 
> | > New kernel log file is now at
> | >   http://www.xenotime.net/linux/capture2.txt
> | > 
> | > Are there any docs on SCSI logging?
> | 
> | Randy,
> | The failure looks very similar. This time it timed
> | out and corrupted on a Mode Sense (10) for page 5
> | (alloc length=2) while last time it failed on a
> | Mode Sense (10) for page 0x2a (alloc length=2).
> | The Mode Sense is not translated (i.e. the app sent a
> | 10 byte Mode Sense). Perhaps there is a weakness
> | when the allocation length is that short.
> | 
> | This time 141 commands were sent (last time 91)
> | with lots of Test Unit Readys. Without timestamps
> | it is hard to say whether a "magicdev" type program
> | is at work or libscg (cdrecord transport layer) is
> | solely responsible for that sequence of SCSI commands.
> | [cdrecord may be accessing the cdwriter via the cdrom
> | driver and the sg driver.]
> 
> Well, I saw no scsi logging messages except when I ran
> cdrecord.  Does that help any?

Randy,
I now have a generic ATAPI cdwriter. With it, cdrecord (the
version in RH9) burnt several small dummy iso images, some with
PIO (the default) and others with DMA (after 'hdparm -d 1 /dev/hdb').
Everything worked. Here is a link to a tarball of the
cdrecord output and the log from "ide-scsi debug=2":
     http://www.torque.net/sg/p/ide_scsi_capture.tgz

The initial sequence from cdrecord looks close to
your previous two captures. So maybe cdrecord was
producing those sequences.
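
To compare captures without reading them by eye, something like
the following rough sketch could pull the command lines out of an
ide-scsi debug log and count opcodes. It assumes the line format
seen at the point of failure in Randy's log, that the cdb bytes
are printed in hex and that the "que" number is decimal. The
function names are made up for illustration; only the Mode
Sense (10) cdb layout (page code in the low 6 bits of byte 2,
allocation length in bytes 7 and 8) comes from the SCSI spec.

```python
import re
from collections import Counter

# A few opcodes from the SCSI primary/multimedia command sets.
OPCODE_NAMES = {
    0x00: "TEST UNIT READY",
    0x03: "REQUEST SENSE",
    0x12: "INQUIRY",
    0x55: "MODE SELECT(10)",
    0x5A: "MODE SENSE(10)",
}

# Assumed log format, e.g.:
#   ide-scsi: hdd: que 141, cmd = [ 5a 0 5 0 0 0 0 0 2 0 ]
CMD_RE = re.compile(
    r"ide-scsi: (\w+): que (\d+), cmd = \[ ((?:[0-9a-fA-F]+ )+)\]"
)

def parse_cmd_line(line):
    """Return (drive, sequence number, cdb bytes) or None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    drive, seq, raw = m.groups()
    return drive, int(seq), [int(b, 16) for b in raw.split()]

def decode_mode_sense10(cdb):
    """Pick the interesting fields out of a MODE SENSE(10) cdb."""
    if len(cdb) != 10 or cdb[0] != 0x5A:
        raise ValueError("not a MODE SENSE(10) cdb")
    return {
        "dbd": bool(cdb[1] & 0x08),          # disable block descriptors
        "page_control": cdb[2] >> 6,         # 0 = current values
        "page_code": cdb[2] & 0x3F,
        "alloc_len": (cdb[7] << 8) | cdb[8], # big endian
    }

def opcode_histogram(lines):
    """Count commands by opcode name; unknown opcodes are shown in hex."""
    counts = Counter()
    for line in lines:
        parsed = parse_cmd_line(line)
        if parsed:
            opcode = parsed[2][0]
            counts[OPCODE_NAMES.get(opcode, "0x%02x" % opcode)] += 1
    return counts

failing = "ide-scsi: hdd: que 141, cmd = [ 5a 0 5 0 0 0 0 0 2 0 ]"
drive, seq, cdb = parse_cmd_line(failing)
print(drive, seq, decode_mode_sense10(cdb))
```

Running opcode_histogram() over two captures and comparing the
counts should show quickly whether the Test Unit Ready traffic
differs between them. Note that an allocation length of 2 only
covers the "mode data length" field at the start of the 8 byte
Mode Sense (10) header.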

> | Here is the point of failure from Randy's log:
> | 
> | ide-scsi: hdd: que 141, cmd = [ 5a 0 5 0 0 0 0 0 2 0 ]
> | hdd: lost interrupt
> | ide-scsi: Reached idescsi_pc_intr interrupt handler
> | ide-scsi: hdd: DMA complete
> | ide-scsi: CoD != 0 in idescsi_pc_intr
> | hdd: DMA disabled
> | Error handler scsi_eh_2 waking up
> | scsi_eh_<pr4>t_hfdadi: lA_TstAaPIts r: e2se:t0: 0co:m0 plcmetdse
> | failiedde:- sc0s,i :c anRecaeclh: ed1
> | idTeosctsali_ pofc_ 1in ctro mminantdersr oupnt 1  hdanedvliceres
> |   rePaqcuikrete  ceho mmwoanrdk
> | ....
> | 
> | Ouch, after the error handler starts the dump looks like it
> | is in Klingon :-) Lots of data but no information. It could
> | be a result of the corruption or another problem.
> 
> I think (but I'm not sure) that this is because it's a dual-proc
> system and both procs are garbling the message log.  Later on during
> a panic(), all procs except the current one are disabled, but for
> early messages, all procs can lead to this.  I've seen this on 2
> other occasions (non-scsi).

Do your failures only occur in an SMP environment?
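
Randy's explanation can be checked against the log itself. The
sketch below tests whether a garbled line is a character-wise
interleaving of two messages, each kept in its own order. The two
candidate messages are only my reading of the first garbled line
(a guess reconstructed from the text, not taken from a clean log)
and is_interleaving() is a name made up for this example.

```python
def is_interleaving(mixed, a, b):
    """True if 'mixed' can be formed by merging 'a' and 'b' while
    keeping the characters of each in their original order."""
    if len(mixed) != len(a) + len(b):
        return False
    # reachable[j]: mixed[:i + j] can be built from a[:i] and b[:j]
    reachable = [False] * (len(b) + 1)
    reachable[0] = True
    for j in range(1, len(b) + 1):
        reachable[j] = reachable[j - 1] and b[j - 1] == mixed[j - 1]
    for i in range(1, len(a) + 1):
        reachable[0] = reachable[0] and a[i - 1] == mixed[i - 1]
        for j in range(1, len(b) + 1):
            ch = mixed[i + j - 1]
            from_a = reachable[j] and a[i - 1] == ch      # row i-1
            from_b = reachable[j - 1] and b[j - 1] == ch  # row i
            reachable[j] = from_a or from_b
    return reachable[len(b)]

# First garbled line from Randy's log, copied as-is.
GARBLED = ("scsi_eh_<pr4>t_hfdadi: lA_TstAaPIts r: "
           "e2se:t0: 0co:m0 plcmetdse")

# Hypothetical originals (my guess at what each processor printed).
MSG_EH = "scsi_eh_prt_fail_stats: 2:0:0:0 cmds"
MSG_IDE = "<4>hdd: ATAPI reset complete"

print(is_interleaving(GARBLED, MSG_EH, MSG_IDE))
```

If that reading is right, the "Klingon" is two ordinary messages
written at the same time, one from the scsi error handler and one
from the ide layer, which fits the dual-proc theory.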


Attached is a patch for ide-scsi.c against lk 2.5.69
   - fixes the multiple answering problem I reported
     yesterday (and should quieten ide-scsi when
     CONFIG_SCSI_MULTI_LUN is set)
   - sets the DMA direction for all data transfers in
     idescsi_set_direction() [not just read + write]
   - more debug message cleanups
   - changes reported earlier in this thread

So bugs are being fixed and the driver cleaned up but
still no insight into Randy's problem.


Now for the strange observation of the day:
# modprobe ide-scsi
# lsscsi
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr0
[3:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr1
# rmmod ide-scsi
# modprobe ide-scsi
# lsscsi
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr0
[3:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr1
# rmmod ide-scsi
# modprobe ide-scsi
# lsscsi
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr0
[3:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr1

and the pattern continues, with the two atapi devices swapping
positions on each reload. So much for device naming stability.
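
Until that is sorted out, a script that needs the writer could
look it up by its model string rather than trust /dev/sr0. A
minimal sketch, assuming the lsscsi output layout shown above and
a vendor string without embedded spaces (true for these three
devices, not guaranteed in general); the function names are
invented for this example:

```python
def parse_lsscsi(text):
    """Split lsscsi output into one dict per device line."""
    devices = []
    for line in text.splitlines():
        tokens = line.split()
        # Need at least: [h:c:t:l] type vendor model rev node
        if len(tokens) < 6 or not tokens[0].startswith("["):
            continue
        devices.append({
            "hctl": tokens[0].strip("[]"),
            "type": tokens[1],
            "vendor": tokens[2],               # assumed to be one word
            "model": " ".join(tokens[3:-2]),   # may contain spaces
            "rev": tokens[-2],
            "node": tokens[-1],
        })
    return devices

def node_for_model(text, model):
    """Return the device node for the first device with this model."""
    for dev in parse_lsscsi(text):
        if dev["model"] == model:
            return dev["node"]
    return None

# Output of the first and second 'modprobe ide-scsi' above.
FIRST_LOAD = """\
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr0
[3:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr1
"""
SECOND_LOAD = """\
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
[2:0:0:0]    cd      CREATIVE CD5233E          1.00  /dev/sr0
[3:0:0:0]    cd      ATAPI    CD-RW 48X16      A.RZ  /dev/sr1
"""

for capture in (FIRST_LOAD, SECOND_LOAD):
    print(node_for_model(capture, "CD-RW 48X16"))
```

On a live system the text would come from running lsscsi itself
rather than from a pasted capture.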

Doug Gilbert