* libata new EH issue - ATAPI device error not properly reported
@ 2006-06-28 8:27 Albert Lee
2006-06-28 10:27 ` Tejun Heo
0 siblings, 1 reply; 3+ messages in thread
From: Albert Lee @ 2006-06-28 8:27 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jeff Garzik, Linux IDE, Unicorn Chang, Alan Cox, Doug Maxey
Hi Tejun,
Unicorn is doing some test with the current upstream branch and something
looks strange in the log.
With the new EH, when the ATAPI device reports dev_status 0x51,
the err_mask is reported as 0x0. This does not look right.
Maybe we should report AC_ERR_DEV error back to the upper layers?
(Test log attached for your review.)
--
albert
============================================================================================
(Test log with new EH.
Error transaction with 0x51 dev_status)
Jun 27 18:10:57 xlinux19 kernel: ata_scsi_dump_cdb: CDB (3:0,0,0) 1e 00 00 00 01 00 00 00 00
Jun 27 18:10:57 xlinux19 kernel: ata_scsi_translate: ENTER
Jun 27 18:10:57 xlinux19 kernel: ata3: ata_dev_select: ENTER, ata3: device 0, wait 1
Jun 27 18:10:57 xlinux19 kernel: ata_tf_load_pio: feat 0x0 nsect 0x0 lba 0x0 0x0 0x20
Jun 27 18:10:57 xlinux19 kernel: ata_tf_load_pio: device 0xA0
Jun 27 18:10:57 xlinux19 kernel: ata_exec_command_pio: ata3: cmd 0xA0
Jun 27 18:10:57 xlinux19 kernel: ata_scsi_translate: EXIT
Jun 27 18:10:57 xlinux19 kernel: ata_host_intr: ata3: protocol 6 task_state 5
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: protocol 6 task_state 5 (dev_stat 0x58)
Jun 27 18:10:57 xlinux19 kernel: atapi_send_cdb: send cdb
Jun 27 18:10:57 xlinux19 kernel: ata_host_intr: ata3: protocol 6 task_state 3
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: protocol 6 task_state 3 (dev_stat 0x51) <== device cannot do it
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: protocol 6 task_state 4 (dev_stat 0x51)
Jun 27 18:10:57 xlinux19 kernel: ata_scsi_timed_out: ENTER
Jun 27 18:10:57 xlinux19 kernel: ata_scsi_timed_out: EXIT, ret=0
Jun 27 18:10:57 xlinux19 kernel: ata_scsi_error: ENTER
Jun 27 18:10:57 xlinux19 kernel: ata_port_flush_task: ENTER
Jun 27 18:10:57 xlinux19 kernel: ata_port_flush_task: flush #1
Jun 27 18:10:57 xlinux19 kernel: ata_eh_autopsy: ENTER
Jun 27 18:10:57 xlinux19 kernel: atapi_eh_request_sense: ATAPI request sense
Jun 27 18:10:57 xlinux19 kernel: ata3: ata_dev_select: ENTER, ata3: device 0, wait 1
Jun 27 18:10:57 xlinux19 kernel: ata_tf_load_pio: feat 0x64 nsect 0x3 lba 0x0 0x0 0x20
Jun 27 18:10:57 xlinux19 kernel: ata_tf_load_pio: device 0xA0
Jun 27 18:10:57 xlinux19 kernel: ata_exec_command_pio: ata3: cmd 0xA0
Jun 27 18:10:57 xlinux19 kernel: ata_host_intr: ata3: protocol 5 task_state 5
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: protocol 5 task_state 5 (dev_stat 0x58)
Jun 27 18:10:57 xlinux19 kernel: atapi_send_cdb: send cdb
Jun 27 18:10:57 xlinux19 kernel: ata_host_intr: ata3: protocol 5 task_state 2
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: protocol 5 task_state 2 (dev_stat 0x58)
Jun 27 18:10:57 xlinux19 kernel: atapi_pio_bytes: ata3: xfering 18 bytes
Jun 27 18:10:57 xlinux19 kernel: __atapi_pio_bytes: data read
Jun 27 18:10:57 xlinux19 kernel: ata_host_intr: ata3: protocol 5 task_state 2
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: protocol 5 task_state 2 (dev_stat 0x50)
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: protocol 5 task_state 3 (dev_stat 0x50)
Jun 27 18:10:57 xlinux19 kernel: ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x50
Jun 27 18:10:57 xlinux19 kernel: ata_port_flush_task: ENTER
Jun 27 18:10:57 xlinux19 kernel: ata_port_flush_task: flush #1
Jun 27 18:10:57 xlinux19 kernel: ata_eh_autopsy: EXIT
Jun 27 18:10:57 xlinux19 kernel: ata_eh_recover: ENTER
Jun 27 18:10:57 xlinux19 kernel: ata_eh_revalidate_and_attach: ENTER
Jun 27 18:10:57 xlinux19 kernel: ata_eh_revalidate_and_attach: EXIT
Jun 27 18:10:57 xlinux19 kernel: ata_eh_recover: EXIT, rc=0
Jun 27 18:10:57 xlinux19 kernel: atapi_qc_complete: ENTER, err_mask 0x0 <== Doesn't look right. Should report error.
Jun 27 18:10:57 xlinux19 kernel: ata_scsi_error: EXIT
===============================================================================================
(Test log with new EH.
Normal transaction.)
Jun 27 18:11:02 xlinux19 kernel: ata_scsi_dump_cdb: CDB (3:0,0,0) 1e 00 00 00 01 00 00 00 00
Jun 27 18:11:02 xlinux19 kernel: ata_scsi_translate: ENTER
Jun 27 18:11:02 xlinux19 kernel: ata3: ata_dev_select: ENTER, ata3: device 0, wait 1
Jun 27 18:11:02 xlinux19 kernel: ata_tf_load_pio: feat 0x0 nsect 0x0 lba 0x0 0x0 0x20
Jun 27 18:11:02 xlinux19 kernel: ata_tf_load_pio: device 0xA0
Jun 27 18:11:02 xlinux19 kernel: ata_exec_command_pio: ata3: cmd 0xA0
Jun 27 18:11:02 xlinux19 kernel: ata_scsi_translate: EXIT
Jun 27 18:11:02 xlinux19 kernel: ata_host_intr: ata3: protocol 6 task_state 5
Jun 27 18:11:02 xlinux19 kernel: ata_hsm_move: ata3: protocol 6 task_state 5 (dev_stat 0x58)
Jun 27 18:11:02 xlinux19 kernel: atapi_send_cdb: send cdb
Jun 27 18:11:02 xlinux19 kernel: ata_host_intr: ata3: protocol 6 task_state 3
Jun 27 18:11:02 xlinux19 kernel: ata_hsm_move: ata3: protocol 6 task_state 3 (dev_stat 0x50)
Jun 27 18:11:02 xlinux19 kernel: ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x50
Jun 27 18:11:02 xlinux19 kernel: atapi_qc_complete: ENTER, err_mask 0x0 <== Everything is fine.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: libata new EH issue - ATAPI device error not properly reported
2006-06-28 8:27 libata new EH issue - ATAPI device error not properly reported Albert Lee
@ 2006-06-28 10:27 ` Tejun Heo
2006-06-28 11:03 ` Albert Lee
0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2006-06-28 10:27 UTC (permalink / raw)
To: albertl; +Cc: Jeff Garzik, Linux IDE, Unicorn Chang, Alan Cox, Doug Maxey
Hello, Albert.
Albert Lee wrote:
> Hi Tejun,
>
> Unicorn is doing some test with the current upstream branch and something
> looks strange in the log.
>
> With the new EH, when the ATAPI device reports dev_status 0x51,
> the err_mask is reported as 0x0. This does not look right.
> Maybe we should report AC_ERR_DEV error back to the upper layers?
> (Test log attached for your review.)
>
[--snip--]
> Jun 27 18:10:57 xlinux19 kernel: atapi_qc_complete: ENTER, err_mask 0x0 <== Doesn't look right. Should report error.
The following code block follows the above VPRINTK.
/* handle completion from new EH */
if (unlikely(qc->ap->ops->error_handler &&
(err_mask || qc->flags & ATA_QCFLAG_SENSE_VALID))){
if (!(qc->flags & ATA_QCFLAG_SENSE_VALID)) {
/* FIXME: not quite right; we don't want the
* translation of taskfile registers into a
* sense descriptors, since that's only
* correct for ATA, not ATAPI
*/
ata_gen_ata_desc_sense(qc);
}
qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
qc->scsidone(cmd);
ata_qc_free(qc);
return;
}
As EH set ATA_QCFLAG_SENSE_VALID after reading sense data the above if
block is executed thus reporting upper layer CHECK_CONDITION. Overall,
the control flow on ATAPI CC is like the following.
1. ATAPI CC occurs
2. AC_ERR_DEV set and EH invoked
3. EH requests sense. sense is stored in the sense buffer and
AC_ERR_DEV is cleared.
4. EH completes the qc and triggers above code block in atapi_qc_complete().
The logic behind clearing AC_ERR_DEV after reading sense data is that
ATAPI CC doesn't always indicate device error. It can indicate
anything. By clearing AC_ERR_DEV after sense data is read, libata EH
considers the error condition is cleared and doesn't perform further
action on the device. If the sense data indicates actual error, upper
layers will probably deal with it (sr driver or userland application).
It might be useful to interpret some of sense data and handle things
like transmission error in libata EH, but I don't know. ATAPI errors
always have been handled by upper layers.
--
tejun
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: libata new EH issue - ATAPI device error not properly reported
2006-06-28 10:27 ` Tejun Heo
@ 2006-06-28 11:03 ` Albert Lee
0 siblings, 0 replies; 3+ messages in thread
From: Albert Lee @ 2006-06-28 11:03 UTC (permalink / raw)
To: Tejun Heo
Cc: albertl, Jeff Garzik, Linux IDE, Unicorn Chang, Alan Cox,
Doug Maxey
Tejun Heo wrote:
> Hello, Albert.
>
> Albert Lee wrote:
>
>> Hi Tejun,
>>
>> Unicorn is doing some test with the current upstream branch and something
>> looks strange in the log.
>>
>> With the new EH, when the ATAPI device reports dev_status 0x51,
>> the err_mask is reported as 0x0. This does not look right.
>> Maybe we should report AC_ERR_DEV error back to the upper layers?
>> (Test log attached for your review.)
>>
> [--snip--]
>
>> Jun 27 18:10:57 xlinux19 kernel: atapi_qc_complete: ENTER, err_mask
>> 0x0 <== Doesn't look right. Should report error.
>
>
> The following code block follows the above VPRINTK.
>
> /* handle completion from new EH */
> if (unlikely(qc->ap->ops->error_handler &&
> (err_mask || qc->flags & ATA_QCFLAG_SENSE_VALID))){
>
> if (!(qc->flags & ATA_QCFLAG_SENSE_VALID)) {
> /* FIXME: not quite right; we don't want the
> * translation of taskfile registers into a
> * sense descriptors, since that's only
> * correct for ATA, not ATAPI
> */
> ata_gen_ata_desc_sense(qc);
> }
>
> qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
> qc->scsidone(cmd);
> ata_qc_free(qc);
> return;
> }
>
> As EH set ATA_QCFLAG_SENSE_VALID after reading sense data the above if
> block is executed thus reporting upper layer CHECK_CONDITION. Overall,
> the control flow on ATAPI CC is like the following.
>
> 1. ATAPI CC occurs
>
> 2. AC_ERR_DEV set and EH invoked
>
> 3. EH requests sense. sense is stored in the sense buffer and
> AC_ERR_DEV is cleared.
>
> 4. EH completes the qc and triggers above code block in
> atapi_qc_complete().
>
> The logic behind clearing AC_ERR_DEV after reading sense data is that
> ATAPI CC doesn't always indicate device error. It can indicate
> anything. By clearing AC_ERR_DEV after sense data is read, libata EH
> considers the error condition is cleared and doesn't perform further
> action on the device. If the sense data indicates actual error, upper
> layers will probably deal with it (sr driver or userland application).
>
> It might be useful to interpret some of sense data and handle things
> like transmission error in libata EH, but I don't know. ATAPI errors
> always have been handled by upper layers.
>
Thanks for the detailed explanation. We are checking a strange problem
related to the GoVault drive: After the GoVault drive is ejected, a
flood of SCSI commands 0x1e are issued to libata upstream kernel.
(This is not reproducible with 2.6.17 kernel.) The GoVault drive
timeout at the eject command before the flood of command, so that
timeout might be the cause, not the ATAPI err_mask. Will check more
about the real cause.
--
albert
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2006-06-28 11:03 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-28 8:27 libata new EH issue - ATAPI device error not properly reported Albert Lee
2006-06-28 10:27 ` Tejun Heo
2006-06-28 11:03 ` Albert Lee
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).