* More libata EH data points
@ 2005-08-20 6:51 Jeff Garzik
2005-08-20 17:11 ` Tejun Heo
From: Jeff Garzik @ 2005-08-20 6:51 UTC (permalink / raw)
To: linux-ide@vger.kernel.org; +Cc: Mark Lord, Tejun Heo
Check out this lkml thread:
http://marc.theaimsgroup.com/?t=111709353400001&r=1&w=2
It may help to turn on CONFIG_DEBUG_SLAB...
Jeff
* Re: More libata EH data points
From: Tejun Heo @ 2005-08-20 17:11 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-ide@vger.kernel.org, Mark Lord
Jeff Garzik wrote:
> Check out this lkml thread:
> http://marc.theaimsgroup.com/?t=111709353400001&r=1&w=2
>
> It may help to turn on CONFIG_DEBUG_SLAB...
>
Hello, Jeff. Hello, Mark.
The thread is reporting the following two problems.
1. w/o patch, a scmd is accessed after it has been deallocated.
2. w/ patch, a scmd is freed twice.
I'm aware of both problems and have just verified both with test
cases. The first problem is caused by not clearing eh_cmd_q in
ata_scsi_error. This leaves eh_cmd_q pointing at freed scmd(s) after eh
is complete.
After the first eh is complete, eh_cmd_q is pointing to freed scmd(s).
When the next error occurs, scsi_softirq() calls scsi_eh_scmd_add(),
which adds the command to shost->eh_cmd_q. As eh_cmd_q still contains
dangling pointers, this corrupts freed memory, causing #1 (and,
depending on circumstances, an infinite loop).
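To make the dangling-head problem concrete, here is a minimal user-space
sketch. The list helpers mirror the shape of the kernel's list.h, but
toy_cmd, toy_host, toy_eh_add, toy_eh_run and toy_queue_has_stale_entry
are hypothetical names made up for this illustration, and "freeing" is
modelled with a flag so that the sketch never touches released memory.
It is not the libata or SCSI midlayer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space copy of a circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;	/* writes into the old tail entry */
	head->prev = entry;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical stand-ins for a SCSI command and its host. */
struct toy_cmd {
	struct list_head eh_entry;
	bool freed;			/* models "memory was released" */
};

struct toy_host {
	struct list_head eh_cmd_q;
};

#define toy_cmd_of(ptr) \
	((struct toy_cmd *)((char *)(ptr) - offsetof(struct toy_cmd, eh_entry)))

/* Queue a failed command for error handling. */
static void toy_eh_add(struct toy_host *host, struct toy_cmd *cmd)
{
	assert(host != NULL && cmd != NULL);
	cmd->freed = false;
	list_add_tail(&cmd->eh_entry, &host->eh_cmd_q);
}

/*
 * One EH run: every queued command is finished (marked freed).
 * With reinit_head == false the list head is left untouched, so it
 * keeps pointing at the finished commands.
 */
static void toy_eh_run(struct toy_host *host, bool reinit_head)
{
	struct list_head *pos;

	for (pos = host->eh_cmd_q.next; pos != &host->eh_cmd_q; pos = pos->next)
		toy_cmd_of(pos)->freed = true;

	if (reinit_head)
		INIT_LIST_HEAD(&host->eh_cmd_q);
}

/* True if the queue still links to a command that was already freed. */
static bool toy_queue_has_stale_entry(struct toy_host *host)
{
	struct list_head *pos;

	for (pos = host->eh_cmd_q.next; pos != &host->eh_cmd_q; pos = pos->next)
		if (toy_cmd_of(pos)->freed)
			return true;
	return false;
}
```

Calling toy_eh_run(host, false) and then toy_eh_add() for a second
command makes list_add_tail() write into the entry of the first,
already finished command. In the real kernel that write would land in
freed memory.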
The patch modifies the ATAPI eh path to use ata_qc_timeout_done instead
of scsi_finish_cmd, and scmds are then finished by
scsi_invoke_strategy_handler using scsi_eh_flush_done_q. As the ATA
timeout path isn't modified, when an ATA timeout occurs, a scmd is
finished first in ata_qc_timeout and again in
scsi_invoke_strategy_handler, causing a double free.
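The double completion can be sketched in the same toy style. Everything
here (demo_cmd, demo_done_q and the demo_* functions) is hypothetical
and only models the control flow described above: one path finishes the
command immediately, and a later flush of the done queue finishes it
again. How the command reaches the done queue in the real code is an
assumption of this sketch, and a completion counter stands in for the
actual free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command: counts how often it has been finished. */
struct demo_cmd {
	int completions;
	struct demo_cmd *next;		/* link on the EH done queue */
};

struct demo_done_q {
	struct demo_cmd *head;
};

/* Finishing a command twice corresponds to the double free. */
static void demo_finish(struct demo_cmd *cmd)
{
	assert(cmd != NULL);
	cmd->completions++;
}

static void demo_queue_done(struct demo_done_q *q, struct demo_cmd *cmd)
{
	cmd->next = q->head;
	q->head = cmd;
}

/*
 * Unmodified timeout path (assumed behaviour): it finishes the command
 * itself, yet the command is still handed to the done queue.
 */
static void demo_timeout_unmodified(struct demo_done_q *q, struct demo_cmd *cmd)
{
	demo_finish(cmd);
	demo_queue_done(q, cmd);
}

/* Consistent variant: completion is left entirely to the flush. */
static void demo_timeout_deferred(struct demo_done_q *q, struct demo_cmd *cmd)
{
	demo_queue_done(q, cmd);
}

/* Finish everything on the done queue and empty it. */
static void demo_flush_done_q(struct demo_done_q *q)
{
	while (q->head != NULL) {
		struct demo_cmd *cmd = q->head;

		q->head = cmd->next;
		cmd->next = NULL;
		demo_finish(cmd);
	}
}
```

The point of the sketch is only that both paths must agree on who
finishes a command: either the timeout path or the flush, never both.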
With the one-liner INIT_LIST_HEAD(&host->eh_cmd_q) patch, neither
problem occurs. As for Mark's lockup problem, I think that it's
probably a different issue. One probable lockup scenario is...
1. ATAPI command gets issued for probing. (PROT_ATAPI_NODATA)
2. atapi_packet_task sends cdb
3. interrupt occurs and command is failed
4. EH entered, sense requested (PROT_ATAPI_PIO, ATA_NIEN set)
5. Machine enters sleep state
6. Machine wakes up
7. Power event raises an interrupt.
8. *BOOM* We're in a screaming interrupt lockup.
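The last two steps can be modelled with a toy level-triggered interrupt
line. This is only a sketch under assumptions: toy_port,
toy_irq_handler and toy_deliver are hypothetical, and the model assumes
that the handler leaves the device alone while the driver believes it
is polling, so nothing ever clears the asserted line. The delivery loop
is capped so that the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port state. */
struct toy_port {
	bool irq_asserted;	/* level-triggered line held by the device */
	bool polling;		/* driver expects no interrupts and polls */
};

/* Returns true if the interrupt was handled and acknowledged. */
static bool toy_irq_handler(struct toy_port *port)
{
	if (port->polling)
		return false;		/* assumed: handler skips the port */

	port->irq_asserted = false;	/* acknowledging clears the line */
	return true;
}

/*
 * Re-deliver the interrupt for as long as the line stays asserted.
 * Returns the number of handler invocations, capped at limit.
 */
static int toy_deliver(struct toy_port *port, int limit)
{
	int calls = 0;

	assert(limit > 0);
	while (port->irq_asserted && calls < limit) {
		toy_irq_handler(port);
		calls++;
	}
	return calls;
}
```

With polling set, toy_deliver() always runs into its cap, which is the
toy equivalent of the screaming interrupt. With polling clear, one
handler call clears the line.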
It might be that Mark was experiencing both the freed memory corruption
problem and the above scenario. However, I don't see how the broken fix
patch would prevent a lockup that would still occur with the one liner,
as, AFAIK, the one liner does everything that the broken patch tries to
do. The only difference is that the one liner doesn't clear
scmd->eh_entry, but this shouldn't cause any trouble.
When and if time permits, it would be very helpful for Mark to gather
some information about the lockups. We currently don't even know if
we're looking at the same problem, especially as the current libata
implementation is not ready for suspending/resuming and doesn't
synchronize with polling tasks.
Thanks.
--
tejun