linux-ide.vger.kernel.org archive mirror
* More libata EH data points
@ 2005-08-20  6:51 Jeff Garzik
  2005-08-20 17:11 ` Tejun Heo
  0 siblings, 1 reply; 2+ messages in thread
From: Jeff Garzik @ 2005-08-20  6:51 UTC (permalink / raw)
  To: linux-ide@vger.kernel.org; +Cc: Mark Lord, Tejun Heo

Check out this lkml thread: 
http://marc.theaimsgroup.com/?t=111709353400001&r=1&w=2

It may help to turn on CONFIG_DEBUG_SLAB...

	Jeff




* Re: More libata EH data points
  2005-08-20  6:51 More libata EH data points Jeff Garzik
@ 2005-08-20 17:11 ` Tejun Heo
  0 siblings, 0 replies; 2+ messages in thread
From: Tejun Heo @ 2005-08-20 17:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-ide@vger.kernel.org, Mark Lord

Jeff Garzik wrote:
> Check out this lkml thread: 
> http://marc.theaimsgroup.com/?t=111709353400001&r=1&w=2
> 
> It may help to turn on CONFIG_DEBUG_SLAB...
> 

  Hello, Jeff.  Hello, Mark.

  The thread is reporting the following two problems.

  1. w/o patch, scmd is accessed after being deallocated.
  2. w/ patch, scmd is freed twice.

  I'm aware of both problems and have just verified both with test 
cases.  The first problem is caused by not clearing eh_cmd_q in 
ata_scsi_error.  This leaves eh_cmd_q pointing at freed scmds after EH 
is complete.

  After the first EH completes, eh_cmd_q still points to the freed 
scmd(s).  When the next error occurs, scsi_softirq() calls 
scsi_eh_scmd_add(), which adds the command to shost->eh_cmd_q.  As 
eh_cmd_q contains dangling pointers, this corrupts freed memory, causing 
problem 1 (and, depending on circumstances, an infinite loop).
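
  To make the mechanism concrete, here is a minimal userspace sketch, 
not the actual kernel code.  fake_scmd, fake_host, fake_free and 
second_error_touches_freed are hypothetical names made up for this 
illustration, and the list helpers are simplified stand-ins for the 
kernel's list_head API.  "Freeing" only poisons the links so the stray 
write can be observed without real undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;   /* writes into whatever head->prev points at */
    head->prev = entry;
}

/* Hypothetical, heavily reduced stand-ins for a SCSI command and host. */
struct fake_scmd { struct list_head eh_entry; bool freed; };
struct fake_host { struct list_head eh_cmd_q; };

/* Pretend to free a command: poison its links so a later write shows up. */
static void fake_free(struct fake_scmd *cmd)
{
    cmd->freed = true;
    cmd->eh_entry.next = NULL;
    cmd->eh_entry.prev = NULL;
}

/*
 * Returns true if queueing a second failed command wrote into the
 * already "freed" first command.  reinit_queue models re-initialising
 * the queue head after the first EH run.
 */
static bool second_error_touches_freed(bool reinit_queue)
{
    struct fake_host host;
    struct fake_scmd first = { { NULL, NULL }, false };
    struct fake_scmd second = { { NULL, NULL }, false };

    init_list_head(&host.eh_cmd_q);

    /* First error: the command is queued for EH. */
    list_add_tail(&first.eh_entry, &host.eh_cmd_q);
    assert(host.eh_cmd_q.prev == &first.eh_entry);

    /* EH finishes; optionally reset the queue head, then free the command. */
    if (reinit_queue)
        init_list_head(&host.eh_cmd_q);
    fake_free(&first);

    /* Second error: queueing follows head->prev, which may be stale. */
    list_add_tail(&second.eh_entry, &host.eh_cmd_q);

    return first.freed && first.eh_entry.next != NULL;
}
```

  In this model second_error_touches_freed(false) reports a write into 
the freed command, while second_error_touches_freed(true) does not, 
because the re-initialised head no longer refers to the old entry.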

  The patch modifies the ATAPI EH path to use ata_qc_timeout_done 
instead of scsi_finish_cmd, and scmds are finished by 
scsi_invoke_strategy_handler using scsi_eh_flush_done_q.  As the ATA 
timeout path isn't modified, when an ATA timeout occurs, a scmd is 
finished first in ata_qc_timeout and again in 
scsi_invoke_strategy_handler, causing the double free (problem 2).
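
  The double finish can be modelled with a toy counter.  This is only a 
sketch under stated assumptions: fake_scmd, finish_directly, 
queue_for_eh_done, flush_done_q and finishes_after_timeout are 
hypothetical names, and the model only captures who finishes the command 
and how many times, not the real libata or SCSI midlayer code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a SCSI command; counts how often it is finished. */
struct fake_scmd { int finish_count; bool on_done_q; };

/* Models a timeout path that finishes the command itself. */
static void finish_directly(struct fake_scmd *cmd)
{
    cmd->finish_count++;
}

/* Models handing the command to the EH done queue instead of finishing it. */
static void queue_for_eh_done(struct fake_scmd *cmd)
{
    assert(!cmd->on_done_q);   /* a command is queued at most once */
    cmd->on_done_q = true;
}

/* Models the flush at the end of EH: finishes everything on the done queue. */
static void flush_done_q(struct fake_scmd *cmd)
{
    if (cmd->on_done_q) {
        cmd->finish_count++;
        cmd->on_done_q = false;
    }
}

/*
 * Returns how many times one timed-out command gets finished.
 * path_converted == true  : the timeout path only queues the command.
 * path_converted == false : the timeout path still finishes it directly,
 *                           yet EH flushes it again afterwards.
 */
static int finishes_after_timeout(bool path_converted)
{
    struct fake_scmd cmd = { 0, false };

    if (!path_converted)
        finish_directly(&cmd);
    queue_for_eh_done(&cmd);
    flush_done_q(&cmd);

    return cmd.finish_count;
}
```

  In this model a converted path finishes the command once, while an 
unconverted path finishes it twice, which corresponds to the double 
free.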

  With the one-line patch INIT_LIST_HEAD(&host->eh_cmd_q), neither 
problem occurs.  As for Mark's lockup problem, I think it's probably a 
different issue.  One probable lockup scenario is...

  1. ATAPI command gets issued for probing. (PROT_ATAPI_NODATA)
  2. atapi_packet_task sends cdb
  3. interrupt occurs and command is failed
  4. EH entered, sense requested (PROT_ATAPI_PIO, ATA_NIEN set)
  5. Machine enters sleep state
  6. Machine wakes up
  7. Power event raises an interrupt.
  8. *BOOM* We're in a screaming interrupt lockup.

  It might be that Mark was experiencing both the freed memory 
corruption problem and the above scenario, although I don't see how the 
broken fix patch would prevent a lockup that occurs with the one 
liner, as, AFAIK, the one liner does everything that the broken patch 
tries to do.  The only difference is that the one liner doesn't clear 
scmd->eh_entry, but this shouldn't cause any trouble.

  When and if time permits, it would be very helpful for Mark to gather 
some information about the lockups.  We currently don't even know if 
we're looking at the same problem, especially as the current libata 
implementation is not ready for suspending/resuming and doesn't 
synchronize with polling tasks.

  Thanks.

-- 
tejun

