From: Tejun Heo <htejun@gmail.com>
To: Jeff Garzik <jgarzik@pobox.com>
Cc: "linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
Mark Lord <liml@rtr.ca>
Subject: Re: More libata EH data points
Date: Sun, 21 Aug 2005 02:11:44 +0900
Message-ID: <43076450.4030007@gmail.com>
In-Reply-To: <4306D2F2.9000309@pobox.com>
Jeff Garzik wrote:
> Check out this lkml thread:
> http://marc.theaimsgroup.com/?t=111709353400001&r=1&w=2
>
> It may help to turn on CONFIG_DEBUG_SLAB...
>
Hello, Jeff. Hello, Mark.
The thread reports the following two problems.
1. w/o the patch, a scmd is accessed after it has been freed.
2. w/ the patch, a scmd is freed twice.
I'm aware of both problems and have just verified both with test
cases. The first problem is caused by not clearing eh_cmd_q in
ata_scsi_error. This leaves eh_cmd_q pointing at freed scmds after EH
completes.
When the next error occurs, scsi_softirq() calls scsi_eh_scmd_add(),
which adds the command to shost->eh_cmd_q. As eh_cmd_q still contains
dangling pointers to the scmd(s) freed by the first EH run, the list
insertion writes into freed memory, causing #1 (and, depending on
circumstances, an infinite loop).
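To make the mechanism concrete, here is a rough userspace sketch. It
re-implements just enough of the kernel's circular list to show the
effect. The toy_* names and the "freed" flag are made up for
illustration (nothing is really freed, so the sketch stays well
defined), and this is not the libata code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal copy of the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    assert(entry && head);
    entry->next = head;
    entry->prev = prev;
    prev->next = entry;     /* writes into whatever head->prev points at */
    head->prev = entry;
}

/* Hypothetical stand-ins for scsi_cmnd and Scsi_Host. */
struct toy_cmd  { struct list_head eh_entry; bool freed; };
struct toy_host { struct list_head eh_cmd_q; };

#define toy_cmd_of(p) \
    ((struct toy_cmd *)((char *)(p) - offsetof(struct toy_cmd, eh_entry)))

/* One EH pass: "free" every queued command; optionally reset the head. */
static void toy_eh_run(struct toy_host *host, bool reinit_head)
{
    struct list_head *p;

    for (p = host->eh_cmd_q.next; p != &host->eh_cmd_q; p = p->next)
        toy_cmd_of(p)->freed = true;
    if (reinit_head)
        INIT_LIST_HEAD(&host->eh_cmd_q);    /* the one-liner */
}

/* Count queue entries that still refer to already "freed" commands. */
static int toy_count_stale(struct toy_host *host)
{
    struct list_head *p;
    int n = 0;

    for (p = host->eh_cmd_q.next; p != &host->eh_cmd_q; p = p->next)
        if (toy_cmd_of(p)->freed)
            n++;
    return n;
}
```

Without the re-init, queueing the next failed command links it behind
the stale entry and stores a pointer into the "freed" command. With the
re-init, the queue starts out empty and the old command is never
touched.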
The patch modifies the ATAPI EH path to use ata_qc_timeout_done instead
of scsi_finish_cmd, so that scmds are finished by
scsi_invoke_strategy_handler via scsi_eh_flush_done_q. As the ATA
timeout path isn't modified, when an ATA timeout occurs, the scmd is
finished first in ata_qc_timeout and again in
scsi_invoke_strategy_handler, causing the double free (#2).
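The double finish can be sketched the same way. All names below are
hypothetical stand-ins, not the real libata or SCSI functions, and a
counter takes the place of the actual free so that the second finish
shows up as a number instead of a crash.

```c
#include <assert.h>

/* Hypothetical stand-in for a scmd; counts how often it gets finished. */
struct toy_scmd { int finish_count; };

static void toy_finish(struct toy_scmd *cmd)
{
    assert(cmd);
    cmd->finish_count++;    /* in the kernel this would free the command */
}

/* Unmodified ATA timeout path: finishes the command on the spot. */
static void toy_ata_timeout(struct toy_scmd *cmd)
{
    toy_finish(cmd);
}

/* Patched ATAPI EH path: leaves finishing to the done-queue flush. */
static void toy_atapi_eh(struct toy_scmd *cmd)
{
    (void)cmd;
}

/* Strategy-handler tail: finishes everything EH handed back. */
static void toy_flush_done_q(struct toy_scmd **done_q, int n)
{
    int i;

    for (i = 0; i < n; i++)
        toy_finish(done_q[i]);
}
```

A command going through toy_atapi_eh() and then the flush is finished
once. A command going through toy_ata_timeout() and then the same flush
is finished twice.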
With the INIT_LIST_HEAD(&host->eh_cmd_q) one-liner patch, neither
problem occurs. As for Mark's lockup problem, I think it's probably a
different issue. One probable lockup scenario is...
1. ATAPI command gets issued for probing. (PROT_ATAPI_NODATA)
2. atapi_packet_task sends cdb
3. interrupt occurs and command is failed
4. EH entered, sense requested (PROT_ATAPI_PIO, ATA_NIEN set)
5. Machine enters sleep state
6. Machine wakes up
7. Power event raises an interrupt.
8. *BOOM* We're in a screaming interrupt lockup.
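The scenario above doesn't spell out why the interrupt keeps screaming.
One possible reading, and it is only an assumption, is that the
interrupt handler leaves a command alone while it is being polled, so a
level-triggered line raised in step 7 never gets cleared. A toy model of
that reading, with made-up names:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
    bool irq_asserted;  /* level-triggered line: stays high until cleared */
    bool polled;        /* command is driven by a polling task */
};

/* Toy handler: returns true if it serviced (and so cleared) the IRQ. */
static bool toy_irq_handler(struct toy_dev *dev)
{
    if (dev->polled)
        return false;           /* assumption: polled commands are skipped */
    dev->irq_asserted = false;  /* servicing the device drops the line */
    return true;
}

/* Re-deliver the IRQ until the line drops or max_calls is reached. */
static int toy_deliver_irq(struct toy_dev *dev, int max_calls)
{
    int calls = 0;

    assert(max_calls >= 0);
    while (dev->irq_asserted && calls < max_calls) {
        toy_irq_handler(dev);
        calls++;
    }
    return calls;
}
```

With polled set, the loop only stops at the artificial max_calls cap,
which is the toy equivalent of the lockup. With polled clear, a single
handler call is enough.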
It might be that Mark was experiencing both the freed memory corruption
problem and the above scenario. However, I don't see how the broken fix
patch would prevent a lockup which would still occur with the one
liner, as, AFAIK, the one liner does everything that the broken patch
tries to do. The only difference is that the one liner doesn't clear
scmd->eh_entry, but this shouldn't cause any trouble.
When and if time permits, it would be very helpful for Mark to gather
some information about the lockups. We currently don't even know if
we're looking at the same problem, especially as the current libata
implementation is not ready for suspending/resuming and doesn't
synchronize with polling tasks.
Thanks.
--
tejun