From: Tejun Heo <tj@kernel.org>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: linux-ide@vger.kernel.org, jeff@garzik.org
Subject: Re: [PATCH] libata: Better timeout recovery
Date: Tue, 14 Oct 2008 14:32:05 +0900 [thread overview]
Message-ID: <48F42ED5.1030009@kernel.org> (raw)
In-Reply-To: <20081013141050.7772.52587.stgit@localhost.localdomain>
Hello, Alan.
Alan Cox wrote:
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index a93247c..757b956 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -518,7 +518,7 @@ void ata_scsi_error(struct Scsi_Host *host)
>
> /* For new EH, all qcs are finished in one of three ways -
> * normal completion, error completion, and SCSI timeout.
> - * Both cmpletions can race against SCSI timeout. When normal
> + * Both completions can race against SCSI timeout. When normal
> * completion wins, the qc never reaches EH. When error
> * completion wins, the qc has ATA_QCFLAG_FAILED set.
> *
Heh.. nice catch.
> @@ -533,7 +533,19 @@ void ata_scsi_error(struct Scsi_Host *host)
> int nr_timedout = 0;
>
> spin_lock_irqsave(ap->lock, flags);
> -
> +
> + /* This must occur under the ap->lock as we don't want
> + a polled recovery to race the real interrupt handler
> +
> + The lost_interrupt handler checks for any completed but
> + non-notified command and completes much like an IRQ handler.
> +
> + We then fall into the error recovery code which will treat
> + this as if normal completion won the race */
As Elias pointed out, I think it's better to stick with
/* The first line.
* The last line.
*/
as that's what the rest of the libata uses.
> +
> + if (ap->ops->lost_interrupt)
> + ap->ops->lost_interrupt(ap);
> +
> list_for_each_entry_safe(scmd, tmp, &host->eh_cmd_q, eh_entry) {
> struct ata_queued_cmd *qc;
Eh... I really wish this can be folded into ata_sff_error_handler()
but, yeah, it's not easy to do as ata_scsi_error() marks commands as
timed out before error handler is entered. Hmmm... I think it would
be better to do this before entering error handler at all and way
before the actual command out triggers but well I guess that's for
another day.
Do you mind to separate out this one from the drain changes?
> @@ -577,6 +589,9 @@ void ata_scsi_error(struct Scsi_Host *host)
> ap->eh_tries = ATA_EH_MAX_TRIES;
> } else
> spin_unlock_wait(ap->lock);
> +
> + /* If we timed raced normal completion and there is nothing to
> + recover nr_timedout == 0 why exactly are we doing error recovery ? */
As it's easier that way and there's no need to optimize anything as
we're already knee deep in EH.
> + /* There was a command running, we are no longer busy and we have
> + no interrupt. */
> + ata_port_printk(ap, KERN_WARNING, "lost interrupt (Status 0x%x)\n",
> + status);
I sometimes find your indenting style a bit strange but it's equally
possible that I'm the weirdo. :-)
> + /* Run the host interrupt logic as if the interrupt had not been
> + lost */
> + ata_sff_host_intr(ap, qc);
Ah.. we don't have ->irq_handler as port op now but it looks rather
strange to call ata_sff_host_intr() directly. Maybe we need a
function which takes @irq_handler to call and make
ata_sff_lost_interrupt() call it with ata_sff_host_intr()?
> +/**
> * ata_sff_error_handler - Stock error handler for BMDMA controller
> * @ap: port to handle error for
> *
> @@ -2124,6 +2201,12 @@ void ata_sff_error_handler(struct ata_port *ap)
> ata_sff_sync(ap); /* FIXME: We don't need this */
> ap->ops->sff_check_status(ap);
> ap->ops->sff_irq_clear(ap);
> + /* We *MUST* do FIFO draining before we issue a reset as several
> + devices helpfully clear their internal state and will lock solid
> + if we touch the data port post reset. Pass qc in case anyone wants
> + to do different PIO/DMA recovery or has per command fixups */
> + if (ap->ops->drain_fifo)
> + ap->ops->drain_fifo(qc);
If the mechanism is something which can be universally applied to SFF
controllers (SATA ones w/ hardreset usually don't need this BTW),
maybe we're better off with providing another flavor of
ata_sff_error_handler() instead of adding ->drain_fifo method?
> spin_unlock_irqrestore(ap->lock, flags);
>
> @@ -2131,6 +2214,7 @@ void ata_sff_error_handler(struct ata_port *ap)
> ata_eh_thaw_port(ap);
>
> /* PIO and DMA engines have been stopped, perform recovery */
> +
Is this intentional?
Thanks.
--
tejun
next prev parent reply other threads:[~2008-10-14 5:34 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-13 14:11 [PATCH] libata: Better timeout recovery Alan Cox
2008-10-13 16:04 ` Elias Oltmanns
2008-10-13 16:10 ` Alan Cox
2008-10-14 5:32 ` Tejun Heo [this message]
-- strict thread matches above, loose matches on Subject: below --
2008-10-09 16:44 Alan Cox
2008-10-10 8:46 ` Elias Oltmanns
2008-10-10 9:19 ` Alan Cox
2008-10-10 13:24 ` Elias Oltmanns
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48F42ED5.1030009@kernel.org \
--to=tj@kernel.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=jeff@garzik.org \
--cc=linux-ide@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).