Re: [PATCH] libata: Better timeout recovery

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tejun Heo <tj@kernel.org>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: linux-ide@vger.kernel.org, jeff@garzik.org
Subject: Re: [PATCH] libata: Better timeout recovery
Date: Tue, 14 Oct 2008 14:32:05 +0900	[thread overview]
Message-ID: <48F42ED5.1030009@kernel.org> (raw)
In-Reply-To: <20081013141050.7772.52587.stgit@localhost.localdomain>

Hello, Alan.

Alan Cox wrote:
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index a93247c..757b956 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -518,7 +518,7 @@ void ata_scsi_error(struct Scsi_Host *host)
>  
>  	/* For new EH, all qcs are finished in one of three ways -
>  	 * normal completion, error completion, and SCSI timeout.
> -	 * Both cmpletions can race against SCSI timeout.  When normal
> +	 * Both completions can race against SCSI timeout.  When normal
>  	 * completion wins, the qc never reaches EH.  When error
>  	 * completion wins, the qc has ATA_QCFLAG_FAILED set.
>  	 *

Heh.. nice catch.

> @@ -533,7 +533,19 @@ void ata_scsi_error(struct Scsi_Host *host)
>  		int nr_timedout = 0;
>  
>  		spin_lock_irqsave(ap->lock, flags);
> -
> +		
> +		/* This must occur under the ap->lock as we don't want
> +		   a polled recovery to race the real interrupt handler
> +		   
> +		   The lost_interrupt handler checks for any completed but
> +		   non-notified command and completes much like an IRQ handler.
> +		   
> +		   We then fall into the error recovery code which will treat
> +		   this as if normal completion won the race */

As Elias pointed out, I think it's better to stick with

 /* The first line.
  * The last line.
  */

as that's what the rest of the libata uses.

> +
> +		if (ap->ops->lost_interrupt)
> +			ap->ops->lost_interrupt(ap);
> +			
>  		list_for_each_entry_safe(scmd, tmp, &host->eh_cmd_q, eh_entry) {
>  			struct ata_queued_cmd *qc;

Eh... I really wish this can be folded into ata_sff_error_handler()
but, yeah, it's not easy to do as ata_scsi_error() marks commands as
timed out before error handler is entered.  Hmmm... I think it would
be better to do this before entering error handler at all and way
before the actual command out triggers but well I guess that's for
another day.

Do you mind to separate out this one from the drain changes?

> @@ -577,6 +589,9 @@ void ata_scsi_error(struct Scsi_Host *host)
>  		ap->eh_tries = ATA_EH_MAX_TRIES;
>  	} else
>  		spin_unlock_wait(ap->lock);
> +		
> +	/* If we timed raced normal completion and there is nothing to
> +	   recover nr_timedout == 0 why exactly are we doing error recovery ? */

As it's easier that way and there's no need to optimize anything as
we're already knee deep in EH.

> +	/* There was a command running, we are no longer busy and we have
> +	   no interrupt. */
> +	ata_port_printk(ap, KERN_WARNING, "lost interrupt (Status 0x%x)\n",
> +								status);

I sometimes find your indenting style a bit strange but it's equally
possible that I'm the weirdo.  :-)

> +	/* Run the host interrupt logic as if the interrupt had not been
> +	   lost */
> +	ata_sff_host_intr(ap, qc);

Ah.. we don't have ->irq_handler as port op now but it looks rather
strange to call ata_sff_host_intr() directly.  Maybe we need a
function which takes @irq_handler to call and make
ata_sff_lost_interrupt() call it with ata_sff_host_intr()?

> +/**
>   *	ata_sff_error_handler - Stock error handler for BMDMA controller
>   *	@ap: port to handle error for
>   *
> @@ -2124,6 +2201,12 @@ void ata_sff_error_handler(struct ata_port *ap)
>  	ata_sff_sync(ap);		/* FIXME: We don't need this */
>  	ap->ops->sff_check_status(ap);
>  	ap->ops->sff_irq_clear(ap);
> +	/* We *MUST* do FIFO draining before we issue a reset as several
> +	   devices helpfully clear their internal state and will lock solid
> +	   if we touch the data port post reset. Pass qc in case anyone wants
> +	   to do different PIO/DMA recovery or has per command fixups */
> +	if (ap->ops->drain_fifo)
> +		ap->ops->drain_fifo(qc);

If the mechanism is something which can be universally applied to SFF
controllers (SATA ones w/ hardreset usually don't need this BTW),
maybe we're better off with providing another flavor of
ata_sff_error_handler() instead of adding ->drain_fifo method?

>  	spin_unlock_irqrestore(ap->lock, flags);
>  
> @@ -2131,6 +2214,7 @@ void ata_sff_error_handler(struct ata_port *ap)
>  		ata_eh_thaw_port(ap);
>  
>  	/* PIO and DMA engines have been stopped, perform recovery */
> +	

Is this intentional?

Thanks.

-- 
tejun

next prev parent reply	other threads:[~2008-10-14  5:34 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-13 14:11 [PATCH] libata: Better timeout recovery Alan Cox
2008-10-13 16:04 ` Elias Oltmanns
2008-10-13 16:10   ` Alan Cox
2008-10-14  5:32 ` Tejun Heo [this message]
  -- strict thread matches above, loose matches on Subject: below --
2008-10-09 16:44 Alan Cox
2008-10-10  8:46 ` Elias Oltmanns
2008-10-10  9:19   ` Alan Cox
2008-10-10 13:24     ` Elias Oltmanns

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48F42ED5.1030009@kernel.org \
    --to=tj@kernel.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=jeff@garzik.org \
    --cc=linux-ide@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.