linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out
@ 2006-03-31  5:43 Albert Lee
  2006-03-31 15:06 ` Jeff Garzik
  0 siblings, 1 reply; 5+ messages in thread
From: Albert Lee @ 2006-03-31  5:43 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: IDE Linux

Use AC_ERR_TIMEOUT err_mask for time out,
instead of generating err_mask from drv_stat.

Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
---
This patch was accepted in irq-pio before.

Patch against upstream (8b316a3973f05e572b4edeeda9072987f6bbaa44).
For your review, thanks.

Albert

--- upstream0/drivers/scsi/libata-core.c	2006-03-31 13:33:15.000000000 +0800
+++ upstream1/drivers/scsi/libata-core.c	2006-03-31 13:34:59.000000000 +0800
@@ -3827,7 +3827,7 @@ static void ata_qc_timeout(struct ata_qu
 		       ap->id, qc->tf.command, drv_stat, host_stat);
 
 		/* complete taskfile transaction */
-		qc->err_mask |= ac_err_mask(drv_stat);
+		qc->err_mask |= AC_ERR_TIMEOUT;
 		break;
 	}
 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out
  2006-03-31  5:43 [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out Albert Lee
@ 2006-03-31 15:06 ` Jeff Garzik
  2006-03-31 18:14   ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Garzik @ 2006-03-31 15:06 UTC (permalink / raw)
  To: albertl; +Cc: IDE Linux

Albert Lee wrote:
> Use AC_ERR_TIMEOUT err_mask for time out,
> instead of generating err_mask from drv_stat.
> 
> Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
> ---
> This patch was accepted in irq-pio before.
> 
> Patch against upstream (8b316a3973f05e572b4edeeda9072987f6bbaa44).
> For your review, thanks.
> 
> Albert
> 
> --- upstream0/drivers/scsi/libata-core.c	2006-03-31 13:33:15.000000000 +0800
> +++ upstream1/drivers/scsi/libata-core.c	2006-03-31 13:34:59.000000000 +0800
> @@ -3827,7 +3827,7 @@ static void ata_qc_timeout(struct ata_qu
>  		       ap->id, qc->tf.command, drv_stat, host_stat);
>  
>  		/* complete taskfile transaction */
> -		qc->err_mask |= ac_err_mask(drv_stat);
> +		qc->err_mask |= AC_ERR_TIMEOUT;

The intention here is to catch missed interrupts; due to flaky hardware 
or buggy drivers, we may miss an interrupt, but still be able to 
complete the qc in the timeout handler.  Thus, if the ac_err_mask() 
indicates a BSY or DRQ or DF is asserted, it will indicate an error, 
otherwise not.

So, NAK, and we should fix irq-pio for this as well...

	Jeff




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out
  2006-03-31 15:06 ` Jeff Garzik
@ 2006-03-31 18:14   ` Tejun Heo
  2006-03-31 18:17     ` Tejun Heo
  2006-04-01  0:34     ` Jeff Garzik
  0 siblings, 2 replies; 5+ messages in thread
From: Tejun Heo @ 2006-03-31 18:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: albertl, IDE Linux

Jeff Garzik wrote:
> Albert Lee wrote:
>> Use AC_ERR_TIMEOUT err_mask for time out,
>> instead of generating err_mask from drv_stat.
>>
>> Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
>> ---
>> This patch was accepted in irq-pio before.
>>
>> Patch against upstream (8b316a3973f05e572b4edeeda9072987f6bbaa44).
>> For your review, thanks.
>>
>> Albert
>>
>> --- upstream0/drivers/scsi/libata-core.c    2006-03-31 
>> 13:33:15.000000000 +0800
>> +++ upstream1/drivers/scsi/libata-core.c    2006-03-31 
>> 13:34:59.000000000 +0800
>> @@ -3827,7 +3827,7 @@ static void ata_qc_timeout(struct ata_qu
>>                 ap->id, qc->tf.command, drv_stat, host_stat);
>>  
>>          /* complete taskfile transaction */
>> -        qc->err_mask |= ac_err_mask(drv_stat);
>> +        qc->err_mask |= AC_ERR_TIMEOUT;
> 
> The intention here is to catch missed interrupts; due to flaky hardware 
> or buggy drivers, we may miss an interrupt, but still be able to 
> complete the qc in the timeout handler.  Thus, if the ac_err_mask() 
> indicates a BSY or DRQ or DF is asserted, it will indicate an error, 
> otherwise not.
> 
> So, NAK, and we should fix irq-pio for this as well...
> 

Hello, Jeff. Hello, Albert.

I think we must not successfully finish a qc after timeout. If a device 
hasn't replied in 30 seconds, it's best to assume the device is in an 
unknown state. ATA TF registers just don't have enough information to 
tell whether it looks like that because it finished the command or it 
has simply forgotten about everything.

For example, if a communication error occurs while a qc is being issued 
and the controller fails to report, the device would have been in the 
ready state until the qc times out. It does know nothing about the 
command but ->eng_timeout will complete the command successfully 
potentially causing data corruption.

Or, if a power fluctation or static discharge breaks the link while a qc 
is in progress (this happens very easily with SATA, gigahertz serial 
signals are easy to disrupt), it can result in PHY reset causing the 
device to forget about the command. Similar result can be obtained with 
the right combination of device and controller by simply unplugging and 
replugging the cable while commands are in progress. (I've done a lot of 
it.)

It just isn't worth to successfully complete a command after 30 seconds 
at the risk of data corruption. If the device/controller times out on 
most of commands, the combination is unuseable no matter what we do on 
timeout (one command every 30secs..). If the combination fails 
occasionally, retrying doesn't take little time and is much safer.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out
  2006-03-31 18:14   ` Tejun Heo
@ 2006-03-31 18:17     ` Tejun Heo
  2006-04-01  0:34     ` Jeff Garzik
  1 sibling, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2006-03-31 18:17 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: albertl, IDE Linux

[--snip--]
> 
> It just isn't worth to successfully complete a command after 30 seconds 
> at the risk of data corruption. If the device/controller times out on 
> most of commands, the combination is unuseable no matter what we do on 
> timeout (one command every 30secs..). If the combination fails 
> occasionally, retrying doesn't take little time and is much safer.

The last sentence should be either 'doesn't take much time' or 'takes 
little time'. Two sentences got fused in my head while writing. :/

-- 
tejun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out
  2006-03-31 18:14   ` Tejun Heo
  2006-03-31 18:17     ` Tejun Heo
@ 2006-04-01  0:34     ` Jeff Garzik
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff Garzik @ 2006-04-01  0:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: albertl, IDE Linux

Tejun Heo wrote:
> I think we must not successfully finish a qc after timeout. If a device 

OK fair enough, I'm convinced.

	Jeff



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2006-04-01  0:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-31  5:43 [PATCH 1/1] libata: use AC_ERR_TIMEOUT err_mask for time out Albert Lee
2006-03-31 15:06 ` Jeff Garzik
2006-03-31 18:14   ` Tejun Heo
2006-03-31 18:17     ` Tejun Heo
2006-04-01  0:34     ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).