linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes
@ 2005-08-20 14:15 Tejun Heo
  2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Tejun Heo @ 2005-08-20 14:15 UTC (permalink / raw)
  To: jgarzik, albertcc; +Cc: linux-ide

 Hello, Jeff & Albert.

 This is another patchset fixing races in libata.

 The first patch implements ata_poll_qc_complete which graps host
lock, turns irq back on, completes qc and release host lock.  All
command completion in polling tasks are replaced with this function.
The effect is...

 * If an interrupt from other ports or a spurious interrupt occurs
   inbetween ata_irq_on and ata_qc_complete, it leads to double
   completion of the qc.  This race is fixed.

 * atapi_packet_task() now reenables IRQ on error exit.

 * ata_qc_complete() is now always called with host lock.

 The second patch adds locking to ata_scsi_error such that
->eng_timeout is called after ata_qc_complete is complete.
Previously, EH and the latter part of ata_qc_complete could run side
by side.

 Albert, I think, this patch should kill the race you tried to fix by
moving ATA_QCFLAG_ACTIVE clearing above ->complete_fn invocation.

[ Start of patch descriptions ]

01_libata_add-ata_poll_qc_complete.patch
	: implement ata_poll_qc_complete and use it in polling functions

	Previously, libata polling functions turned irq back on and
	completed qc commands without holding host lock.  This creates
	a race condition between the polling task and interrupts from
	other ports on the same host set or spurious interrupt from
	itself.

	This patch implements ata_poll_qc_complete which enables irq
	and completes qc atomically and convert all polling functions.

	Note: Unlike other functions, atapi_packet_task() didn't use
	to turn irq back on on error exits.  This patch makes it use
	ata_poll_qc_complete, so irq is now turned back on on error
	exits.

	Note: With this change, ALL invocations of ata_qc_complete()
	are now done under host_set lock.

02_libata_add-locking-to-ata_scsi_error.patch
	: add host_set locking to ata_scsi_error()

	SCSI EH can start before ata_qc_complete is completely
	complete.  so, latter part of ata_qc_complete can run
	side-by-side with ->eng_timeout(), interfering EH.

        This patch makes ata_scsi_error() to grab and release host_set
	lock before invoking ->eng_timeout().

	Note: host_failed decrementing and eh_cmd_q banging are moved
	above ->eng_timeout() invocation such that they're done while
	holding the lock.

[ End of patch descriptions ]

 Thanks.

--
tejun


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
  2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
@ 2005-08-20 14:15 ` Tejun Heo
  2005-08-21 19:53   ` Jeff Garzik
  2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
  2005-08-21 20:08 ` a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes) Jeff Garzik
  2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2005-08-20 14:15 UTC (permalink / raw)
  To: jgarzik, albertcc; +Cc: linux-ide

01_libata_add-ata_poll_qc_complete.patch

	Previously, libata polling functions turned irq back on and
	completed qc commands without holding host lock.  This creates
	a race condition between the polling task and interrupts from
	other ports on the same host set or spurious interrupt from
	itself.

	This patch implements ata_poll_qc_complete which enables irq
	and completes qc atomically and convert all polling functions.

	Note: Unlike other functions, atapi_packet_task() didn't use
	to turn irq back on on error exits.  This patch makes it use
	ata_poll_qc_complete, so irq is now turned back on on error
	exits.

	Note: With this change, ALL invocations of ata_qc_complete()
	are now done under host_set lock.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 libata-core.c |   32 ++++++++++++++++++++++----------
 1 files changed, 22 insertions(+), 10 deletions(-)

Index: work/drivers/scsi/libata-core.c
===================================================================
--- work.orig/drivers/scsi/libata-core.c	2005-08-20 23:13:40.000000000 +0900
+++ work/drivers/scsi/libata-core.c	2005-08-20 23:13:40.000000000 +0900
@@ -2402,6 +2402,24 @@ static int ata_sg_setup(struct ata_queue
 }
 
 /**
+ *	ata_poll_qc_complete - turn irq back on and finish qc
+ *	@qc: Command to complete
+ *	@drv_stat: ATA status register content
+ *
+ *	LOCKING:
+ *	None.  (grabs host lock)
+ */
+void ata_poll_qc_complete(struct ata_queued_cmd *qc, u8 drv_stat)
+{
+	struct ata_port *ap = qc->ap;
+
+	spin_lock_irq(&ap->host_set->lock);
+	ata_irq_on(ap);
+	ata_qc_complete(qc, drv_stat);
+	spin_unlock_irq(&ap->host_set->lock);
+}
+
+/**
  *	ata_pio_poll -
  *	@ap:
  *
@@ -2492,9 +2510,7 @@ static void ata_pio_complete (struct ata
 
 	ap->pio_task_state = PIO_ST_IDLE;
 
-	ata_irq_on(ap);
-
-	ata_qc_complete(qc, drv_stat);
+	ata_poll_qc_complete(qc, drv_stat);
 }
 
 
@@ -2717,9 +2733,7 @@ static void ata_pio_block(struct ata_por
 		if ((status & ATA_DRQ) == 0) {
 			ap->pio_task_state = PIO_ST_IDLE;
 
-			ata_irq_on(ap);
-
-			ata_qc_complete(qc, status);
+			ata_poll_qc_complete(qc, status);
 			return;
 		}
 
@@ -2749,9 +2763,7 @@ static void ata_pio_error(struct ata_por
 
 	ap->pio_task_state = PIO_ST_IDLE;
 
-	ata_irq_on(ap);
-
-	ata_qc_complete(qc, drv_stat | ATA_ERR);
+	ata_poll_qc_complete(qc, drv_stat | ATA_ERR);
 }
 
 static void ata_pio_task(void *_data)
@@ -3656,7 +3668,7 @@ static void atapi_packet_task(void *_dat
 	return;
 
 err_out:
-	ata_qc_complete(qc, ATA_ERR);
+	ata_poll_qc_complete(qc, ATA_ERR);
 }
 
 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error()
  2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
  2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
@ 2005-08-20 14:15 ` Tejun Heo
  2005-08-21 19:52   ` Jeff Garzik
  2005-08-21 20:08 ` a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes) Jeff Garzik
  2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2005-08-20 14:15 UTC (permalink / raw)
  To: jgarzik, albertcc; +Cc: linux-ide

02_libata_add-locking-to-ata_scsi_error.patch

	SCSI EH can start before ata_qc_complete is completely
	complete.  so, latter part of ata_qc_complete can run
	side-by-side with ->eng_timeout(), interfering EH.

        This patch makes ata_scsi_error() to grab and release host_set
	lock before invoking ->eng_timeout().

	Note: host_failed decrementing and eh_cmd_q banging are moved
	above ->eng_timeout() invocation such that they're done while
	holding the lock.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 libata-scsi.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

Index: work/drivers/scsi/libata-scsi.c
===================================================================
--- work.orig/drivers/scsi/libata-scsi.c	2005-08-20 22:34:02.000000000 +0900
+++ work/drivers/scsi/libata-scsi.c	2005-08-20 23:13:40.000000000 +0900
@@ -373,12 +373,14 @@ int ata_scsi_slave_config(struct scsi_de
 
 int ata_scsi_error(struct Scsi_Host *host)
 {
-	struct ata_port *ap;
+	struct ata_port *ap = (struct ata_port *) &host->hostdata[0];
 
 	DPRINTK("ENTER\n");
 
-	ap = (struct ata_port *) &host->hostdata[0];
-	ap->ops->eng_timeout(ap);
+	/* The following locking ensures that we enter eng_timeout
+	 * after ata_qc_complete is complete.
+	 */
+	spin_lock_irq(&ap->host_set->lock);
 
 	/* TODO: this is per-command; when queueing is supported
 	 * this code will either change or move to a more
@@ -387,6 +389,10 @@ int ata_scsi_error(struct Scsi_Host *hos
 	host->host_failed--;
 	INIT_LIST_HEAD(&host->eh_cmd_q);
 
+	spin_unlock_irq(&ap->host_set->lock);
+
+	ap->ops->eng_timeout(ap);
+
 	DPRINTK("EXIT\n");
 	return 0;
 }


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error()
  2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
@ 2005-08-21 19:52   ` Jeff Garzik
  2005-08-22  8:23     ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2005-08-21 19:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: albertcc, linux-ide

Tejun Heo wrote:
> 02_libata_add-locking-to-ata_scsi_error.patch
> 
> 	SCSI EH can start before ata_qc_complete is completely
> 	complete.  so, latter part of ata_qc_complete can run
> 	side-by-side with ->eng_timeout(), interfering EH.
> 
>         This patch makes ata_scsi_error() to grab and release host_set
> 	lock before invoking ->eng_timeout().
> 
> 	Note: host_failed decrementing and eh_cmd_q banging are moved
> 	above ->eng_timeout() invocation such that they're done while
> 	holding the lock.

I think it's better to follow a path that sets a "EH owns this port now" 
flag inside spinlock,

	spin_lock_irqsave()
	ap->in_eh = 1;
	spin_unlock_irqrestore()

and then check that flag in ata_qc_complete()

	if (qc->ap->in_eh)
		return; /* do nothing else with command */

so that for the case of a late interrupt, the interrupt handler should 
acknowledge the interrupt, but not actually touch the ata_queued_cmd state.

	Jeff



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
  2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
@ 2005-08-21 19:53   ` Jeff Garzik
  2005-08-22  8:12     ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2005-08-21 19:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: albertcc, linux-ide

Tejun Heo wrote:
> 01_libata_add-ata_poll_qc_complete.patch
> 
> 	Previously, libata polling functions turned irq back on and
> 	completed qc commands without holding host lock.  This creates
> 	a race condition between the polling task and interrupts from
> 	other ports on the same host set or spurious interrupt from
> 	itself.
> 
> 	This patch implements ata_poll_qc_complete which enables irq
> 	and completes qc atomically and convert all polling functions.
> 
> 	Note: Unlike other functions, atapi_packet_task() didn't use
> 	to turn irq back on on error exits.  This patch makes it use
> 	ata_poll_qc_complete, so irq is now turned back on on error
> 	exits.
> 
> 	Note: With this change, ALL invocations of ata_qc_complete()
> 	are now done under host_set lock.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>

This one looks OK to me.  Given that I rejected the other three, would 
you still like me to apply just this one?

	Jeff




^ permalink raw reply	[flat|nested] 9+ messages in thread

* a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes)
  2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
  2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
  2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
@ 2005-08-21 20:08 ` Jeff Garzik
  2 siblings, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2005-08-21 20:08 UTC (permalink / raw)
  To: Tejun Heo, linux-ide, Jens Axboe, Alan Cox, Mark Lord; +Cc: albertcc

BTW, here is food for thought:

In first generation SATA controllers, the host controller silicon 
emulates a set of PATA taskfile registers.  When you follow the ATA 
command protocols (host state machines), handling ATA commands in the 
specified manner, the host controller silicon is processing that stuff 
behind your back, silently converting your actions into other actions:

* storing data in a H2D Register FIS buffer, for eventually sending to 
the device
* storing data received from a D2H Register FIS or SDB FIS into the host 
controller taskfile registers.
* handling PIO Setup and Data FISes, for PIO data xfer and PACKET 
command protocol
* handling FIS's associated with DMA data transfer

I've never seen any documentation for any host controller, that 
describes how the emulation is implemented in silicon.  Probably nobody 
outside of the original VHDL designers for each chip know that.  All we 
know, we must figure out from inference:  we pretend a host chip is 
PATA, and the host chip turns our actions into a SATA FIS conversion.

Because of this, we can discern several unique SATA quirks:
(a) writing most ATA registers simply involves storing a data value into 
a buffer on the host controller
(b) writing to Device Control or Command registers causes a H2D Register 
FIS to be sent.
(c) [most important to this thread] the ATA INTRQ is signalled by 
setting the 'I' bit in certain FIS's.  It is no longer a pure indication 
of the INTRQ signal, as it was in PATA.

Due to the ramifications of (c), polling in general is possible but not 
the best idea.  libata currently uses polling for the IDENTIFY DEVICE 
command, and I have long been thinking about turning off polling for 
SATA controllers.  Some SATA controllers, such as Vitesse/Intel, simply 
do not support the ATA nIEN bit.  It is emulated in the driver.  Other 
controllers may not honor the nIEN bit correctly (due to bad emulation 
logic in silicon, probably).

Just factor this info into your decisions, when playing around with the 
interrupt delivery code.

	Jeff




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
  2005-08-21 19:53   ` Jeff Garzik
@ 2005-08-22  8:12     ` Tejun Heo
  2005-08-23  5:06       ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2005-08-22  8:12 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: albertcc, linux-ide

On Sun, Aug 21, 2005 at 03:53:58PM -0400, Jeff Garzik wrote:
> Tejun Heo wrote:
> >01_libata_add-ata_poll_qc_complete.patch
> >
> >	Previously, libata polling functions turned irq back on and
> >	completed qc commands without holding host lock.  This creates
> >	a race condition between the polling task and interrupts from
> >	other ports on the same host set or spurious interrupt from
> >	itself.
> >
> >	This patch implements ata_poll_qc_complete which enables irq
> >	and completes qc atomically and convert all polling functions.
> >
> >	Note: Unlike other functions, atapi_packet_task() didn't use
> >	to turn irq back on on error exits.  This patch makes it use
> >	ata_poll_qc_complete, so irq is now turned back on on error
> >	exits.
> >
> >	Note: With this change, ALL invocations of ata_qc_complete()
> >	are now done under host_set lock.
> >
> >Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> This one looks OK to me.  Given that I rejected the other three, would 
> you still like me to apply just this one?
> 

 Greetings, Jeff.  (Hello is getting a bit old today :-)

 Here's updated version which also clears ATA_FLAG_NOINTR.

--

[PATCH libata-dev-2.6:upstream] implement ata_poll_qc_complete and use it in polling functions

 Previously, libata polling functions turned irq back on and completed
qc commands without holding host lock.  This creates a race condition
between the polling task and interrupts from other ports on the same
host set or spurious interrupt from itself.

 This patch implements ata_poll_qc_complete which enables irq and
completes qc atomically and convert all polling functions.

 Note: atapi_packet_task() didn't use to turn irq back on or clear
ATA_FLAG_NOINTR on error exits.  This patch makes it use
ata_poll_qc_complete which does both.

 Note: With this change, ALL invocations of ata_qc_complete() are now
done under host_set lock.

Signed-off-by: Tejun Heo <htejun@gmail.com>

diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -2402,6 +2402,26 @@ static int ata_sg_setup(struct ata_queue
 }
 
 /**
+ *	ata_poll_qc_complete - turn irq back on and finish qc
+ *	@qc: Command to complete
+ *	@drv_stat: ATA status register content
+ *
+ *	LOCKING:
+ *	None.  (grabs host lock)
+ */
+
+void ata_poll_qc_complete(struct ata_queued_cmd *qc, u8 drv_stat)
+{
+	struct ata_port *ap = qc->ap;
+
+	spin_lock_irq(&ap->host_set->lock);
+	ap->flags &= ~ATA_FLAG_NOINTR;
+	ata_irq_on(ap);
+	ata_qc_complete(qc, drv_stat);
+	spin_unlock_irq(&ap->host_set->lock);
+}
+
+/**
  *	ata_pio_poll -
  *	@ap:
  *
@@ -2492,9 +2512,7 @@ static void ata_pio_complete (struct ata
 
 	ap->pio_task_state = PIO_ST_IDLE;
 
-	ata_irq_on(ap);
-
-	ata_qc_complete(qc, drv_stat);
+	ata_poll_qc_complete(qc, drv_stat);
 }
 
 
@@ -2717,9 +2735,7 @@ static void ata_pio_block(struct ata_por
 		if ((status & ATA_DRQ) == 0) {
 			ap->pio_task_state = PIO_ST_IDLE;
 
-			ata_irq_on(ap);
-
-			ata_qc_complete(qc, status);
+			ata_poll_qc_complete(qc, status);
 			return;
 		}
 
@@ -2749,9 +2765,7 @@ static void ata_pio_error(struct ata_por
 
 	ap->pio_task_state = PIO_ST_IDLE;
 
-	ata_irq_on(ap);
-
-	ata_qc_complete(qc, drv_stat | ATA_ERR);
+	ata_poll_qc_complete(qc, drv_stat | ATA_ERR);
 }
 
 static void ata_pio_task(void *_data)
@@ -3659,7 +3673,7 @@ static void atapi_packet_task(void *_dat
 	return;
 
 err_out:
-	ata_qc_complete(qc, ATA_ERR);
+	ata_poll_qc_complete(qc, ATA_ERR);
 }
 
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error()
  2005-08-21 19:52   ` Jeff Garzik
@ 2005-08-22  8:23     ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2005-08-22  8:23 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: albertcc, linux-ide

On Sun, Aug 21, 2005 at 03:52:27PM -0400, Jeff Garzik wrote:
> Tejun Heo wrote:
> >02_libata_add-locking-to-ata_scsi_error.patch
> >
> >	SCSI EH can start before ata_qc_complete is completely
> >	complete.  so, latter part of ata_qc_complete can run
> >	side-by-side with ->eng_timeout(), interfering EH.
> >
> >        This patch makes ata_scsi_error() to grab and release host_set
> >	lock before invoking ->eng_timeout().
> >
> >	Note: host_failed decrementing and eh_cmd_q banging are moved
> >	above ->eng_timeout() invocation such that they're done while
> >	holding the lock.
> 
> I think it's better to follow a path that sets a "EH owns this port now" 
> flag inside spinlock,
> 
> 	spin_lock_irqsave()
> 	ap->in_eh = 1;
> 	spin_unlock_irqrestore()
> 
> and then check that flag in ata_qc_complete()
> 
> 	if (qc->ap->in_eh)
> 		return; /* do nothing else with command */
> 
> so that for the case of a late interrupt, the interrupt handler should 
> acknowledge the interrupt, but not actually touch the ata_queued_cmd state.

 Hi!

 I think above is larger subject than what this patch deals with and
should be dealt with when we revamp EH (implementing in_eh will
certainly involve other changes, I think).  So, I think, for short
term, either Albert's patch[1][2] or above one should go in to plug
ATA_QCFLAG_ACTIVE race.

 I think if Albert's patch goes in, above patch is not necessary, as
the only race we currently have is ATAPI EH case and clearing
ATA_QCFLAG_ACTIVE is the only thing we do after ->complete_fn() in
such cases.

 Thanks.

[1] http://marc.theaimsgroup.com/?l=linux-ide&m=112417360223374&w=2
[2] Well, this referencing thing is neat.  ;-)

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
  2005-08-22  8:12     ` Tejun Heo
@ 2005-08-23  5:06       ` Jeff Garzik
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2005-08-23  5:06 UTC (permalink / raw)
  To: Tejun Heo; +Cc: albertcc, linux-ide

applied


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-08-23  5:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
2005-08-21 19:53   ` Jeff Garzik
2005-08-22  8:12     ` Tejun Heo
2005-08-23  5:06       ` Jeff Garzik
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
2005-08-21 19:52   ` Jeff Garzik
2005-08-22  8:23     ` Tejun Heo
2005-08-21 20:08 ` a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes) Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).