* [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes
@ 2005-08-20 14:15 Tejun Heo
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Tejun Heo @ 2005-08-20 14:15 UTC (permalink / raw)
To: jgarzik, albertcc; +Cc: linux-ide
Hello, Jeff & Albert.
This is another patchset fixing races in libata.
The first patch implements ata_poll_qc_complete which graps host
lock, turns irq back on, completes qc and release host lock. All
command completion in polling tasks are replaced with this function.
The effect is...
* If an interrupt from other ports or a spurious interrupt occurs
inbetween ata_irq_on and ata_qc_complete, it leads to double
completion of the qc. This race is fixed.
* atapi_packet_task() now reenables IRQ on error exit.
* ata_qc_complete() is now always called with host lock.
The second patch adds locking to ata_scsi_error such that
->eng_timeout is called after ata_qc_complete is complete.
Previously, EH and the latter part of ata_qc_complete could run side
by side.
Albert, I think, this patch should kill the race you tried to fix by
moving ATA_QCFLAG_ACTIVE clearing above ->complete_fn invocation.
[ Start of patch descriptions ]
01_libata_add-ata_poll_qc_complete.patch
: implement ata_poll_qc_complete and use it in polling functions
Previously, libata polling functions turned irq back on and
completed qc commands without holding host lock. This creates
a race condition between the polling task and interrupts from
other ports on the same host set or spurious interrupt from
itself.
This patch implements ata_poll_qc_complete which enables irq
and completes qc atomically and convert all polling functions.
Note: Unlike other functions, atapi_packet_task() didn't use
to turn irq back on on error exits. This patch makes it use
ata_poll_qc_complete, so irq is now turned back on on error
exits.
Note: With this change, ALL invocations of ata_qc_complete()
are now done under host_set lock.
02_libata_add-locking-to-ata_scsi_error.patch
: add host_set locking to ata_scsi_error()
SCSI EH can start before ata_qc_complete is completely
complete. so, latter part of ata_qc_complete can run
side-by-side with ->eng_timeout(), interfering EH.
This patch makes ata_scsi_error() to grab and release host_set
lock before invoking ->eng_timeout().
Note: host_failed decrementing and eh_cmd_q banging are moved
above ->eng_timeout() invocation such that they're done while
holding the lock.
[ End of patch descriptions ]
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
@ 2005-08-20 14:15 ` Tejun Heo
2005-08-21 19:53 ` Jeff Garzik
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
2005-08-21 20:08 ` a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes) Jeff Garzik
2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2005-08-20 14:15 UTC (permalink / raw)
To: jgarzik, albertcc; +Cc: linux-ide
01_libata_add-ata_poll_qc_complete.patch
Previously, libata polling functions turned irq back on and
completed qc commands without holding host lock. This creates
a race condition between the polling task and interrupts from
other ports on the same host set or spurious interrupt from
itself.
This patch implements ata_poll_qc_complete which enables irq
and completes qc atomically and convert all polling functions.
Note: Unlike other functions, atapi_packet_task() didn't use
to turn irq back on on error exits. This patch makes it use
ata_poll_qc_complete, so irq is now turned back on on error
exits.
Note: With this change, ALL invocations of ata_qc_complete()
are now done under host_set lock.
Signed-off-by: Tejun Heo <htejun@gmail.com>
libata-core.c | 32 ++++++++++++++++++++++----------
1 files changed, 22 insertions(+), 10 deletions(-)
Index: work/drivers/scsi/libata-core.c
===================================================================
--- work.orig/drivers/scsi/libata-core.c 2005-08-20 23:13:40.000000000 +0900
+++ work/drivers/scsi/libata-core.c 2005-08-20 23:13:40.000000000 +0900
@@ -2402,6 +2402,24 @@ static int ata_sg_setup(struct ata_queue
}
/**
+ * ata_poll_qc_complete - turn irq back on and finish qc
+ * @qc: Command to complete
+ * @drv_stat: ATA status register content
+ *
+ * LOCKING:
+ * None. (grabs host lock)
+ */
+void ata_poll_qc_complete(struct ata_queued_cmd *qc, u8 drv_stat)
+{
+ struct ata_port *ap = qc->ap;
+
+ spin_lock_irq(&ap->host_set->lock);
+ ata_irq_on(ap);
+ ata_qc_complete(qc, drv_stat);
+ spin_unlock_irq(&ap->host_set->lock);
+}
+
+/**
* ata_pio_poll -
* @ap:
*
@@ -2492,9 +2510,7 @@ static void ata_pio_complete (struct ata
ap->pio_task_state = PIO_ST_IDLE;
- ata_irq_on(ap);
-
- ata_qc_complete(qc, drv_stat);
+ ata_poll_qc_complete(qc, drv_stat);
}
@@ -2717,9 +2733,7 @@ static void ata_pio_block(struct ata_por
if ((status & ATA_DRQ) == 0) {
ap->pio_task_state = PIO_ST_IDLE;
- ata_irq_on(ap);
-
- ata_qc_complete(qc, status);
+ ata_poll_qc_complete(qc, status);
return;
}
@@ -2749,9 +2763,7 @@ static void ata_pio_error(struct ata_por
ap->pio_task_state = PIO_ST_IDLE;
- ata_irq_on(ap);
-
- ata_qc_complete(qc, drv_stat | ATA_ERR);
+ ata_poll_qc_complete(qc, drv_stat | ATA_ERR);
}
static void ata_pio_task(void *_data)
@@ -3656,7 +3668,7 @@ static void atapi_packet_task(void *_dat
return;
err_out:
- ata_qc_complete(qc, ATA_ERR);
+ ata_poll_qc_complete(qc, ATA_ERR);
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error()
2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
@ 2005-08-20 14:15 ` Tejun Heo
2005-08-21 19:52 ` Jeff Garzik
2005-08-21 20:08 ` a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes) Jeff Garzik
2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2005-08-20 14:15 UTC (permalink / raw)
To: jgarzik, albertcc; +Cc: linux-ide
02_libata_add-locking-to-ata_scsi_error.patch
SCSI EH can start before ata_qc_complete is completely
complete. so, latter part of ata_qc_complete can run
side-by-side with ->eng_timeout(), interfering EH.
This patch makes ata_scsi_error() to grab and release host_set
lock before invoking ->eng_timeout().
Note: host_failed decrementing and eh_cmd_q banging are moved
above ->eng_timeout() invocation such that they're done while
holding the lock.
Signed-off-by: Tejun Heo <htejun@gmail.com>
libata-scsi.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
Index: work/drivers/scsi/libata-scsi.c
===================================================================
--- work.orig/drivers/scsi/libata-scsi.c 2005-08-20 22:34:02.000000000 +0900
+++ work/drivers/scsi/libata-scsi.c 2005-08-20 23:13:40.000000000 +0900
@@ -373,12 +373,14 @@ int ata_scsi_slave_config(struct scsi_de
int ata_scsi_error(struct Scsi_Host *host)
{
- struct ata_port *ap;
+ struct ata_port *ap = (struct ata_port *) &host->hostdata[0];
DPRINTK("ENTER\n");
- ap = (struct ata_port *) &host->hostdata[0];
- ap->ops->eng_timeout(ap);
+ /* The following locking ensures that we enter eng_timeout
+ * after ata_qc_complete is complete.
+ */
+ spin_lock_irq(&ap->host_set->lock);
/* TODO: this is per-command; when queueing is supported
* this code will either change or move to a more
@@ -387,6 +389,10 @@ int ata_scsi_error(struct Scsi_Host *hos
host->host_failed--;
INIT_LIST_HEAD(&host->eh_cmd_q);
+ spin_unlock_irq(&ap->host_set->lock);
+
+ ap->ops->eng_timeout(ap);
+
DPRINTK("EXIT\n");
return 0;
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error()
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
@ 2005-08-21 19:52 ` Jeff Garzik
2005-08-22 8:23 ` Tejun Heo
0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2005-08-21 19:52 UTC (permalink / raw)
To: Tejun Heo; +Cc: albertcc, linux-ide
Tejun Heo wrote:
> 02_libata_add-locking-to-ata_scsi_error.patch
>
> SCSI EH can start before ata_qc_complete is completely
> complete. so, latter part of ata_qc_complete can run
> side-by-side with ->eng_timeout(), interfering EH.
>
> This patch makes ata_scsi_error() to grab and release host_set
> lock before invoking ->eng_timeout().
>
> Note: host_failed decrementing and eh_cmd_q banging are moved
> above ->eng_timeout() invocation such that they're done while
> holding the lock.
I think it's better to follow a path that sets a "EH owns this port now"
flag inside spinlock,
spin_lock_irqsave()
ap->in_eh = 1;
spin_unlock_irqrestore()
and then check that flag in ata_qc_complete()
if (qc->ap->in_eh)
return; /* do nothing else with command */
so that for the case of a late interrupt, the interrupt handler should
acknowledge the interrupt, but not actually touch the ata_queued_cmd state.
Jeff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
@ 2005-08-21 19:53 ` Jeff Garzik
2005-08-22 8:12 ` Tejun Heo
0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2005-08-21 19:53 UTC (permalink / raw)
To: Tejun Heo; +Cc: albertcc, linux-ide
Tejun Heo wrote:
> 01_libata_add-ata_poll_qc_complete.patch
>
> Previously, libata polling functions turned irq back on and
> completed qc commands without holding host lock. This creates
> a race condition between the polling task and interrupts from
> other ports on the same host set or spurious interrupt from
> itself.
>
> This patch implements ata_poll_qc_complete which enables irq
> and completes qc atomically and convert all polling functions.
>
> Note: Unlike other functions, atapi_packet_task() didn't use
> to turn irq back on on error exits. This patch makes it use
> ata_poll_qc_complete, so irq is now turned back on on error
> exits.
>
> Note: With this change, ALL invocations of ata_qc_complete()
> are now done under host_set lock.
>
> Signed-off-by: Tejun Heo <htejun@gmail.com>
This one looks OK to me. Given that I rejected the other three, would
you still like me to apply just this one?
Jeff
^ permalink raw reply [flat|nested] 9+ messages in thread
* a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes)
2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
@ 2005-08-21 20:08 ` Jeff Garzik
2 siblings, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2005-08-21 20:08 UTC (permalink / raw)
To: Tejun Heo, linux-ide, Jens Axboe, Alan Cox, Mark Lord; +Cc: albertcc
BTW, here is food for thought:
In first generation SATA controllers, the host controller silicon
emulates a set of PATA taskfile registers. When you follow the ATA
command protocols (host state machines), handling ATA commands in the
specified manner, the host controller silicon is processing that stuff
behind your back, silently converting your actions into other actions:
* storing data in a H2D Register FIS buffer, for eventually sending to
the device
* storing data received from a D2H Register FIS or SDB FIS into the host
controller taskfile registers.
* handling PIO Setup and Data FISes, for PIO data xfer and PACKET
command protocol
* handling FIS's associated with DMA data transfer
I've never seen any documentation for any host controller, that
describes how the emulation is implemented in silicon. Probably nobody
outside of the original VHDL designers for each chip know that. All we
know, we must figure out from inference: we pretend a host chip is
PATA, and the host chip turns our actions into a SATA FIS conversion.
Because of this, we can discern several unique SATA quirks:
(a) writing most ATA registers simply involves storing a data value into
a buffer on the host controller
(b) writing to Device Control or Command registers causes a H2D Register
FIS to be sent.
(c) [most important to this thread] the ATA INTRQ is signalled by
setting the 'I' bit in certain FIS's. It is no longer a pure indication
of the INTRQ signal, as it was in PATA.
Due to the ramifications of (c), polling in general is possible but not
the best idea. libata currently uses polling for the IDENTIFY DEVICE
command, and I have long been thinking about turning off polling for
SATA controllers. Some SATA controllers, such as Vitesse/Intel, simply
do not support the ATA nIEN bit. It is emulated in the driver. Other
controllers may not honor the nIEN bit correctly (due to bad emulation
logic in silicon, probably).
Just factor this info into your decisions, when playing around with the
interrupt delivery code.
Jeff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
2005-08-21 19:53 ` Jeff Garzik
@ 2005-08-22 8:12 ` Tejun Heo
2005-08-23 5:06 ` Jeff Garzik
0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2005-08-22 8:12 UTC (permalink / raw)
To: Jeff Garzik; +Cc: albertcc, linux-ide
On Sun, Aug 21, 2005 at 03:53:58PM -0400, Jeff Garzik wrote:
> Tejun Heo wrote:
> >01_libata_add-ata_poll_qc_complete.patch
> >
> > Previously, libata polling functions turned irq back on and
> > completed qc commands without holding host lock. This creates
> > a race condition between the polling task and interrupts from
> > other ports on the same host set or spurious interrupt from
> > itself.
> >
> > This patch implements ata_poll_qc_complete which enables irq
> > and completes qc atomically and convert all polling functions.
> >
> > Note: Unlike other functions, atapi_packet_task() didn't use
> > to turn irq back on on error exits. This patch makes it use
> > ata_poll_qc_complete, so irq is now turned back on on error
> > exits.
> >
> > Note: With this change, ALL invocations of ata_qc_complete()
> > are now done under host_set lock.
> >
> >Signed-off-by: Tejun Heo <htejun@gmail.com>
>
> This one looks OK to me. Given that I rejected the other three, would
> you still like me to apply just this one?
>
Greetings, Jeff. (Hello is getting a bit old today :-)
Here's updated version which also clears ATA_FLAG_NOINTR.
--
[PATCH libata-dev-2.6:upstream] implement ata_poll_qc_complete and use it in polling functions
Previously, libata polling functions turned irq back on and completed
qc commands without holding host lock. This creates a race condition
between the polling task and interrupts from other ports on the same
host set or spurious interrupt from itself.
This patch implements ata_poll_qc_complete which enables irq and
completes qc atomically and convert all polling functions.
Note: atapi_packet_task() didn't use to turn irq back on or clear
ATA_FLAG_NOINTR on error exits. This patch makes it use
ata_poll_qc_complete which does both.
Note: With this change, ALL invocations of ata_qc_complete() are now
done under host_set lock.
Signed-off-by: Tejun Heo <htejun@gmail.com>
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -2402,6 +2402,26 @@ static int ata_sg_setup(struct ata_queue
}
/**
+ * ata_poll_qc_complete - turn irq back on and finish qc
+ * @qc: Command to complete
+ * @drv_stat: ATA status register content
+ *
+ * LOCKING:
+ * None. (grabs host lock)
+ */
+
+void ata_poll_qc_complete(struct ata_queued_cmd *qc, u8 drv_stat)
+{
+ struct ata_port *ap = qc->ap;
+
+ spin_lock_irq(&ap->host_set->lock);
+ ap->flags &= ~ATA_FLAG_NOINTR;
+ ata_irq_on(ap);
+ ata_qc_complete(qc, drv_stat);
+ spin_unlock_irq(&ap->host_set->lock);
+}
+
+/**
* ata_pio_poll -
* @ap:
*
@@ -2492,9 +2512,7 @@ static void ata_pio_complete (struct ata
ap->pio_task_state = PIO_ST_IDLE;
- ata_irq_on(ap);
-
- ata_qc_complete(qc, drv_stat);
+ ata_poll_qc_complete(qc, drv_stat);
}
@@ -2717,9 +2735,7 @@ static void ata_pio_block(struct ata_por
if ((status & ATA_DRQ) == 0) {
ap->pio_task_state = PIO_ST_IDLE;
- ata_irq_on(ap);
-
- ata_qc_complete(qc, status);
+ ata_poll_qc_complete(qc, status);
return;
}
@@ -2749,9 +2765,7 @@ static void ata_pio_error(struct ata_por
ap->pio_task_state = PIO_ST_IDLE;
- ata_irq_on(ap);
-
- ata_qc_complete(qc, drv_stat | ATA_ERR);
+ ata_poll_qc_complete(qc, drv_stat | ATA_ERR);
}
static void ata_pio_task(void *_data)
@@ -3659,7 +3673,7 @@ static void atapi_packet_task(void *_dat
return;
err_out:
- ata_qc_complete(qc, ATA_ERR);
+ ata_poll_qc_complete(qc, ATA_ERR);
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error()
2005-08-21 19:52 ` Jeff Garzik
@ 2005-08-22 8:23 ` Tejun Heo
0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2005-08-22 8:23 UTC (permalink / raw)
To: Jeff Garzik; +Cc: albertcc, linux-ide
On Sun, Aug 21, 2005 at 03:52:27PM -0400, Jeff Garzik wrote:
> Tejun Heo wrote:
> >02_libata_add-locking-to-ata_scsi_error.patch
> >
> > SCSI EH can start before ata_qc_complete is completely
> > complete. so, latter part of ata_qc_complete can run
> > side-by-side with ->eng_timeout(), interfering EH.
> >
> > This patch makes ata_scsi_error() to grab and release host_set
> > lock before invoking ->eng_timeout().
> >
> > Note: host_failed decrementing and eh_cmd_q banging are moved
> > above ->eng_timeout() invocation such that they're done while
> > holding the lock.
>
> I think it's better to follow a path that sets a "EH owns this port now"
> flag inside spinlock,
>
> spin_lock_irqsave()
> ap->in_eh = 1;
> spin_unlock_irqrestore()
>
> and then check that flag in ata_qc_complete()
>
> if (qc->ap->in_eh)
> return; /* do nothing else with command */
>
> so that for the case of a late interrupt, the interrupt handler should
> acknowledge the interrupt, but not actually touch the ata_queued_cmd state.
Hi!
I think above is larger subject than what this patch deals with and
should be dealt with when we revamp EH (implementing in_eh will
certainly involve other changes, I think). So, I think, for short
term, either Albert's patch[1][2] or above one should go in to plug
ATA_QCFLAG_ACTIVE race.
I think if Albert's patch goes in, above patch is not necessary, as
the only race we currently have is ATAPI EH case and clearing
ATA_QCFLAG_ACTIVE is the only thing we do after ->complete_fn() in
such cases.
Thanks.
[1] http://marc.theaimsgroup.com/?l=linux-ide&m=112417360223374&w=2
[2] Well, this referencing thing is neat. ;-)
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions
2005-08-22 8:12 ` Tejun Heo
@ 2005-08-23 5:06 ` Jeff Garzik
0 siblings, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2005-08-23 5:06 UTC (permalink / raw)
To: Tejun Heo; +Cc: albertcc, linux-ide
applied
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2005-08-23 5:06 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-20 14:15 [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes Tejun Heo
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 01/02] libata: implement ata_poll_qc_complete and use it in polling functions Tejun Heo
2005-08-21 19:53 ` Jeff Garzik
2005-08-22 8:12 ` Tejun Heo
2005-08-23 5:06 ` Jeff Garzik
2005-08-20 14:15 ` [PATCH libata-dev-2.6:upstream 02/02] libata: add host_set locking to ata_scsi_error() Tejun Heo
2005-08-21 19:52 ` Jeff Garzik
2005-08-22 8:23 ` Tejun Heo
2005-08-21 20:08 ` a word about ATA interrupts, and SATA (was Re: [PATCH libata-dev-2.6:upstream 00/02] libata: race fixes) Jeff Garzik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).