linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 02/16] libata-eh-fw: add new EH operations
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 01/16] libata-eh-fw: add flags for new EH Tejun Heo
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

This patch adds three operations for new EH.  The operations are...

->freeze		freeze the port
->error_handler		new EH
->post_internal_cmd	EH/clean up for internal commands

The old and new EH's have to live together for the time being and
->error_handler will be used as a switch to select which EH mechanism
to use.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 include/linux/libata.h |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

fc162d78699201209b9ecb6250f716aa3d65959d
diff --git a/include/linux/libata.h b/include/linux/libata.h
index d80dbb5..111bae9 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -462,7 +462,14 @@ struct ata_port_operations {
 	void (*qc_prep) (struct ata_queued_cmd *qc);
 	unsigned int (*qc_issue) (struct ata_queued_cmd *qc);
 
-	void (*eng_timeout) (struct ata_port *ap);
+	/* Error handlers.  ->error_handler overrides ->eng_timeout and
+	 * indicates that new-style EH is in place.
+	 */
+	void (*eng_timeout) (struct ata_port *ap); /* obsolete */
+
+	void (*freeze) (struct ata_port *ap);
+	void (*error_handler) (struct ata_port *ap);
+	void (*post_internal_cmd) (struct ata_queued_cmd *qc);
 
 	irqreturn_t (*irq_handler)(int, void *, struct pt_regs *);
 	void (*irq_clear) (struct ata_port *);
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHSET 5/9] new EH framework, take 2
@ 2006-04-11 13:42 Tejun Heo
  2006-04-11 13:42 ` [PATCH 02/16] libata-eh-fw: add new EH operations Tejun Heo
                   ` (16 more replies)
  0 siblings, 17 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide, htejun

Hello, all.

This is the second take of new-EH-framework patchset.  Changes from
the last take[L] are...

* adjusted to removal of @verbose from reset methods

* ata_exec_internal() synchronizes properly while freeing qc

* ata_exec_internal() whine and force ATA_ERR_OTHER if the internal qc
  has failed but neither the interrupt handler nor ->post_internal_cmd
  sets error code.

* SError and IRQ are cleared in ata_std_postreset() to clear error
  conditions and IRQs which can be raised during reset.

* ata_do_reset() now automatically freezes the port before proceeding
  with reset.  So, reset operations are guaranteed to be performed on
  frozen ports.

* ata_port_freeze() and thus ->freeze() method can be called multiple
  times without being thawed inbetween.  ->freeze() must not assume
  that the port is in thawed state.

* ata_scsi_error() retries ->error_handler() if the port is frozen on
  completion.  This is to handle cases where errors occur after
  ->error_handler() revived the port but before EH finishes.  This is
  done in such a way that EH doesn't miss freezing error condition
  under any condition.  Repeat is limited to 5 times.

* EH is inactive until the SCSI host is activated (SHOST_RUNNING).
  This prevents EH from running simultaneously with initial probing.

* various cosmetic changes (comments, new lines, messages...)

This patchset is against...

  upstream (c2a6585296009379e0f4eff39cdcb108b457ebf2)
  + [1] misc-reset-updates patchset (repost)
  + [2] implement-and-use-ata_wait_register patchset (repost)
  + [3] misc-ata_bus_probe-updates patchset
  + [4] sata_sil24-fixes-errata-workaround-and-reset-updates patchset, take 3
  + [5] implement-scsi_eh_schedule patchset
  + [6] fix-scsi_kill_request-busy-count-handling patch

Thanks.

--
tejun

[L] http://article.gmane.org/gmane.linux.ide/9311
[1] http://article.gmane.org/gmane.linux.ide/9495
[2] http://article.gmane.org/gmane.linux.ide/9499
[3] http://article.gmane.org/gmane.linux.ide/9506
[4] http://article.gmane.org/gmane.linux.ide/9516
[5] http://article.gmane.org/gmane.linux.ide/9290
[6] http://article.gmane.org/gmane.linux.ide/9487



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/16] libata-eh-fw: add flags for new EH
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
  2006-04-11 13:42 ` [PATCH 02/16] libata-eh-fw: add new EH operations Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 09/16] libata-eh-fw: implement new EH scheduling via error completion Tejun Heo
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Add new device, port and ac flags to be used for new EH.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 include/linux/libata.h |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

83d510e57ad1843999195e7c5b00f7086ebf57d9
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 1c75024..d80dbb5 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -124,6 +124,7 @@ enum {
 	ATA_DFLAG_CFG_MASK	= (1 << 8) - 1,
 
 	ATA_DFLAG_PIO		= (1 << 8), /* device currently in PIO mode */
+	ATA_DFLAG_FAILED	= (1 << 9), /* device has failed */
 
 	ATA_DEV_UNKNOWN		= 0,	/* unknown device */
 	ATA_DEV_ATA		= 1,	/* ATA device */
@@ -151,7 +152,8 @@ enum {
 	ATA_FLAG_FLUSH_PORT_TASK = (1 << 18), /* flush port task */
 
 	ATA_FLAG_DISABLED	= (1 << 19), /* port is disabled, ignore it */
-	ATA_FLAG_SUSPENDED	= (1 << 20), /* port is suspended */
+	ATA_FLAG_FROZEN		= (1 << 20), /* port is frozen */
+	ATA_FLAG_SUSPENDED	= (1 << 21), /* port is suspended (power) */
 
 	/* bits 24:31 of ap->flags are reserved for LLDD specific flags */
 
@@ -161,7 +163,10 @@ enum {
 	ATA_QCFLAG_SINGLE	= (1 << 2), /* no s/g, just a single buffer */
 	ATA_QCFLAG_DMAMAP	= ATA_QCFLAG_SG | ATA_QCFLAG_SINGLE,
 	ATA_QCFLAG_IO		= (1 << 3), /* standard IO command */
-	ATA_QCFLAG_EH_SCHEDULED = (1 << 4), /* EH scheduled */
+	ATA_QCFLAG_TIMEOUT	= (1 << 4), /* cmd has timed out */
+	ATA_QCFLAG_FAILED	= (1 << 5), /* cmd has failed */
+	ATA_QCFLAG_EH_SCHEDULED	= (1 << 6), /* EH scheduled */
+	ATA_QCFLAG_SENSE_VALID	= (1 << 7), /* sense data valid */
 
 	/* host set flags */
 	ATA_HOST_SIMPLEX	= (1 << 0),	/* Host is simplex, one DMA channel per host_set only */
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/16] libata-eh-fw: clear IRQ in ata_std_postreset()
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (6 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 07/16] libata-eh-fw: implement ata_port_freeze() Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 03/16] libata-eh-fw: hold host_set lock while finishing internal qc Tejun Heo
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Clear IRQ before reenabling in ata_std_postreset().  Reset might have
left pending IRQ.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

ac29aadbf4c791f1f1777d45883c5ac0c39cfbed
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index d2a7b6f..e559347 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -2538,7 +2538,8 @@ void ata_std_postreset(struct ata_port *
 	if (ap->cbl == ATA_CBL_SATA)
 		sata_print_link_status(ap);
 
-	/* re-enable interrupts */
+	/* clear & re-enable interrupts */
+	ata_chk_status(ap);
 	if (ap->ioaddr.ctl_addr)	/* FIXME: hack. create a hook instead */
 		ata_irq_on(ap);
 
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/16] libata-eh-fw: implement ata_eh_schedule_port()
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (8 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 03/16] libata-eh-fw: hold host_set lock while finishing internal qc Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 05/16] libata-eh-fw: clear SError in ata_std_postreset() Tejun Heo
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

ata_eh_schedule_port() is the gateway to EH and does one of the
followings depending on how it's invoked.

* Without any flag: It simply schedules EH.  EH will kick in after all
  commands are drained (unless another event occurs, of course).

* ATA_EH_ABORT: EH is scheduled and all currently active qc's get
  aborted.  The caller is responsible for making sure the controller
  and devices are in stable state in this case.

* ATA_EH_FREEZE: It does everything ATA_EH_ABORT does and then freezes
  the port; thus, making the port inaccessible until it gets reset.
  This can be used to safely schedule EH when an HSM violation event
  occurs.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    1 +
 drivers/scsi/libata-eh.c   |   53 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/libata.h     |    5 ++++
 3 files changed, 59 insertions(+), 0 deletions(-)

a46608a256d68fd1e1e68a3e73f0be45c752c6a8
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 2a5d3f6..b6f6815 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5300,5 +5300,6 @@ EXPORT_SYMBOL_GPL(ata_scsi_device_resume
 
 EXPORT_SYMBOL_GPL(ata_scsi_error);
 EXPORT_SYMBOL_GPL(ata_eng_timeout);
+EXPORT_SYMBOL_GPL(ata_eh_schedule_port);
 EXPORT_SYMBOL_GPL(ata_eh_qc_complete);
 EXPORT_SYMBOL_GPL(ata_eh_qc_retry);
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index a1fe14f..e731c04 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -239,6 +239,59 @@ void ata_eh_schedule_qc(struct ata_queue
 	scsi_eh_schedule_cmd(qc->scsicmd);
 }
 
+/**
+ *	ata_eh_schedule_port - schedule error handling without a qc
+ *	@ap: ATA port to schedule EH for
+ *	@flags: ATA_EH_* flags
+ *
+ *	Schedule error hanlding for the speficied ATA port.  EH will
+ *	kick in as soon as all commands are drained.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host_set lock)
+ */
+void ata_eh_schedule_port(struct ata_port *ap, unsigned int flags)
+{
+	int internal_cmd = ata_tag_internal(ap->active_tag);
+	int i;
+
+	WARN_ON(!ap->ops->error_handler);
+
+	/* SHOST_RUNNING test is to avoid invoking EH during boot probing */
+	if (!internal_cmd && ap->host->shost_state == SHOST_RUNNING)
+		scsi_eh_schedule_host(ap->host);
+
+	/* abort if requested */
+	if (!(flags & (ATA_EH_ABORT | ATA_EH_FREEZE)))
+		return;
+
+	for (i = 0; i < ATA_MAX_QUEUE; i++) {
+		struct ata_queued_cmd *qc = ata_qc_from_tag(ap, i);
+		if (qc) {
+			if (!internal_cmd)
+				ata_eh_schedule_qc(qc);
+			else {
+				qc->flags |= ATA_QCFLAG_FAILED;
+				__ata_qc_complete(qc);
+			}
+		}
+	}
+
+	/* freeze if requested */
+	if (!(flags & ATA_EH_FREEZE))
+		return;
+
+	/* Timeout handler might try to freeze an already frozen port
+	 * if it races against interrupt handler or another timeout.
+	 * Such conditions should be rare.  Whine.
+	 */
+	if (ap->flags & ATA_FLAG_FROZEN && ata_ratelimit())
+		printk(KERN_INFO "ata%u: ata_eh_schedule_port invoked on "
+		       "a frozen port\n", ap->id);
+
+	ata_port_freeze(ap);
+}
+
 static void ata_eh_scsidone(struct scsi_cmnd *scmd)
 {
 	/* nada */
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 8aeead3..47fbd68 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -222,6 +222,10 @@ enum {
 	ATA_PORT_PRIMARY	= (1 << 0),
 	ATA_PORT_SECONDARY	= (1 << 1),
 
+	/* flags for ata_eh_shduled_port */
+	ATA_EH_ABORT		= (1 << 0), /* abort all active commands */
+	ATA_EH_FREEZE		= (1 << 1), /* freeze port (implies ABORT) */
+
 	/* how hard are we gonna try to probe/recover devices */
 	ATA_PROBE_MAX_TRIES	= 3,
 };
@@ -653,6 +657,7 @@ extern unsigned long ata_pci_default_fil
  */
 extern int ata_scsi_error(struct Scsi_Host *host);
 extern void ata_eng_timeout(struct ata_port *ap);
+extern void ata_eh_schedule_port(struct ata_port *ap, unsigned int flags);
 extern void ata_eh_qc_complete(struct ata_queued_cmd *qc);
 extern void ata_eh_qc_retry(struct ata_queued_cmd *qc);
 
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/16] libata-eh-fw: implement ata_port_freeze()
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (5 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 04/16] libata-eh-fw: clear IRQ in ata_std_postreset() Tejun Heo
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Freezing is performed atomic w.r.t. host_set->lock and once frozen
LLDD is not allowed to access the port or any qc on it.  Also, libata
makes sure that no new qc gets issued to a frozen port.

A frozen port is thawed after a reset operation completes
successfully, so reset methods must do its job while the port is
frozen.  During initialization all ports get frozen before requesting
IRQ, so reset methods are always invoked on a frozen port.

Optional ->freeze operation notifies LLDD that the port is being
frozen.  LLDD can disable hardware interrupt in this callback if the
controller's IRQ mask can be changed dynamically.  If the controller
doesn't allow such operation, LLDD can check for frozen state in the
interrupt handler and ack/clear interrupts unconditionally while
frozen.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |   53 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/libata.h      |    1 +
 2 files changed, 54 insertions(+), 0 deletions(-)

2e00d608564e9ac28cdbc9cb5421cf2078a9c558
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 7a7297d..e8566bb 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -991,6 +991,12 @@ unsigned ata_exec_internal(struct ata_po
 
 	spin_lock_irqsave(&ap->host_set->lock, flags);
 
+	/* no internal command while frozen */
+	if (ap->flags & ATA_FLAG_FROZEN) {
+		spin_unlock_irqrestore(&ap->host_set->lock, flags);
+		return AC_ERR_SYSTEM;
+	}
+
 	/* initialize internal qc */
 
 	/* XXX: Tag 0 is used for drivers with legacy EH as some
@@ -1668,6 +1674,37 @@ void ata_port_disable(struct ata_port *a
 }
 
 /**
+ *	ata_port_freeze - freeze port until reset thaws it
+ *	@ap: ATA port to freeze
+ *
+ *	This function is called when HSM violation or some other
+ *	condition disrupts normal operation of the port.  Frozen port
+ *	is not allowed to perform any operation until a reset puts it
+ *	into known state and thaws it.
+ *
+ *	ap->ops->freeze() callback can be used for freezing the port
+ *	hardware-wise (e.g. mask interrupt / stop DMA engine).  If a
+ *	port cannot be frozen hardware-wise, the interrupt handler
+ *	must ack and clear interrupts unconditionally while the port
+ *	is frozen.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host_set lock)
+ */
+void ata_port_freeze(struct ata_port *ap)
+{
+	if (!ap->ops->error_handler)
+		return;
+
+	if (ap->ops->freeze)
+		ap->ops->freeze(ap);
+
+	ap->flags |= ATA_FLAG_FROZEN;
+
+	DPRINTK("ata%u port frozen\n", ap->id);
+}
+
+/**
  *	ata_down_sata_spd_limit - adjust SATA spd limit downward
  *	@ap: Port to adjust SATA spd limit for
  *
@@ -2623,15 +2660,26 @@ int ata_std_probe_reset(struct ata_port 
 int ata_do_reset(struct ata_port *ap, ata_reset_fn_t reset,
 		 ata_postreset_fn_t postreset, unsigned int *classes)
 {
+	unsigned long flags;
 	int i, rc;
 
 	for (i = 0; i < ATA_MAX_DEVICES; i++)
 		classes[i] = ATA_DEV_UNKNOWN;
 
+	/* make sure it's frozen */
+	spin_lock_irqsave(&ap->host_set->lock, flags);
+	ata_port_freeze(ap);
+	spin_unlock_irqrestore(&ap->host_set->lock, flags);
+
 	rc = reset(ap, classes);
 	if (rc)
 		return rc;
 
+	/* a successful reset thaws the port */
+	spin_lock_irqsave(&ap->host_set->lock, flags);
+	ap->flags &= ~ATA_FLAG_FROZEN;
+	spin_unlock_irqrestore(&ap->host_set->lock, flags);
+
 	/* If any class isn't ATA_DEV_UNKNOWN, consider classification
 	 * is complete and convert all ATA_DEV_UNKNOWN to
 	 * ATA_DEV_NONE.
@@ -4056,6 +4104,10 @@ static struct ata_queued_cmd *ata_qc_new
 	struct ata_queued_cmd *qc = NULL;
 	unsigned int i;
 
+	/* no command while frozen */
+	if (unlikely(ap->flags & ATA_FLAG_FROZEN))
+		return NULL;
+
 	/* the last tag is reserved for internal command. */
 	for (i = 0; i < ATA_MAX_QUEUE - 1; i++)
 		if (!test_and_set_bit(i, &ap->qactive)) {
@@ -4768,6 +4820,7 @@ int ata_device_add(const struct ata_prob
 
 		ata_chk_status(ap);
 		host_set->ops->irq_clear(ap);
+		ata_port_freeze(ap);
 		count++;
 	}
 
diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h
index 4299aa1..3fd31f9 100644
--- a/drivers/scsi/libata.h
+++ b/drivers/scsi/libata.h
@@ -51,6 +51,7 @@ extern void ata_port_flush_task(struct a
 extern unsigned ata_exec_internal(struct ata_port *ap, struct ata_device *dev,
 				  struct ata_taskfile *tf, const u8 *cdb,
 				  int dma_dir, void *buf, unsigned int buflen);
+extern void ata_port_freeze(struct ata_port *ap);
 extern int ata_down_sata_spd_limit(struct ata_port *ap);
 extern int ata_set_sata_spd_needed(struct ata_port *ap);
 extern int ata_down_xfermask_limit(struct ata_port *ap, struct ata_device *dev,
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/16] libata-eh-fw: use special reserved tag and qc for internal commands
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (10 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 05/16] libata-eh-fw: clear SError in ata_std_postreset() Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 15/16] libata-eh-fw: update SCSI command completion path for new EH Tejun Heo
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

New EH may issue internal commands to recover from error while failed
qc's are still hanging around.  To allow such usage, reserve tag
ATA_MAX_QUEUE-1 for internal command.  This also makes it easy to tell
whether a qc is for internal command or not.  ata_tag_internal() test
implements this test.

To avoid breaking existing drivers, ata_exec_internal() uses
ATA_TAG_INTERNAL only for drivers which implement ->error_handler.
For drivers on old EH, tag 0 is used as before.  Note that this makes
ata_tag_internal() test valid only when ->error_handler is
implemented.  This is okay as drivers on old EH should not and does
not have any reason to use ata_tag_internal().

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |   33 +++++++++++++++++++++++++++++----
 include/linux/libata.h     |    9 ++++++++-
 2 files changed, 37 insertions(+), 5 deletions(-)

42932fc5ad9150819871b9fa01cae6dceaf6886c
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 7907f9a..7a7297d 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -978,22 +978,45 @@ void ata_qc_complete_internal(struct ata
  *	LOCKING:
  *	None.  Should be called with kernel context, might sleep.
  */
-
 unsigned ata_exec_internal(struct ata_port *ap, struct ata_device *dev,
 			   struct ata_taskfile *tf, const u8 *cdb,
 			   int dma_dir, void *buf, unsigned int buflen)
 {
 	u8 command = tf->command;
 	struct ata_queued_cmd *qc;
+	unsigned int tag, preempted_tag;
 	DECLARE_COMPLETION(wait);
 	unsigned long flags;
 	unsigned int err_mask;
 
 	spin_lock_irqsave(&ap->host_set->lock, flags);
 
-	qc = ata_qc_new_init(ap, dev);
-	BUG_ON(qc == NULL);
+	/* initialize internal qc */
+
+	/* XXX: Tag 0 is used for drivers with legacy EH as some
+	 * drivers choke if any other tag is given.  This breaks
+	 * ata_tag_internal() test for those drivers.  Don't use new
+	 * EH stuff without converting to it.
+	 */
+	if (ap->ops->error_handler)
+		tag = ATA_TAG_INTERNAL;
+	else
+		tag = 0;
+
+	if (test_and_set_bit(tag, &ap->qactive))
+		BUG();
+	qc = ata_qc_from_tag(ap, tag);
+
+	qc->tag = tag;
+	qc->scsicmd = NULL;
+	qc->ap = ap;
+	qc->dev = dev;
+	ata_qc_reinit(qc);
+
+	preempted_tag = ap->active_tag;
+	ap->active_tag = ATA_TAG_POISON;
 
+	/* prepare & issue qc */
 	qc->tf = *tf;
 	if (cdb)
 		memcpy(qc->cdb, cdb, ATAPI_CDB_LEN);
@@ -1037,6 +1060,7 @@ unsigned ata_exec_internal(struct ata_po
 	err_mask = qc->err_mask;
 
 	ata_qc_free(qc);
+	ap->active_tag = preempted_tag;
 
 	/* XXX - Some LLDDs (sata_mv) disable port on command failure.
 	 * Until those drivers are fixed, we detect the condition
@@ -4032,7 +4056,8 @@ static struct ata_queued_cmd *ata_qc_new
 	struct ata_queued_cmd *qc = NULL;
 	unsigned int i;
 
-	for (i = 0; i < ATA_MAX_QUEUE; i++)
+	/* the last tag is reserved for internal command. */
+	for (i = 0; i < ATA_MAX_QUEUE - 1; i++)
 		if (!test_and_set_bit(i, &ap->qactive)) {
 			qc = ata_qc_from_tag(ap, i);
 			break;
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 111bae9..e814f38 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -107,7 +107,9 @@ enum {
 	LIBATA_MAX_PRD		= ATA_MAX_PRD / 2,
 	ATA_MAX_PORTS		= 8,
 	ATA_DEF_QUEUE		= 1,
-	ATA_MAX_QUEUE		= 1,
+	/* tag ATA_MAX_QUEUE - 1 is reserved for internal commands */
+	ATA_MAX_QUEUE		= 2,
+	ATA_TAG_INTERNAL	= ATA_MAX_QUEUE - 1,
 	ATA_MAX_SECTORS		= 200,	/* FIXME */
 	ATA_MAX_BUS		= 2,
 	ATA_DEF_BUSY_WAIT	= 10000,
@@ -697,6 +699,11 @@ static inline unsigned int ata_tag_valid
 	return (tag < ATA_MAX_QUEUE) ? 1 : 0;
 }
 
+static inline unsigned int ata_tag_internal(unsigned int tag)
+{
+	return tag == ATA_MAX_QUEUE - 1;
+}
+
 static inline unsigned int ata_class_enabled(unsigned int class)
 {
 	return class == ATA_DEV_ATA || class == ATA_DEV_ATAPI;
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/16] libata-eh-fw: implement new EH scheduling via error completion
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
  2006-04-11 13:42 ` [PATCH 02/16] libata-eh-fw: add new EH operations Tejun Heo
  2006-04-11 13:42 ` [PATCH 01/16] libata-eh-fw: add flags for new EH Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 12/16] libata-eh-fw: implement new EH scheduling from PIO Tejun Heo
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

There are several ways a qc can get schedule for EH in new EH.  This
patch implements one of them - completing qc with non-zero
qc->err_mask.  ALL normal qc's with set err_mask are to be examined by
EH.  There's no sideway.

New EH schedules a qc for EH from completion iff ->error_handler is
implemented, qc->err_mask is non-zero and the command is not an
internal command (internal cmd is handled via ->post_internal_cmd).
The EH scheduling itself is performed by asking SCSI midlayer to
schedule EH for the specified scmd.

Note that in the new EH, there is no way a qc can hit
ata_qc_complete() twice or normal path completes a qc which has been
scheduled for EH by other entities.  The ownership is clear and must
be followed.  Violation will trigger WARN_ON().

For drivers implementing old-EH, nothing changes.  As this change
makes ata_qc_complete() rather large, it's not inlined anymore and
__ata_qc_complete() is exported to other parts of libata for later
use.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |   52 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/scsi/libata-eh.c   |   29 +++++++++++++++++++++++++
 drivers/scsi/libata.h      |    2 ++
 include/linux/libata.h     |   21 +-----------------
 4 files changed, 83 insertions(+), 21 deletions(-)

62f9790537d3ea763d911e48f23e66bf21d7e445
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index f019d5d..2a5d3f6 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -4192,6 +4192,56 @@ void __ata_qc_complete(struct ata_queued
 	qc->complete_fn(qc);
 }
 
+/**
+ *	ata_qc_complete - Complete an active ATA command
+ *	@qc: Command to complete
+ *	@err_mask: ATA Status register contents
+ *
+ *	Indicate to the mid and upper layers that an ATA
+ *	command has completed, with either an ok or not-ok status.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host_set lock)
+ */
+void ata_qc_complete(struct ata_queued_cmd *qc)
+{
+	struct ata_port *ap = qc->ap;
+
+	/* XXX: New EH and old EH use different mechanisms to
+	 * synchronize EH with regular execution path.
+	 *
+	 * When a qc fails, it's marked with EH_SCHEDULED.  In new EH,
+	 * regular execution path is responsible for not accessing a
+	 * failed qc.  libata core enforces the rule by returning NULL
+	 * from ata_qc_from_tag() for failed qcs.
+	 *
+	 * Old EH depends on ata_qc_complete() nullifying completion
+	 * requests if EH_SCHEDULED is set.  Old EH does not
+	 * synchronize with interrupt handler.  Only PIO task is taken
+	 * care of.
+	 */
+	if (ap->ops->error_handler) {
+		WARN_ON(qc->flags & ATA_QCFLAG_EH_SCHEDULED ||
+			ap->flags & ATA_FLAG_FROZEN);
+
+		if (unlikely(qc->err_mask)) {
+			/* ATA_QCFLAG_FAILED is set for all failed
+			 * qc's including internal qc.
+			 */
+			qc->flags |= ATA_QCFLAG_FAILED;
+			if (!ata_tag_internal(qc->tag)) {
+				ata_eh_schedule_qc(qc);
+				return;
+			}
+		}
+		__ata_qc_complete(qc);
+	} else {
+		if (qc->flags & ATA_QCFLAG_EH_SCHEDULED)
+			return;
+		__ata_qc_complete(qc);
+	}
+}
+
 static inline int ata_should_dma_map(struct ata_queued_cmd *qc)
 {
 	struct ata_port *ap = qc->ap;
@@ -5177,7 +5227,7 @@ EXPORT_SYMBOL_GPL(ata_device_add);
 EXPORT_SYMBOL_GPL(ata_host_set_remove);
 EXPORT_SYMBOL_GPL(ata_sg_init);
 EXPORT_SYMBOL_GPL(ata_sg_init_one);
-EXPORT_SYMBOL_GPL(__ata_qc_complete);
+EXPORT_SYMBOL_GPL(ata_qc_complete);
 EXPORT_SYMBOL_GPL(ata_qc_issue_prot);
 EXPORT_SYMBOL_GPL(ata_tf_load);
 EXPORT_SYMBOL_GPL(ata_tf_read);
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index e73f561..a1fe14f 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -210,6 +210,35 @@ void ata_eng_timeout(struct ata_port *ap
 	DPRINTK("EXIT\n");
 }
 
+/**
+ *	ata_eh_schedule_qc - schedule qc for error handling
+ *	@qc: command to schedule error handling for
+ *
+ *	Schedule error handling for the specified qc.  EH will kick in
+ *	as soon as other commands are drained.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host_set lock)
+ */
+void ata_eh_schedule_qc(struct ata_queued_cmd *qc)
+{
+	struct ata_port *ap = qc->ap;
+
+	WARN_ON(!ap->ops->error_handler);
+	WARN_ON(qc->flags & ATA_QCFLAG_EH_SCHEDULED ||
+		ap->flags & ATA_FLAG_FROZEN);
+
+	qc->flags |= ATA_QCFLAG_FAILED | ATA_QCFLAG_EH_SCHEDULED;
+	qc->dev->flags |= ATA_DFLAG_FAILED;
+
+	/* The following will fail if timeout has already expired.
+	 * ata_scsi_timed_out() will put @qc onto EH.  Note that
+	 * EH_SCHEDULED flag is unconditionally set after this
+	 * function completes.
+	 */
+	scsi_eh_schedule_cmd(qc->scsicmd);
+}
+
 static void ata_eh_scsidone(struct scsi_cmnd *scmd)
 {
 	/* nada */
diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h
index 3fd31f9..b522aaa 100644
--- a/drivers/scsi/libata.h
+++ b/drivers/scsi/libata.h
@@ -61,6 +61,7 @@ extern int ata_do_reset(struct ata_port 
 			ata_postreset_fn_t postreset, unsigned int *classes);
 extern void ata_qc_free(struct ata_queued_cmd *qc);
 extern void ata_qc_issue(struct ata_queued_cmd *qc);
+extern void __ata_qc_complete(struct ata_queued_cmd *qc);
 extern int ata_check_atapi_dma(struct ata_queued_cmd *qc);
 extern void ata_dev_select(struct ata_port *ap, unsigned int device,
                            unsigned int wait, unsigned int can_sleep);
@@ -104,5 +105,6 @@ extern void ata_scsi_rbuf_fill(struct at
 
 /* libata-eh.c */
 extern enum scsi_eh_timer_return ata_scsi_timed_out(struct scsi_cmnd *cmd);
+extern void ata_eh_schedule_qc(struct ata_queued_cmd *qc);
 
 #endif /* __LIBATA_H__ */
diff --git a/include/linux/libata.h b/include/linux/libata.h
index e059cd3..8aeead3 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -593,7 +593,7 @@ extern void ata_bmdma_start (struct ata_
 extern void ata_bmdma_stop(struct ata_queued_cmd *qc);
 extern u8   ata_bmdma_status(struct ata_port *ap);
 extern void ata_bmdma_irq_clear(struct ata_port *ap);
-extern void __ata_qc_complete(struct ata_queued_cmd *qc);
+extern void ata_qc_complete(struct ata_queued_cmd *qc);
 extern void ata_scsi_simulate(struct ata_port *ap, struct ata_device *dev,
 			      struct scsi_cmnd *cmd,
 			      void (*done)(struct scsi_cmnd *));
@@ -859,25 +859,6 @@ static inline void ata_qc_reinit(struct 
 }
 
 /**
- *	ata_qc_complete - Complete an active ATA command
- *	@qc: Command to complete
- *	@err_mask: ATA Status register contents
- *
- *	Indicate to the mid and upper layers that an ATA
- *	command has completed, with either an ok or not-ok status.
- *
- *	LOCKING:
- *	spin_lock_irqsave(host_set lock)
- */
-static inline void ata_qc_complete(struct ata_queued_cmd *qc)
-{
-	if (unlikely(qc->flags & ATA_QCFLAG_EH_SCHEDULED))
-		return;
-
-	__ata_qc_complete(qc);
-}
-
-/**
  *	ata_irq_on - Enable interrupts on a port.
  *	@ap: Port on which interrupts are enabled.
  *
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/16] libata-eh-fw: hold host_set lock while finishing internal qc
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (7 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 04/16] libata-eh-fw: clear IRQ in ata_std_postreset() Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 10/16] libata-eh-fw: implement ata_eh_schedule_port() Tejun Heo
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Hold host_set lock while finishing internal qc.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

ba608e38dd436e9129b6a9c99b9fa1f77a79a024
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 8bf8cc6..d2a7b6f 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -1031,6 +1031,8 @@ unsigned ata_exec_internal(struct ata_po
 		spin_unlock_irqrestore(&ap->host_set->lock, flags);
 	}
 
+	spin_lock_irqsave(&ap->host_set->lock, flags);
+
 	*tf = qc->tf;
 	err_mask = qc->err_mask;
 
@@ -1052,6 +1054,8 @@ unsigned ata_exec_internal(struct ata_po
 		ata_port_probe(ap);
 	}
 
+	spin_unlock_irqrestore(&ap->host_set->lock, flags);
+
 	return err_mask;
 }
 
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (4 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 08/16] libata-eh-fw: update ata_qc_from_tag() to enforce normal/EH qc ownership Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-12 22:36   ` Jeff Garzik
  2006-04-11 13:42 ` [PATCH 07/16] libata-eh-fw: implement ata_port_freeze() Tejun Heo
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement new EH scheduling from timeout.  ata_scsi_timedout() also
takes care of the race condition in which scsi_eh_schedule_qc() sets
ATA_QCFLAG_EH_SCHEDULED but fails to acutally schedule EH for the qc
because it loses to timeout.

A timeout is HSM violation condition.  New EH assumes that on a
timeout the state of the controller and devices are unknown and
dangerous.  So, all active commands are aborted and the port is
frozen.  Note that commands which get aborted this way don't have its
qc->err_mask set and its retries count will be compensated.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-eh.c |   33 ++++++++++++++++++++++++++++-----
 1 files changed, 28 insertions(+), 5 deletions(-)

0f79b4bddad85f3b2a55a46f316721b5492fd8ba
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index e731c04..97ec527 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -66,19 +66,42 @@ enum scsi_eh_timer_return ata_scsi_timed
 	struct Scsi_Host *host = cmd->device->host;
 	struct ata_port *ap = (struct ata_port *) &host->hostdata[0];
 	unsigned long flags;
+	int i;
 	struct ata_queued_cmd *qc;
 	enum scsi_eh_timer_return ret = EH_HANDLED;
 
 	DPRINTK("ENTER\n");
 
 	spin_lock_irqsave(&ap->host_set->lock, flags);
-	qc = ata_qc_from_tag(ap, ap->active_tag);
-	if (qc) {
-		WARN_ON(qc->scsicmd != cmd);
-		qc->flags |= ATA_QCFLAG_EH_SCHEDULED;
-		qc->err_mask |= AC_ERR_TIMEOUT;
+
+	for (i = 0; i < ATA_MAX_QUEUE; i++) {
+		/* If ata_eh_scheduled_qc() raced with us and lost,
+		 * EH_SCHEDULED flag would already be set, so we
+		 * cannot use ata_qc_from_tag() here.
+		 */
+		qc = __ata_qc_from_tag(ap, i);
+		if (qc && qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == cmd)
+			break;
+	}
+
+	if (i < ATA_MAX_QUEUE) {
+		/* qc->err_mask belongs to the command owner, so it
+		 * cannot be altered here.  Use ATA_QCFLAG_TIMEOUT
+		 * instead.  EH is responsible for merging this flag
+		 * into err_mask after claiming qc ownership.
+		 */
+		qc->flags |= ATA_QCFLAG_TIMEOUT | ATA_QCFLAG_EH_SCHEDULED;
+		qc->dev->flags |= ATA_DFLAG_FAILED;
+
+		if (ap->ops->error_handler)
+			ata_eh_schedule_port(ap, ATA_EH_FREEZE);
+		else
+			/* old EH, do what it used to do */
+			qc->err_mask |= AC_ERR_TIMEOUT;
+
 		ret = EH_NOT_HANDLED;
 	}
+
 	spin_unlock_irqrestore(&ap->host_set->lock, flags);
 
 	DPRINTK("EXIT, ret=%d\n", ret);
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/16] libata-eh-fw: update ata_qc_from_tag() to enforce normal/EH qc ownership
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (3 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 12/16] libata-eh-fw: implement new EH scheduling from PIO Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout Tejun Heo
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

New EH framework has clear distinction about who owns a qc.  Every qc
starts owned by normal execution path - PIO, interrupt or whatever.
When an exception condition occurs which affects the qc, the qc gets
scheduled for EH.  Note that some events (say, link lost and regained,
command timeout) may schedule qc's which are not directly related but
could have been affected for EH too.  Scheduling for EH is atomic
w.r.t. ap->host_set->lock and once schedule for EH, normal execution
path is not allowed to access the qc in whatever way.  (PIO
synchronization acts a bit different and will be dealt with later)

This patch make ata_qc_from_tag() check whether a qc is active and
owned by normal path before returning it.  If conditions don't match,
NULL is returned and thus access to the qc is denied.
__ata_qc_from_tag() is the original ata_qc_from_tag() and is used by
libata core/EH layers to access inactive/failed qc's.

This change is applied only if the associated LLDD implements new EH
as indicated by non-NULL ->error_handler

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    4 ++--
 include/linux/libata.h     |   19 +++++++++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

efaa885fb5b63f1235ef0db268ed283074d7fd9b
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index e8566bb..f019d5d 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -1011,7 +1011,7 @@ unsigned ata_exec_internal(struct ata_po
 
 	if (test_and_set_bit(tag, &ap->qactive))
 		BUG();
-	qc = ata_qc_from_tag(ap, tag);
+	qc = __ata_qc_from_tag(ap, tag);
 
 	qc->tag = tag;
 	qc->scsicmd = NULL;
@@ -4111,7 +4111,7 @@ static struct ata_queued_cmd *ata_qc_new
 	/* the last tag is reserved for internal command. */
 	for (i = 0; i < ATA_MAX_QUEUE - 1; i++)
 		if (!test_and_set_bit(i, &ap->qactive)) {
-			qc = ata_qc_from_tag(ap, i);
+			qc = __ata_qc_from_tag(ap, i);
 			break;
 		}
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index e814f38..e059cd3 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -812,14 +812,29 @@ static inline void ata_qc_set_polling(st
 	qc->tf.ctl |= ATA_NIEN;
 }
 
-static inline struct ata_queued_cmd *ata_qc_from_tag (struct ata_port *ap,
-						      unsigned int tag)
+static inline struct ata_queued_cmd *__ata_qc_from_tag(struct ata_port *ap,
+						       unsigned int tag)
 {
 	if (likely(ata_tag_valid(tag)))
 		return &ap->qcmd[tag];
 	return NULL;
 }
 
+static inline struct ata_queued_cmd *ata_qc_from_tag(struct ata_port *ap,
+						     unsigned int tag)
+{
+	struct ata_queued_cmd *qc = __ata_qc_from_tag(ap, tag);
+
+	if (unlikely(!qc) || !ap->ops->error_handler)
+		return qc;
+
+	if ((qc->flags & (ATA_QCFLAG_ACTIVE |
+			  ATA_QCFLAG_EH_SCHEDULED)) == ATA_QCFLAG_ACTIVE)
+		return qc;
+
+	return NULL;
+}
+
 static inline void ata_tf_init(struct ata_port *ap, struct ata_taskfile *tf, unsigned int device)
 {
 	memset(tf, 0, sizeof(*tf));
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/16] libata-eh-fw: clear SError in ata_std_postreset()
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (9 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 10/16] libata-eh-fw: implement ata_eh_schedule_port() Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 06/16] libata-eh-fw: use special reserved tag and qc for internal commands Tejun Heo
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Clear SError in ata_std_postreset().  This is to clear SError bits
which get set during reset.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

23283e46f471fc36974603136f72212d26ac76c1
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index e559347..7907f9a 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -2538,6 +2538,10 @@ void ata_std_postreset(struct ata_port *
 	if (ap->cbl == ATA_CBL_SATA)
 		sata_print_link_status(ap);
 
+	/* clear SError */
+	if (ap->cbl == ATA_CBL_SATA && ap->ops->scr_read)
+		scr_write(ap, SCR_ERROR, scr_read(ap, SCR_ERROR));
+
 	/* clear & re-enable interrupts */
 	ata_chk_status(ap);
 	if (ap->ioaddr.ctl_addr)	/* FIXME: hack. create a hook instead */
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/16] libata-eh-fw: implement new EH scheduling from PIO
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (2 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 09/16] libata-eh-fw: implement new EH scheduling via error completion Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 08/16] libata-eh-fw: update ata_qc_from_tag() to enforce normal/EH qc ownership Tejun Heo
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

PIO executes without holding host_set lock, so it cannot be
synchronized using the same mechanism as interrupt driven execution.
port_task framework makes sure that EH is not entered until PIO task
is flushed, so PIO task can be sure the qc in progress won't go away
underneath it.  One thing it cannot be sure of is whether the qc has
already been scheduled for EH by another exception condition while
host_set lock was released.

This patch makes ata_poll_qc-complete() handle such conditions
properly and make it freeze the port if HSM violation is detected
during PIO execution.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

f4a9faac776713cb0983f4fd551e56e6f0845d27
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index b6f6815..43e47f3 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -3479,16 +3479,31 @@ skip_map:
  *	LOCKING:
  *	None.  (grabs host lock)
  */
-
 void ata_poll_qc_complete(struct ata_queued_cmd *qc)
 {
 	struct ata_port *ap = qc->ap;
 	unsigned long flags;
 
 	spin_lock_irqsave(&ap->host_set->lock, flags);
-	ap->flags &= ~ATA_FLAG_NOINTR;
-	ata_irq_on(ap);
-	ata_qc_complete(qc);
+
+	if (ap->ops->error_handler) {
+		/* EH might have kicked in while host_set lock is released */
+		qc = ata_qc_from_tag(ap, qc->tag);
+		if (qc) {
+			if (!(qc->err_mask & AC_ERR_HSM)) {
+				ap->flags &= ~ATA_FLAG_NOINTR;
+				ata_irq_on(ap);
+				ata_qc_complete(qc);
+			} else
+				ata_eh_schedule_port(ap, ATA_EH_FREEZE);
+		}
+	} else {
+		/* old EH */
+		ap->flags &= ~ATA_FLAG_NOINTR;
+		ata_irq_on(ap);
+		ata_qc_complete(qc);
+	}
+
 	spin_unlock_irqrestore(&ap->host_set->lock, flags);
 }
 
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 15/16] libata-eh-fw: update SCSI command completion path for new EH
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (11 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 06/16] libata-eh-fw: use special reserved tag and qc for internal commands Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 16/16] libata-eh-fw: update ata_interrupt() to handle frozen port properly Tejun Heo
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

New EH is responsible for filling qc->tf with status and filling sense
data for ATAPI check sense.  Update SCSI command completion path to
reflect this.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-scsi.c |   25 ++++++++++++++++++++++++-
 1 files changed, 24 insertions(+), 1 deletions(-)

394065e67293178b8ed4766800e655f9cc8ec33e
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index c9c0014..1c1b510 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -1204,7 +1204,9 @@ static void ata_scsi_qc_complete(struct 
 		if (!need_sense) {
 			cmd->result = SAM_STAT_GOOD;
 		} else {
-			qc->ap->ops->tf_read(qc->ap, &qc->tf);
+			/* new EH already loaded qc->tf */
+			if (!qc->ap->ops->error_handler)
+				qc->ap->ops->tf_read(qc->ap, &qc->tf);
 
 			/* TODO: decide which descriptor format to use
 			 * for 48b LBA devices and call that here
@@ -2069,6 +2071,27 @@ static void atapi_qc_complete(struct ata
 
 	VPRINTK("ENTER, err_mask 0x%X\n", err_mask);
 
+	/* handle completion from new EH */
+	if (unlikely(qc->ap->ops->error_handler &&
+		     (err_mask || qc->flags & ATA_QCFLAG_SENSE_VALID))) {
+
+		if (!(qc->flags & ATA_QCFLAG_SENSE_VALID)) {
+			/* FIXME: not quite right; we don't want the
+			 * translation of taskfile registers into a
+			 * sense descriptors, since that's only
+			 * correct for ATA, not ATAPI
+			 */
+			/* new EH already loaded qc->tf */
+			ata_gen_ata_desc_sense(qc);
+		}
+
+		qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
+		qc->scsidone(cmd);
+		ata_qc_free(qc);
+		return;
+	}
+
+	/* successful completion or old EH failure path */
 	if (unlikely(err_mask & AC_ERR_DEV)) {
 		cmd->result = SAM_STAT_CHECK_CONDITION;
 		atapi_request_sense(qc);
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 14/16] libata-eh-fw: activate ->post_internal_cmd
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (14 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 13/16] libata-eh-fw: activate ->error_handler Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-12 22:41 ` [PATCHSET 5/9] new EH framework, take 2 Jeff Garzik
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Update ata_exec_internal() such that it uses new EH framework.
->post_internal_cmd() is always invoked regardless of completion
status.  Also, when ata_exec_internal() detects a timeout condition
and new EH is in place, it freezes the port as timeout for normal
commands would do.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |   25 ++++++++++++++++++++-----
 1 files changed, 20 insertions(+), 5 deletions(-)

6115e687a94c7d5a05cab2646f30ec5d4dca3f91
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 43e47f3..c7b7de9 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -1046,13 +1046,17 @@ unsigned ata_exec_internal(struct ata_po
 
 		/* We're racing with irq here.  If we lose, the
 		 * following test prevents us from completing the qc
-		 * again.  If completion irq occurs after here but
-		 * before the caller cleans up, it will result in a
-		 * spurious interrupt.  We can live with that.
+		 * twice.  If we win, the port is frozen and will be
+		 * cleaned up by ->post_internal_cmd().
 		 */
 		if (qc->flags & ATA_QCFLAG_ACTIVE) {
-			qc->err_mask = AC_ERR_TIMEOUT;
-			ata_qc_complete(qc);
+			qc->err_mask |= AC_ERR_TIMEOUT;
+
+			if (ap->ops->error_handler)
+				ata_eh_schedule_port(ap, ATA_EH_FREEZE);
+			else
+				ata_qc_complete(qc);
+
 			printk(KERN_WARNING "ata%u: qc timeout (cmd 0x%x)\n",
 			       ap->id, command);
 		}
@@ -1060,6 +1064,17 @@ unsigned ata_exec_internal(struct ata_po
 		spin_unlock_irqrestore(&ap->host_set->lock, flags);
 	}
 
+	/* do post_internal_cmd */
+	if (ap->ops->post_internal_cmd)
+		ap->ops->post_internal_cmd(qc);
+
+	if (qc->flags & ATA_QCFLAG_FAILED && !qc->err_mask) {
+		printk(KERN_WARNING "ata%u: zero err_mask for failed "
+		       "internal command, assuming AC_ERR_OTHER\n", ap->id);
+		qc->err_mask |= AC_ERR_OTHER;
+	}
+
+	/* finish up */
 	spin_lock_irqsave(&ap->host_set->lock, flags);
 
 	*tf = qc->tf;
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 16/16] libata-eh-fw: update ata_interrupt() to handle frozen port properly
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (12 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 15/16] libata-eh-fw: update SCSI command completion path for new EH Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-12 22:40   ` Jeff Garzik
  2006-04-11 13:42 ` [PATCH 13/16] libata-eh-fw: activate ->error_handler Tejun Heo
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Update the stock interrupt handler such that it unconditionally clears
interrupts from a frozen port.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

07f4b12b7a523dc928576fd5a2f18f40969a47ee
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index c7b7de9..d4c75cb 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -4517,12 +4517,20 @@ irqreturn_t ata_interrupt (int irq, void
 
 	for (i = 0; i < host_set->n_ports; i++) {
 		struct ata_port *ap;
+		struct ata_queued_cmd *qc;
 
 		ap = host_set->ports[i];
-		if (ap &&
-		    !(ap->flags & (ATA_FLAG_DISABLED | ATA_FLAG_NOINTR))) {
-			struct ata_queued_cmd *qc;
+		if (unlikely(!ap || ap->flags & ATA_FLAG_DISABLED))
+			continue;
 
+		if (unlikely(ap->flags & ATA_FLAG_FROZEN)) {
+			/* port frozen, ack unconditionally */
+			ata_irq_ack(ap, 0);
+			handled = 1;
+			continue;
+		}
+
+		if (!(ap->flags & ATA_FLAG_NOINTR)) {
 			qc = ata_qc_from_tag(ap, ap->active_tag);
 			if (qc && (!(qc->tf.ctl & ATA_NIEN)) &&
 			    (qc->flags & ATA_QCFLAG_ACTIVE))
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 13/16] libata-eh-fw: activate ->error_handler
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (13 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 16/16] libata-eh-fw: update ata_interrupt() to handle frozen port properly Tejun Heo
@ 2006-04-11 13:42 ` Tejun Heo
  2006-04-11 13:42 ` [PATCH 14/16] libata-eh-fw: activate ->post_internal_cmd Tejun Heo
  2006-04-12 22:41 ` [PATCHSET 5/9] new EH framework, take 2 Jeff Garzik
  16 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-11 13:42 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Update ata_scsi_error() such that ->error_handler is invoked if
implemented.  ata_scsi_error() also takes care of merging
ATA_QCFLAG_TIMEOUT into qc->err_mask and clearing SCSI EH conditions.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-eh.c |   56 +++++++++++++++++++++++++++++++++++++++++++---
 include/linux/libata.h   |    3 ++
 2 files changed, 55 insertions(+), 4 deletions(-)

dc24434e3e0ed074dd1f11231e9f0fc50e457d13
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index 97ec527..1015f89 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -123,18 +123,66 @@ enum scsi_eh_timer_return ata_scsi_timed
 int ata_scsi_error(struct Scsi_Host *host)
 {
 	struct ata_port *ap = (struct ata_port *)&host->hostdata[0];
+	spinlock_t *hs_lock = &ap->host_set->lock;
+	int repeat_cnt = ATA_EH_MAX_REPEAT;
+	unsigned long flags;
 
 	DPRINTK("ENTER\n");
 
+ repeat:
 	/* synchronize with IRQ handler and port task */
-	spin_unlock_wait(&ap->host_set->lock);
+	spin_unlock_wait(hs_lock);
 	ata_port_flush_task(ap);
 
-	WARN_ON(ata_qc_from_tag(ap, ap->active_tag) == NULL);
+	/* invoke error handler */
+	if (ap->ops->error_handler) {
+		int i;
+
+		/* for new EH, all the qc's are ours now */
+		for (i = 0; i < ATA_MAX_QUEUE; i++) {
+			struct ata_queued_cmd *qc;
+			qc = __ata_qc_from_tag(ap, i);
+			if (qc->flags & ATA_QCFLAG_TIMEOUT)
+				qc->err_mask |= AC_ERR_TIMEOUT;;
+		}
+
+		/* invoke EH */
+		ap->ops->error_handler(ap);
 
-	ap->ops->eng_timeout(ap);
+		/* Exception might have happend after ->error_handler
+		 * recovered the port but before reaching this point.
+		 * repeat EH if that happens.
+		 */
+		spin_lock_irqsave(hs_lock, flags);
+
+		if (ap->flags & ATA_FLAG_FROZEN) {
+			if (--repeat_cnt) {
+				printk(KERN_INFO "ata%u: port is frozen after "
+				       "EH completed, repeating EH (cnt=%d)\n",
+				       ap->id, repeat_cnt);
+				spin_unlock_irqrestore(hs_lock, flags);
+				goto repeat;
+			}
+			printk(KERN_ERR "ata%u: port is frozen after %d tries, "
+			       "giving up\n", ap->id, ATA_EH_MAX_REPEAT);
+		}
+
+		/* Clear host_eh_scheduled while holding hs_lock such
+		 * that if exception occurs after this point but
+		 * before EH completion, SCSI midlayer will
+		 * re-initiate EH.
+		 */
+		host->host_eh_scheduled = 0;
+
+		spin_unlock_irqrestore(hs_lock, flags);
+	} else {
+		WARN_ON(ata_qc_from_tag(ap, ap->active_tag) == NULL);
+		ap->ops->eng_timeout(ap);
+	}
 
-	WARN_ON(host->host_failed || !list_empty(&host->eh_cmd_q));
+	/* finish or retry handled scmd's and clean up */
+	WARN_ON(host->host_failed || !list_empty(&host->eh_cmd_q) ||
+		host->host_eh_scheduled);
 
 	scsi_eh_flush_done_q(&ap->eh_done_q);
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 47fbd68..0ac5214 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -226,6 +226,9 @@ enum {
 	ATA_EH_ABORT		= (1 << 0), /* abort all active commands */
 	ATA_EH_FREEZE		= (1 << 1), /* freeze port (implies ABORT) */
 
+	/* max repeat if error condition is still set after ->error_handler */
+	ATA_EH_MAX_REPEAT	= 5,
+
 	/* how hard are we gonna try to probe/recover devices */
 	ATA_PROBE_MAX_TRIES	= 3,
 };
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout
  2006-04-11 13:42 ` [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout Tejun Heo
@ 2006-04-12 22:36   ` Jeff Garzik
  2006-04-13  2:40     ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Garzik @ 2006-04-12 22:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Tejun Heo wrote:
> Implement new EH scheduling from timeout.  ata_scsi_timedout() also
> takes care of the race condition in which scsi_eh_schedule_qc() sets
> ATA_QCFLAG_EH_SCHEDULED but fails to acutally schedule EH for the qc
> because it loses to timeout.
> 
> A timeout is HSM violation condition.  New EH assumes that on a
> timeout the state of the controller and devices are unknown and
> dangerous.  So, all active commands are aborted and the port is
> frozen.  Note that commands which get aborted this way don't have its
> qc->err_mask set and its retries count will be compensated.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>

This is not always true:  With PCI IDE BMDMA devices, it is presumed 
that a DMA error will be handled by timeout.  For this case, the 
controller is in a known state.

	Jeff




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 16/16] libata-eh-fw: update ata_interrupt() to handle frozen port properly
  2006-04-11 13:42 ` [PATCH 16/16] libata-eh-fw: update ata_interrupt() to handle frozen port properly Tejun Heo
@ 2006-04-12 22:40   ` Jeff Garzik
  2006-04-13  2:59     ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Garzik @ 2006-04-12 22:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Tejun Heo wrote:
> Update the stock interrupt handler such that it unconditionally clears
> interrupts from a frozen port.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> ---
> 
>  drivers/scsi/libata-core.c |   14 +++++++++++---
>  1 files changed, 11 insertions(+), 3 deletions(-)
> 
> 07f4b12b7a523dc928576fd5a2f18f40969a47ee
> diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
> index c7b7de9..d4c75cb 100644
> --- a/drivers/scsi/libata-core.c
> +++ b/drivers/scsi/libata-core.c
> @@ -4517,12 +4517,20 @@ irqreturn_t ata_interrupt (int irq, void
>  
>  	for (i = 0; i < host_set->n_ports; i++) {
>  		struct ata_port *ap;
> +		struct ata_queued_cmd *qc;
>  
>  		ap = host_set->ports[i];
> -		if (ap &&
> -		    !(ap->flags & (ATA_FLAG_DISABLED | ATA_FLAG_NOINTR))) {
> -			struct ata_queued_cmd *qc;
> +		if (unlikely(!ap || ap->flags & ATA_FLAG_DISABLED))
> +			continue;
>  
> +		if (unlikely(ap->flags & ATA_FLAG_FROZEN)) {
> +			/* port frozen, ack unconditionally */
> +			ata_irq_ack(ap, 0);
> +			handled = 1;
> +			continue;

NAK unless you can give me some reasonable justification for clearing 
the interrupts here.

Frozen or not, this is not really an appropriate place to put this.

	Jeff




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHSET 5/9] new EH framework, take 2
  2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
                   ` (15 preceding siblings ...)
  2006-04-11 13:42 ` [PATCH 14/16] libata-eh-fw: activate ->post_internal_cmd Tejun Heo
@ 2006-04-12 22:41 ` Jeff Garzik
  16 siblings, 0 replies; 26+ messages in thread
From: Jeff Garzik @ 2006-04-12 22:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Tejun Heo wrote:
> Hello, all.
> 
> This is the second take of new-EH-framework patchset.  Changes from
> the last take[L] are...

ACK patches 1-15, of 16.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout
  2006-04-12 22:36   ` Jeff Garzik
@ 2006-04-13  2:40     ` Tejun Heo
  2006-04-13  3:18       ` Jeff Garzik
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2006-04-13  2:40 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Jeff Garzik wrote:
> Tejun Heo wrote:
>> Implement new EH scheduling from timeout.  ata_scsi_timedout() also
>> takes care of the race condition in which scsi_eh_schedule_qc() sets
>> ATA_QCFLAG_EH_SCHEDULED but fails to acutally schedule EH for the qc
>> because it loses to timeout.
>>
>> A timeout is HSM violation condition.  New EH assumes that on a
>> timeout the state of the controller and devices are unknown and
>> dangerous.  So, all active commands are aborted and the port is
>> frozen.  Note that commands which get aborted this way don't have its
>> qc->err_mask set and its retries count will be compensated.
>>
>> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> This is not always true:  With PCI IDE BMDMA devices, it is presumed 
> that a DMA error will be handled by timeout.  For this case, the 
> controller is in a known state.
> 

The problem is that the timeout handler doesn't have anyway to determine 
whether the timeout is from real timeout or from DMA error, and the 
timeout handler is responsible for transferring the ownership the failed 
port to EH.  EH, on entry, must be guaranteed that it owns the port if 
it's not frozen.

One way around this would be making a new callback, say, 
->timeout_autopsy and let it decide whether the port needs freezing or 
not, but it would be an overkill.  The only side effect of being frozen 
is that the port will get a softreset to thaw it, which isn't so bad - I 
want my controller to get a good spanking in the ass after sitting idle 
for 30secs.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 16/16] libata-eh-fw: update ata_interrupt() to handle frozen port properly
  2006-04-12 22:40   ` Jeff Garzik
@ 2006-04-13  2:59     ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2006-04-13  2:59 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Jeff Garzik wrote:
> Tejun Heo wrote:
>> Update the stock interrupt handler such that it unconditionally clears
>> interrupts from a frozen port.
>>
>> Signed-off-by: Tejun Heo <htejun@gmail.com>
>>
>> ---
>>
>>  drivers/scsi/libata-core.c |   14 +++++++++++---
>>  1 files changed, 11 insertions(+), 3 deletions(-)
>>
>> 07f4b12b7a523dc928576fd5a2f18f40969a47ee
>> diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
>> index c7b7de9..d4c75cb 100644
>> --- a/drivers/scsi/libata-core.c
>> +++ b/drivers/scsi/libata-core.c
>> @@ -4517,12 +4517,20 @@ irqreturn_t ata_interrupt (int irq, void
>>  
>>      for (i = 0; i < host_set->n_ports; i++) {
>>          struct ata_port *ap;
>> +        struct ata_queued_cmd *qc;
>>  
>>          ap = host_set->ports[i];
>> -        if (ap &&
>> -            !(ap->flags & (ATA_FLAG_DISABLED | ATA_FLAG_NOINTR))) {
>> -            struct ata_queued_cmd *qc;
>> +        if (unlikely(!ap || ap->flags & ATA_FLAG_DISABLED))
>> +            continue;
>>  
>> +        if (unlikely(ap->flags & ATA_FLAG_FROZEN)) {
>> +            /* port frozen, ack unconditionally */
>> +            ata_irq_ack(ap, 0);
>> +            handled = 1;
>> +            continue;
> 
> NAK unless you can give me some reasonable justification for clearing 
> the interrupts here.
> 
> Frozen or not, this is not really an appropriate place to put this.
> 

Frozen state can be implemented in one of two ways.

1. (much preferred) masking all IRQs from the port in hardware.  Most 
modern controllers can do this.  sata_sil and sata_sil24 are converted 
to new EH this way.  Stock BMDMA freeze routine does this by turning on 
ATA_NIEN which sort of achieves the goal but a bit unreliable.

2. (plan b) making IRQ handler ack and clear IRQs unconditionally and do 
nothing else while frozen.  This works best if the controller has 
pending IRQ indication.  It's less reliable than #1 but except for 
screaming IRQ condition, it works most of the time.  Unfortunately, AHCI 
implements frozen state this way because ICH7 gets messed up if IRQ mask 
is diddled with while things are in progress.

For dumb BMDMA controllers, #2 is a bit tricky because they don't have 
IRQ pending indication, so there is no reliable way whether it's raising 
the interrupt or some other thing sharing the interrupt line is.  Also, 
as noted above, ATA_NIEN isn't the most reliable way to achieve IRQ silence.

So, the stock BMDMA interrupt handler is updated as above.  While 
frozen, it acks and clears interrupts unconditionally and tell IRQ layer 
that it handled the IRQ.  It will even ack and clear IRQs which don't 
origin from it.  Basically, it's saying "dude, I don't know where I am 
and the IRQ might or might not be mine.  Sorry for the inconvenience but 
my EH will kick in and rescue me pretty soon."

AHCI and sata_sil24 don't use the stock interrupt handler and sata_sil 
has its own freezing mechanism.  ata_piix is frozen with ATA_NIEN but 
its datasheet says ATA_NIEN plugs its interrupt at the controller level, 
so this one should be safe too.  So, removing above code shouldn't 
affect any currently converted drivers but IMHO things are safer with 
above change.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout
  2006-04-13  2:40     ` Tejun Heo
@ 2006-04-13  3:18       ` Jeff Garzik
  2006-04-13  3:36         ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Garzik @ 2006-04-13  3:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Tejun Heo wrote:
> The problem is that the timeout handler doesn't have anyway to determine 
> whether the timeout is from real timeout or from DMA error, and the 

Not true at all.  Just read BMDMA status.  Take a look at what 
drivers/ide does.


> timeout handler is responsible for transferring the ownership the failed 
> port to EH.  EH, on entry, must be guaranteed that it owns the port if 
> it's not frozen.
> 
> One way around this would be making a new callback, say, 
> ->timeout_autopsy and let it decide whether the port needs freezing or 
> not, but it would be an overkill.  The only side effect of being frozen 
> is that the port will get a softreset to thaw it, which isn't so bad - I 
> want my controller to get a good spanking in the ass after sitting idle 
> for 30secs.

When presented with standard, documented DMA error behavior, a reset is 
inappropriate.  Just ACK the DMA error and move on with life.  If 
continuous DMA errors occur, reset and/or step down the speed as was 
discussed many months ago.

Get the user back up and talking to their disk as fast as possible.

	Jeff



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout
  2006-04-13  3:18       ` Jeff Garzik
@ 2006-04-13  3:36         ` Tejun Heo
  2006-04-27 11:33           ` Jeff Garzik
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2006-04-13  3:36 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Jeff Garzik wrote:
> Tejun Heo wrote:
>> The problem is that the timeout handler doesn't have anyway to 
>> determine whether the timeout is from real timeout or from DMA error, 
>> and the 
> 
> Not true at all.  Just read BMDMA status.  Take a look at what 
> drivers/ide does.

Yeap, what I meant was the current timeout handler implementation 
doesn't have any way to do that, so later in the previous reply, I 
talked about ->timeout_autopsy.

> 
>> timeout handler is responsible for transferring the ownership the 
>> failed port to EH.  EH, on entry, must be guaranteed that it owns the 
>> port if it's not frozen.
>>
>> One way around this would be making a new callback, say, 
>> ->timeout_autopsy and let it decide whether the port needs freezing or 
>> not, but it would be an overkill.  The only side effect of being 
>> frozen is that the port will get a softreset to thaw it, which isn't 
>> so bad - I want my controller to get a good spanking in the ass after 
>> sitting idle for 30secs.
> 
> When presented with standard, documented DMA error behavior, a reset is 
> inappropriate.  Just ACK the DMA error and move on with life.  If 
> continuous DMA errors occur, reset and/or step down the speed as was 
> discussed many months ago.

The speeding down part is the same whether the port is frozen or not. 
The only difference is how EH recovers the port after the error.  Failed 
devices on a not frozen port are just revalidated while a frozen port 
gets a reset.

Here is another method to deal with it as adding ->timeout_autopsy or 
anything similar is too unattractive.  A new interface, say, 
ata_eh_thaw_port() can be implemented which thaws the port without 
resetting it.  Then, in BMDMA autopsy, after determining that a timeout 
was caused by DMA error, it can thaw the port and adjust qc->err_mask to 
AC_ERR_HOST_BUS.  How does it sound to you?

> Get the user back up and talking to their disk as fast as possible.

Command timeout is 30 secs (which, I think is a bit too long for ATA 
disk devices).  If resetting succeeds, it takes less than two seconds. 
I don't think it will make any difference to the user.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout
  2006-04-13  3:36         ` Tejun Heo
@ 2006-04-27 11:33           ` Jeff Garzik
  2006-04-29 21:13             ` Alan Cox
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Garzik @ 2006-04-27 11:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

Tejun Heo wrote:
> Here is another method to deal with it as adding ->timeout_autopsy or 
> anything similar is too unattractive.  A new interface, say, 
> ata_eh_thaw_port() can be implemented which thaws the port without 
> resetting it.  Then, in BMDMA autopsy, after determining that a timeout 
> was caused by DMA error, it can thaw the port and adjust qc->err_mask to 
> AC_ERR_HOST_BUS.  How does it sound to you?

Seems rational.


>> Get the user back up and talking to their disk as fast as possible.
> 
> Command timeout is 30 secs (which, I think is a bit too long for ATA 
> disk devices).  If resetting succeeds, it takes less than two seconds. I 
> don't think it will make any difference to the user.

Well overall I want to minimize the change from existing behavior, since 
there is enough change as it is.  In current libata and drivers/ide, it 
can continue after a properly indicated DMA error without doing a reset.

And I just think its rude, to reset hardware when it doesn't need 
resetting.  Its sorta like the Windows mentality -- "just reboot it, 
that will fix stuff"  Linux shouldn't be rude to hardware, when it need 
not be :)

	Jeff



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout
  2006-04-27 11:33           ` Jeff Garzik
@ 2006-04-29 21:13             ` Alan Cox
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Cox @ 2006-04-29 21:13 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Tejun Heo, axboe, albertcc, lkosewsk, linux-ide

On Iau, 2006-04-27 at 07:33 -0400, Jeff Garzik wrote:
> Well overall I want to minimize the change from existing behavior, since 
> there is enough change as it is.  In current libata and drivers/ide, it 
> can continue after a properly indicated DMA error without doing a reset.

Sometimes. But the smarts in it also break some hardware horribly
because the properly indicated DMA error doesn't neccessarily leave the
state machines on the chip sane, and worse it sometimes does resets then
recovery and thats ugly.

> And I just think its rude, to reset hardware when it doesn't need 
> resetting.  Its sorta like the Windows mentality -- "just reboot it, 
> that will fix stuff"  Linux shouldn't be rude to hardware, when it need 
> not be :)

Windows generally uses a 7 second timeout btw.


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2006-04-29 21:03 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-11 13:42 [PATCHSET 5/9] new EH framework, take 2 Tejun Heo
2006-04-11 13:42 ` [PATCH 02/16] libata-eh-fw: add new EH operations Tejun Heo
2006-04-11 13:42 ` [PATCH 01/16] libata-eh-fw: add flags for new EH Tejun Heo
2006-04-11 13:42 ` [PATCH 09/16] libata-eh-fw: implement new EH scheduling via error completion Tejun Heo
2006-04-11 13:42 ` [PATCH 12/16] libata-eh-fw: implement new EH scheduling from PIO Tejun Heo
2006-04-11 13:42 ` [PATCH 08/16] libata-eh-fw: update ata_qc_from_tag() to enforce normal/EH qc ownership Tejun Heo
2006-04-11 13:42 ` [PATCH 11/16] libata-eh-fw: implement new EH scheduling via timeout Tejun Heo
2006-04-12 22:36   ` Jeff Garzik
2006-04-13  2:40     ` Tejun Heo
2006-04-13  3:18       ` Jeff Garzik
2006-04-13  3:36         ` Tejun Heo
2006-04-27 11:33           ` Jeff Garzik
2006-04-29 21:13             ` Alan Cox
2006-04-11 13:42 ` [PATCH 07/16] libata-eh-fw: implement ata_port_freeze() Tejun Heo
2006-04-11 13:42 ` [PATCH 04/16] libata-eh-fw: clear IRQ in ata_std_postreset() Tejun Heo
2006-04-11 13:42 ` [PATCH 03/16] libata-eh-fw: hold host_set lock while finishing internal qc Tejun Heo
2006-04-11 13:42 ` [PATCH 10/16] libata-eh-fw: implement ata_eh_schedule_port() Tejun Heo
2006-04-11 13:42 ` [PATCH 05/16] libata-eh-fw: clear SError in ata_std_postreset() Tejun Heo
2006-04-11 13:42 ` [PATCH 06/16] libata-eh-fw: use special reserved tag and qc for internal commands Tejun Heo
2006-04-11 13:42 ` [PATCH 15/16] libata-eh-fw: update SCSI command completion path for new EH Tejun Heo
2006-04-11 13:42 ` [PATCH 16/16] libata-eh-fw: update ata_interrupt() to handle frozen port properly Tejun Heo
2006-04-12 22:40   ` Jeff Garzik
2006-04-13  2:59     ` Tejun Heo
2006-04-11 13:42 ` [PATCH 13/16] libata-eh-fw: activate ->error_handler Tejun Heo
2006-04-11 13:42 ` [PATCH 14/16] libata-eh-fw: activate ->post_internal_cmd Tejun Heo
2006-04-12 22:41 ` [PATCHSET 5/9] new EH framework, take 2 Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).