[PATCHSET 6/9] new EH implementation, take 2

linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHSET 6/9] new EH implementation, take 2
@ 2006-04-11 13:48 Tejun Heo
  2006-04-11 13:48 ` [PATCH 09/14] libata-eh: implement ata_eh_finish_qcs() Tejun Heo
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide, htejun

Hello, all.

This is the second take of new-EH-implementation patchset.  Changes
from the last take[L] are...

* !tries[] and ata_set_mode() failure handlings in ata_eh_revive() are
  updated in the same way ata_bus_probe() is updated.

* ata_eh_analyze_serror() is updated such that it doesn't depend on
  existence of qc.  Result from analyze_serror() will be used for hot
  plug event detection.

* better wait logic before the first reset in ata_eh_revive().  The
  wait logic now lives in a separate function
  ata_eh_wait_before_reset().

* ata_eh_revive() skips EH reset if possible

* ata_eh_revive() skips revalidation if SATA PHY reports no device is
  attached to the port.

* ahci error_handler grabs host_set lock when accessing pp->eh_* fields

* ahci_host_intr is converted to return void

* ahci autopsy is performed before determine_qc such that the result
  can be used for NCQ determine_qc later.

* sata_sil24 is converted to new EH

* sata_sil postreset now calls ata_std_postreset() before reenabling
  interrupts

This patchset is against

  upstream (c2a6585296009379e0f4eff39cdcb108b457ebf2)
  + [1] misc-reset-updates patchset (repost)
  + [2] implement-and-use-ata_wait_register patchset (repost)
  + [3] misc-ata_bus_probe-updates patchset
  + [4] fixes-errata-workaround-and-reset-updates patchset, take 2
  + [5] implement-scsi_eh_schedule patchset
  + [6] fix-scsi_kill_request-busy-count handling patch
  + [7] new-EH-framework patchset, take 2

Thanks.

--
tejun

[L] http://article.gmane.org/gmane.linux.ide/9322
[1] http://article.gmane.org/gmane.linux.ide/9495
[2] http://article.gmane.org/gmane.linux.ide/9499
[3] http://article.gmane.org/gmane.linux.ide/9506
[4] http://article.gmane.org/gmane.linux.ide/9516
[5] http://article.gmane.org/gmane.linux.ide/9290
[6] http://article.gmane.org/gmane.linux.ide/9487
[7] http://article.gmane.org/gmane.linux.ide/9524

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 01/14] libata-eh: add constants and flags to be used by EH
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
  2006-04-11 13:48 ` [PATCH 09/14] libata-eh: implement ata_eh_finish_qcs() Tejun Heo
  2006-04-11 13:48 ` [PATCH 02/14] libata-eh: implement ata_ering Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 04/14] libata-eh: implement EH utility functions Tejun Heo
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Add constants and flags to be used by EH.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 include/linux/ata.h    |   11 +++++++++++
 include/linux/libata.h |    5 +++++
 2 files changed, 16 insertions(+), 0 deletions(-)

27f8a9a882a40d389aaeea10e03594e8c41911d6
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 312a2c0..283138f 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -97,6 +97,9 @@ enum {
 	ATA_DRQ			= (1 << 3),	/* data request i/o */
 	ATA_ERR			= (1 << 0),	/* have an error */
 	ATA_SRST		= (1 << 2),	/* software reset */
+	ATA_ICRC		= (1 << 7),	/* interface CRC error */
+	ATA_UNC			= (1 << 6),	/* uncorrectable media error */
+	ATA_IDNF		= (1 << 4),	/* ID not found */
 	ATA_ABORTED		= (1 << 2),	/* command aborted */
 
 	/* ATA command block registers */
@@ -192,6 +195,14 @@ enum {
 	SCR_ACTIVE		= 3,
 	SCR_NOTIFICATION	= 4,
 
+	/* SError bits */
+	SERR_DATA_RECOVERED	= (1 << 0), /* recovered data error */
+	SERR_COMM_RECOVERED	= (1 << 1), /* recovered comm failure */
+	SERR_DATA		= (1 << 8), /* unrecovered data error */
+	SERR_PERSISTENT		= (1 << 9), /* persistent data/comm error */
+	SERR_PROTOCOL		= (1 << 10), /* protocol violation */
+	SERR_INTERNAL		= (1 << 11), /* host internal error */
+
 	/* struct ata_taskfile flags */
 	ATA_TFLAG_LBA48		= (1 << 0), /* enable 48-bit LBA and "HOB" */
 	ATA_TFLAG_ISADDR	= (1 << 1), /* enable r/w to nsect/lba regs */
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 0ac5214..58ad515 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -222,6 +222,11 @@ enum {
 	ATA_PORT_PRIMARY	= (1 << 0),
 	ATA_PORT_SECONDARY	= (1 << 1),
 
+	/* reset / recovery action types */
+	ATA_PORT_REVALIDATE	= (1 << 0),
+	ATA_PORT_SOFTRESET	= (1 << 1),
+	ATA_PORT_HARDRESET	= (1 << 2),
+
 	/* flags for ata_eh_shduled_port */
 	ATA_EH_ABORT		= (1 << 0), /* abort all active commands */
 	ATA_EH_FREEZE		= (1 << 1), /* freeze port (implies ABORT) */
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 04/14] libata-eh: implement EH utility functions
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (2 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 01/14] libata-eh: add constants and flags to be used by EH Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 10/14] libata-eh: implement EH methods for BMDMA controllers Tejun Heo
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement two utility functions ata_err_string() and
atapi_eh_request_sense().  They will be used by EH.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-eh.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 92 insertions(+), 0 deletions(-)

75a537b896e2e5c014503d310769ae6abb5d57d8
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index 485ebaa..7ef2b78 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -466,3 +466,95 @@ void ata_eh_qc_retry(struct ata_queued_c
 		scmd->retries--;
 	__ata_eh_qc_complete(qc);
 }
+
+/**
+ *	ata_err_string - convert err_mask to descriptive string
+ *	@err_mask: error mask to convert to string
+ *
+ *	Convert @err_mask to descriptive string.  Errors are
+ *	prioritized according to severity and only the most severe
+ *	error is reported.
+ *
+ *	LOCKING:
+ *	None.
+ *
+ *	RETURNS:
+ *	Descriptive string for @err_mask
+ */
+static const char * ata_err_string(unsigned int err_mask)
+{
+	if (err_mask & AC_ERR_HOST_BUS)
+		return "host bus error";
+	if (err_mask & AC_ERR_ATA_BUS)
+		return "ATA bus error";
+	if (err_mask & AC_ERR_TIMEOUT)
+		return "timeout";
+	if (err_mask & AC_ERR_HSM)
+		return "host state machine violation";
+	if (err_mask & AC_ERR_SYSTEM)
+		return "host internal error";
+	if (err_mask & AC_ERR_MEDIA)
+		return "media error";
+	if (err_mask & AC_ERR_INVALID)
+		return "invalid argument error";
+	if (err_mask & AC_ERR_DEV)
+		return "unclassified device error";
+	return "unknown error";
+}
+
+/**
+ *	atapi_eh_request_sense - perform ATAPI REQUEST_SENSE
+ *	@ap: port associated with device @dev
+ *	@dev: device to perform REQUEST_SENSE to
+ *	@sense_buf: result sense data buffer (SCSI_SENSE_BUFFERSIZE bytes long)
+ *
+ *	Perform ATAPI REQUEST_SENSE after the device reported CHECK
+ *	SENSE.  This function is EH helper.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ *
+ *	RETURNS:
+ *	0 on success, AC_ERR_* mask on failure
+ */
+static unsigned int atapi_eh_request_sense(struct ata_port *ap,
+					   struct ata_device *dev,
+					   unsigned char *sense_buf)
+{
+	struct ata_taskfile tf;
+	u8 cdb[ATAPI_CDB_LEN];
+
+	DPRINTK("ATAPI request sense\n");
+
+	ata_tf_init(ap, &tf, dev->devno);
+
+	/* FIXME: is this needed? */
+	memset(sense_buf, 0, SCSI_SENSE_BUFFERSIZE);
+
+	/* XXX: why tf_read here? */
+	ap->ops->tf_read(ap, &tf);
+
+	/* fill these in, for the case where they are -not- overwritten */
+	sense_buf[0] = 0x70;
+	sense_buf[2] = tf.feature >> 4;
+
+	memset(cdb, 0, ATAPI_CDB_LEN);
+	cdb[0] = REQUEST_SENSE;
+	cdb[4] = SCSI_SENSE_BUFFERSIZE;
+
+	tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
+	tf.command = ATA_CMD_PACKET;
+
+	/* is it pointless to prefer PIO for "safety reasons"? */
+	if (ap->flags & ATA_FLAG_PIO_DMA) {
+		tf.protocol = ATA_PROT_ATAPI_DMA;
+		tf.feature |= ATAPI_PKT_DMA;
+	} else {
+		tf.protocol = ATA_PROT_ATAPI;
+		tf.lbam = (8 * 1024) & 0xff;
+		tf.lbah = (8 * 1024) >> 8;
+	}
+
+	return ata_exec_internal(ap, dev, &tf, cdb, DMA_FROM_DEVICE,
+				 sense_buf, SCSI_SENSE_BUFFERSIZE);
+}
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 03/14] libata-eh: add per-dev ata_ering
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (8 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 06/14] libata-eh: implement ata_eh_autopsy() Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 07/14] libata-eh: implement ata_eh_report() Tejun Heo
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

EH is gonna record errors per-dev.  Define per-dev ata_ering.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    1 +
 include/linux/libata.h     |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

f672946fdd44a66d0dd42b4df146b7e9e3735e49
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index d4c75cb..13954f6 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -4777,6 +4777,7 @@ static void ata_host_init(struct ata_por
 		dev->pio_mask = UINT_MAX;
 		dev->mwdma_mask = UINT_MAX;
 		dev->udma_mask = UINT_MAX;
+		ata_ering_init(&dev->ering, ATA_DEV_ERING_SIZE);
 	}
 
 #ifdef ATA_IRQ_TRAP
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 7a8d77b..9447bd2 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -135,6 +135,8 @@ enum {
 	ATA_DEV_ATAPI_UNSUP	= 4,	/* ATAPI device (unsupported) */
 	ATA_DEV_NONE		= 5,	/* no device */
 
+	ATA_DEV_ERING_SIZE	= 32,	/* record 32 recent errors */
+
 	/* struct ata_port flags */
 	ATA_FLAG_SLAVE_POSS	= (1 << 0), /* host supports slave dev */
 					    /* (doesn't imply presence) */
@@ -414,6 +416,8 @@ struct ata_device {
 	u16			cylinders;	/* Number of cylinders */
 	u16			heads;		/* Number of heads */
 	u16			sectors;	/* Number of sectors per track */
+
+	DEFINE_ATA_ERING	(ering, ATA_DEV_ERING_SIZE);
 };
 
 struct ata_port {
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 02/14] libata-eh: implement ata_ering
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
  2006-04-11 13:48 ` [PATCH 09/14] libata-eh: implement ata_eh_finish_qcs() Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 01/14] libata-eh: add constants and flags to be used by EH Tejun Heo
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

ata_ering is a ring buffer which records libata errors - whether a
command was for normar IO request, err_mask and timestamp.  This will
be used by EH to determine recovery actions.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-eh.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/libata.h    |    1 +
 include/linux/libata.h   |   16 ++++++++++++++
 3 files changed, 68 insertions(+), 0 deletions(-)

c842df05517b8319f4b8cb4d739871f627c27a94
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index 1015f89..485ebaa 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -44,6 +44,57 @@
 
 #include "libata.h"
 
+void ata_ering_init(struct ata_ering *ering, int size)
+{
+	memset(ering, 0, sizeof(*ering) + sizeof(ering->ring[0]) * size);
+	ering->size = size;
+}
+
+static void ata_ering_record(struct ata_ering *ering, int is_io,
+			     unsigned int err_mask)
+{
+	struct ata_ering_entry *ent;
+
+	WARN_ON(!err_mask);
+
+	ering->cursor++;
+	ering->cursor %= ering->size;
+
+	ent = &ering->ring[ering->cursor];
+	ent->is_io = is_io;
+	ent->err_mask = err_mask;
+	ent->timestamp = get_jiffies_64();
+}
+
+static struct ata_ering_entry * ata_ering_top(struct ata_ering *ering)
+{
+	struct ata_ering_entry *ent = &ering->ring[ering->cursor];
+	if (!ent->err_mask)
+		return NULL;
+	return ent;
+}
+
+static int ata_ering_map(struct ata_ering *ering,
+			 int (*map_fn)(struct ata_ering_entry *, void *),
+			 void *arg)
+{
+	int idx, rc = 0;
+	struct ata_ering_entry *ent;
+
+	idx = ering->cursor;
+	do {
+		ent = &ering->ring[idx];
+		if (!ent->err_mask)
+			break;
+		rc = map_fn(ent, arg);
+		if (rc)
+			break;
+		idx = (idx - 1 + ering->size) % ering->size;
+	} while (idx != ering->cursor);
+
+	return rc;
+}
+
 /**
  *	ata_scsi_timed_out - SCSI layer time out callback
  *	@cmd: timed out SCSI command
diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h
index b522aaa..7faa9a4 100644
--- a/drivers/scsi/libata.h
+++ b/drivers/scsi/libata.h
@@ -104,6 +104,7 @@ extern void ata_scsi_rbuf_fill(struct at
                                            u8 *rbuf, unsigned int buflen));
 
 /* libata-eh.c */
+extern void ata_ering_init(struct ata_ering *ering, int size);
 extern enum scsi_eh_timer_return ata_scsi_timed_out(struct scsi_cmnd *cmd);
 extern void ata_eh_schedule_qc(struct ata_queued_cmd *qc);
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 58ad515..7a8d77b 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -373,6 +373,22 @@ struct ata_host_stats {
 	unsigned long		rw_reqbuf;
 };
 
+struct ata_ering_entry {
+	int			is_io;
+	unsigned int		err_mask;
+	u64			timestamp;
+};
+
+struct ata_ering {
+	int			cursor;
+	int			size;
+	struct ata_ering_entry	ring[];
+};
+
+#define DEFINE_ATA_ERING(name, size)	\
+	struct ata_ering	name;	\
+	struct ata_ering_entry	name_entries[size];
+
 struct ata_device {
 	u64			n_sectors;	/* size of device, if ATA */
 	unsigned long		flags;		/* ATA_DFLAG_xxx */
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 08/14] libata-eh: implement ata_eh_revive()
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (4 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 10/14] libata-eh: implement EH methods for BMDMA controllers Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-19  9:08   ` zhao, forrest
  2006-04-11 13:48 ` [PATCH 11/14] ata_piix: convert to new EH Tejun Heo
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement EH helper function ata_eh_revive().  This function executes
what ata_eh_autopsy() and other parts of EH determined necessary to
resurrect the port.  As in ata_bus_probe(), each device is given fixed
number (ATA_EH_MAX_TRIES) of chances.  If a device uses up all its
chances and still fail to recover, it gets disabled.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    1 
 drivers/scsi/libata-eh.c   |  206 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/libata.h     |    4 +
 3 files changed, 211 insertions(+), 0 deletions(-)

98acfd8b52af7a85887b7a38cadb9267a808bebc
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index e724a76..cb174cf 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5345,3 +5345,4 @@ EXPORT_SYMBOL_GPL(ata_eh_qc_retry);
 EXPORT_SYMBOL_GPL(ata_eh_determine_qc);
 EXPORT_SYMBOL_GPL(ata_eh_autopsy);
 EXPORT_SYMBOL_GPL(ata_eh_report);
+EXPORT_SYMBOL_GPL(ata_eh_revive);
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index eebb165..1d25d55 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -918,3 +918,209 @@ void ata_eh_report(struct ata_port *ap, 
 	       tf->command, tf->feature, serror, action,
 	       desc_head, desc, desc_tail);
 }
+
+static void ata_eh_wait_before_reset(struct ata_port *ap)
+{
+	int scr_valid = ap->cbl == ATA_CBL_SATA && ap->ops->scr_read;
+	unsigned long timeout;
+
+	/* Give devices time to get ready before trying the first
+	 * reset.  Without this, devices tend to fail the first reset
+	 * under certain circumstances and cause much longer delay.
+	 */
+	timeout = jiffies + 5 * HZ;
+	ssleep(1);
+
+	if (scr_valid) {
+		while (time_before(jiffies, timeout)) {
+			if ((scr_read(ap, SCR_STATUS) & 0xf) != 0x1)
+				break;
+			msleep(100);
+		}
+	}
+
+	if (!scr_valid || sata_dev_present(ap)) {
+		while (time_before(jiffies, timeout)) {
+			if (!(ata_chk_status(ap) & ATA_BUSY))
+				break;
+			msleep(100);
+		}
+	}
+}
+
+/**
+ *	ata_eh_revive - revive host port after error
+ *	@ap: host port to revive
+ *	@action: action to perform to revive @ap
+ *	@softreset: softreset method (can be NULL)
+ *	@hardreset: hardreset method (can be NULL)
+ *	@postreset: postreset method (can be NULL)
+ *
+ *	Perform action specified by @action to revive host port @ap
+ *	after error.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep).
+ */
+int ata_eh_revive(struct ata_port *ap, unsigned int action,
+		  ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
+		  ata_postreset_fn_t postreset)
+{
+	int scr_valid = ap->cbl == ATA_CBL_SATA && ap->ops->scr_read;
+	unsigned int classes[ATA_MAX_DEVICES];
+	int tries[ATA_MAX_DEVICES], reset_tries, nr_enabled;
+	struct ata_device *dev;
+	ata_reset_fn_t reset;
+	int i, down_xfermask, rc = 0;
+
+	if (!action)
+		goto out;
+
+	reset_tries = ATA_EH_MAX_TRIES;
+	nr_enabled = 0;
+	for (i = 0; i < ATA_MAX_DEVICES; i++) {
+		tries[i] = ATA_EH_MAX_TRIES;
+		if (ata_dev_enabled(&ap->device[i]))
+			nr_enabled++;
+	}
+
+	/* revalidate */
+	if (action == ATA_PORT_REVALIDATE) {
+		for (i = 0; i < ATA_MAX_DEVICES; i++) {
+			struct ata_device *dev = &ap->device[i];
+			if (!ata_dev_enabled(dev) ||
+			    !(dev->flags & ATA_DFLAG_FAILED))
+				continue;
+			if (ata_dev_revalidate(ap, dev, 0))
+				break;
+		}
+		if (i == ATA_MAX_DEVICES) {
+			rc = 0;
+			goto out;
+		}
+
+		action |= ATA_PORT_SOFTRESET;
+	}
+	action &= ~ATA_PORT_REVALIDATE;
+
+	/* Skip reset if possible. */
+	if (!nr_enabled && !(ap->flags & ATA_FLAG_FROZEN))
+		goto out;
+
+	/* give devices some time to breath */
+	ata_eh_wait_before_reset(ap);
+
+	if (softreset && (!hardreset || (!ata_set_sata_spd_needed(ap) &&
+					 action == ATA_PORT_SOFTRESET)))
+		reset = softreset;
+	else
+		reset = hardreset;
+
+ retry:
+	down_xfermask = 0;
+
+	/* reset.  postreset is responsible for thawing the port. */
+	printk("ata%u: %s resetting channel for error handling\n",
+	       ap->id, reset == softreset ? "soft" : "hard");
+
+	rc = ata_do_reset(ap, reset, postreset, classes);
+	if (rc)
+		goto fail_reset;
+
+	/* revalidate and reconfigure devices */
+	for (i = 0; i < ATA_MAX_DEVICES; i++) {
+		dev = &ap->device[i];
+
+		if (!ata_dev_enabled(dev))
+			continue;
+
+		if (scr_valid && !sata_dev_present(ap)) {
+			rc = -EIO;
+			goto fail;
+		}
+
+		rc = ata_dev_revalidate(ap, dev, 1);
+		if (rc)
+			goto fail;
+	}
+
+	/* configure transfer mode */
+	if (ap->ops->set_mode) {
+		/* FIXME: make ->set_mode handle no device case and
+		 * return error code and failing device on failure as
+		 * ata_set_mode() does.
+		 */
+		for (i = 0; i < ATA_MAX_DEVICES; i++)
+			if (ata_dev_enabled(&ap->device[i])) {
+				ap->ops->set_mode(ap);
+				break;
+			}
+		rc = 0;
+	} else
+		rc = ata_set_mode(ap, &dev);
+
+	if (rc) {
+		down_xfermask = 1;
+		goto fail;
+	}
+
+	goto out;
+
+ fail_reset:
+	if (!--reset_tries) {
+		printk(KERN_ERR "ata%u: EH reset failed, giving up\n", ap->id);
+		goto out;
+	}
+	if (reset == hardreset)
+		ata_down_sata_spd_limit(ap);
+	if (hardreset)
+		reset = hardreset;
+
+	printk(KERN_WARNING "ata%u: EH reset failed, will retry in 5 secs\n",
+	       ap->id);
+	ssleep(5);
+	goto retry;
+
+ fail:
+	switch (rc) {
+	case -EINVAL:
+	case -ENODEV:
+		tries[dev->devno] = 0;
+		break;
+	case -EIO:
+		ata_down_sata_spd_limit(ap);
+	default:
+		tries[dev->devno]--;
+		if (down_xfermask &&
+		    ata_down_xfermask_limit(ap, dev, tries[dev->devno] == 1))
+			tries[dev->devno] = 0;
+	}
+
+	if (!tries[dev->devno]) {
+		ata_dev_disable(ap, dev);
+		nr_enabled--;
+	}
+
+	if (nr_enabled) {
+		printk(KERN_WARNING "ata%u: some devices seem to be offline, "
+		       "will retry in 5 secs\n", ap->id);
+		ssleep(5);
+	} else {
+		/* no device left, repeat fast */
+		msleep(500);
+	}
+
+	if (hardreset)
+		reset = hardreset;
+	goto retry;
+
+ out:
+	for (i = 0; i < ATA_MAX_DEVICES; i++) {
+		dev = &ap->device[i];
+		dev->flags &= ~ATA_DFLAG_FAILED;
+		if (rc)
+			ata_dev_disable(ap, dev);
+	}
+
+	return rc;
+}
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 5efadab..22472f6 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -238,6 +238,7 @@ enum {
 
 	/* how hard are we gonna try to probe/recover devices */
 	ATA_PROBE_MAX_TRIES	= 3,
+	ATA_EH_MAX_TRIES	= 3,
 };
 
 enum hsm_task_states {
@@ -696,6 +697,9 @@ extern unsigned int ata_eh_autopsy(struc
 extern void ata_eh_report(struct ata_port *ap, struct ata_queued_cmd *qc,
 			  const struct ata_taskfile *tf, u32 serror,
 			  unsigned int action, const char *desc);
+extern int ata_eh_revive(struct ata_port *ap, unsigned int action,
+			 ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
+			 ata_postreset_fn_t postreset);
 
 
 static inline int
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 10/14] libata-eh: implement EH methods for BMDMA controllers
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (3 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 04/14] libata-eh: implement EH utility functions Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 08/14] libata-eh: implement ata_eh_revive() Tejun Heo
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement EH methods for BMDMA controllers.  The followings are
defined.

* ata_bmdma_freeze: freeze BMDMA controller by turning on ATA_NIEN
* ata_bmdma_bmdma_drive_eh: drive BMDMA EH using given soft, hard and
			    post reset methods.
* ata_bmdma_error_handler: the stock BMDMA EH with stock reset
			   routines.
* ata_bmdma_post_internal_cmd: the stock BMDMA post_internal_cmd.
			       Makes sure BMDMA engine is stopped
			       after an internal command.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-bmdma.c |  112 +++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/libata-core.c  |    4 ++
 include/linux/libata.h      |    7 +++
 3 files changed, 123 insertions(+), 0 deletions(-)

e397d0ae739666692a9ca0d6ccbf083bfdf6b5ed
diff --git a/drivers/scsi/libata-bmdma.c b/drivers/scsi/libata-bmdma.c
index 835dff0..6e48ce5 100644
--- a/drivers/scsi/libata-bmdma.c
+++ b/drivers/scsi/libata-bmdma.c
@@ -652,6 +652,118 @@ void ata_bmdma_stop(struct ata_queued_cm
 	ata_altstatus(ap);        /* dummy read */
 }
 
+/**
+ *	ata_bmdma_freeze - Freeze BMDMA controller port
+ *	@ap: port to freeze
+ *
+ *	Freeze BMDMA controller port.
+ *
+ *	LOCKING:
+ *	Inherited from caller.
+ */
+void ata_bmdma_freeze(struct ata_port *ap)
+{
+	struct ata_ioports *ioaddr = &ap->ioaddr;
+
+	ap->ctl |= ATA_NIEN;
+	ap->last_ctl = ap->ctl;
+
+	if (ap->flags & ATA_FLAG_MMIO)
+		writeb(ap->ctl, (void __iomem *)ioaddr->ctl_addr);
+	else
+		outb(ap->ctl, ioaddr->ctl_addr);
+}
+
+/**
+ *	ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
+ *	@ap: port to handle error for
+ *
+ *	Handle error for ATA BMDMA controller.  It can handle both
+ *	PATA and SATA controllers.  Many controllers should be able to
+ *	use this EH as-is or with some added handling before and
+ *	after.
+ *
+ *	This function is intended to be used for constructing
+ *	->error_handler callback by low level drivers.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ */
+void ata_bmdma_drive_eh(struct ata_port *ap,
+			ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
+			ata_postreset_fn_t postreset)
+{
+	struct ata_host_set *host_set = ap->host_set;
+	unsigned int action = 0;
+	struct ata_queued_cmd *qc;
+	unsigned long flags;
+	struct ata_taskfile tf;
+	u32 serror;
+
+	qc = ata_eh_determine_qc(ap, &tf);
+
+	/* reset PIO HSM and stop DMA engine */
+	spin_lock_irqsave(&host_set->lock, flags);
+
+	ap->flags &= ~ATA_FLAG_NOINTR;
+	ap->hsm_task_state = HSM_ST_IDLE;
+
+	if (qc && (qc->tf.protocol == ATA_PROT_DMA ||
+		   qc->tf.protocol == ATA_PROT_ATAPI_DMA))
+		ap->ops->bmdma_stop(qc);
+
+	ata_altstatus(ap);
+	ata_chk_status(ap);
+	ap->ops->irq_clear(ap);
+
+	spin_unlock_irqrestore(&host_set->lock, flags);
+
+	/* PIO and DMA engines have been stopped, perform recovery */
+	serror = 0;
+	if (ap->cbl == ATA_CBL_SATA && ap->ops->scr_read) {
+		serror = scr_read(ap, SCR_ERROR);
+		scr_write(ap, SCR_ERROR, serror);
+	}
+
+	action |= ata_eh_autopsy(ap, qc, &tf, serror);
+	ata_eh_report(ap, qc, &tf, serror, action, NULL);
+	ata_eh_revive(ap, action, softreset, hardreset, postreset);
+	ata_eh_finish_qcs(ap, qc, &tf);
+}
+
+/**
+ *	ata_bmdma_error_handler - Stock error handler for BMDMA controller
+ *	@ap: port to handle error for
+ *
+ *	Stock error handler for BMDMA controller.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ */
+void ata_bmdma_error_handler(struct ata_port *ap)
+{
+	ata_reset_fn_t hardreset;
+
+	hardreset = NULL;
+	if (ap->flags & ATA_FLAG_SATA && ap->ops->scr_read)
+		hardreset = sata_std_hardreset;
+
+	ata_bmdma_drive_eh(ap, ata_std_softreset, hardreset, ata_std_postreset);
+}
+
+/**
+ *	ata_bmdma_post_internal_cmd - Stock post_internal_cmd for
+ *				      BMDMA controller
+ *	@qc: internal command to clean up
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ */
+void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc)
+{
+	ata_bmdma_stop(qc);
+}
+
 #ifdef CONFIG_PCI
 static struct ata_probe_ent *
 ata_probe_ent_alloc(struct device *dev, const struct ata_port_info *port)
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 7605313..ac79c5c 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5288,6 +5288,10 @@ EXPORT_SYMBOL_GPL(ata_bmdma_start);
 EXPORT_SYMBOL_GPL(ata_bmdma_irq_clear);
 EXPORT_SYMBOL_GPL(ata_bmdma_status);
 EXPORT_SYMBOL_GPL(ata_bmdma_stop);
+EXPORT_SYMBOL_GPL(ata_bmdma_freeze);
+EXPORT_SYMBOL_GPL(ata_bmdma_drive_eh);
+EXPORT_SYMBOL_GPL(ata_bmdma_error_handler);
+EXPORT_SYMBOL_GPL(ata_bmdma_post_internal_cmd);
 EXPORT_SYMBOL_GPL(ata_port_probe);
 EXPORT_SYMBOL_GPL(ata_set_sata_spd);
 EXPORT_SYMBOL_GPL(sata_phy_reset);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 7f626d2..603d800 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -626,6 +626,13 @@ extern void ata_bmdma_start (struct ata_
 extern void ata_bmdma_stop(struct ata_queued_cmd *qc);
 extern u8   ata_bmdma_status(struct ata_port *ap);
 extern void ata_bmdma_irq_clear(struct ata_port *ap);
+extern void ata_bmdma_freeze(struct ata_port *ap);
+extern void ata_bmdma_drive_eh(struct ata_port *ap,
+			       ata_reset_fn_t softreset,
+			       ata_reset_fn_t hardreset,
+			       ata_postreset_fn_t postreset);
+extern void ata_bmdma_error_handler(struct ata_port *ap);
+extern void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc);
 extern void ata_qc_complete(struct ata_queued_cmd *qc);
 extern void ata_scsi_simulate(struct ata_port *ap, struct ata_device *dev,
 			      struct scsi_cmnd *cmd,
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 05/14] libata-eh: implement ata_eh_determine_qc()
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (6 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 11/14] ata_piix: convert to new EH Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 06/14] libata-eh: implement ata_eh_autopsy() Tejun Heo
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement EH helper ata_eh_determine_qc().  This function determines
which is the offending qc and loads TF registers for the qc.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    1 +
 drivers/scsi/libata-eh.c   |   24 ++++++++++++++++++++++++
 include/linux/libata.h     |    2 ++
 3 files changed, 27 insertions(+), 0 deletions(-)

96065c0f71d63ac4bb80855a432b2c435db73765
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 13954f6..a4456bd 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5342,3 +5342,4 @@ EXPORT_SYMBOL_GPL(ata_eng_timeout);
 EXPORT_SYMBOL_GPL(ata_eh_schedule_port);
 EXPORT_SYMBOL_GPL(ata_eh_qc_complete);
 EXPORT_SYMBOL_GPL(ata_eh_qc_retry);
+EXPORT_SYMBOL_GPL(ata_eh_determine_qc);
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index 7ef2b78..8a1a4c7 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -558,3 +558,27 @@ static unsigned int atapi_eh_request_sen
 	return ata_exec_internal(ap, dev, &tf, cdb, DMA_FROM_DEVICE,
 				 sense_buf, SCSI_SENSE_BUFFERSIZE);
 }
+
+/**
+ *	ata_eh_determine_qc - Determine which qc caused error
+ *	@ap: port which failed
+ *	@tf: resulting taskfile registers of the failed command
+ *
+ *	Determine which qc caused failure and read associated tf
+ *	registers.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ *
+ *	RETURNS:
+ *	Pointer to the failed qc.
+ */
+struct ata_queued_cmd * ata_eh_determine_qc(struct ata_port *ap,
+					    struct ata_taskfile *tf)
+{
+	memset(tf, 0, sizeof(*tf));
+	ap->ops->tf_read(ap, tf);
+
+	return __ata_qc_from_tag(ap, ap->active_tag);
+}
+
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 9447bd2..6376379 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -688,6 +688,8 @@ extern void ata_eng_timeout(struct ata_p
 extern void ata_eh_schedule_port(struct ata_port *ap, unsigned int flags);
 extern void ata_eh_qc_complete(struct ata_queued_cmd *qc);
 extern void ata_eh_qc_retry(struct ata_queued_cmd *qc);
+extern struct ata_queued_cmd * ata_eh_determine_qc(struct ata_port *ap,
+						   struct ata_taskfile *tf);
 
 
 static inline int
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 11/14] ata_piix: convert to new EH
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (5 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 08/14] libata-eh: implement ata_eh_revive() Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 05/14] libata-eh: implement ata_eh_determine_qc() Tejun Heo
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

ata_piix can use stock BMDMA EH routines.  Convert to new EH.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/ata_piix.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

57fba9792fb7047599ffe8dfc9ca5854ea3b91fb
diff --git a/drivers/scsi/ata_piix.c b/drivers/scsi/ata_piix.c
index 8383970..382ea51 100644
--- a/drivers/scsi/ata_piix.c
+++ b/drivers/scsi/ata_piix.c
@@ -244,7 +244,9 @@ static const struct ata_port_operations 
 	.qc_prep		= ata_qc_prep,
 	.qc_issue		= ata_qc_issue_prot,
 
-	.eng_timeout		= ata_eng_timeout,
+	.freeze			= ata_bmdma_freeze,
+	.error_handler		= ata_bmdma_error_handler,
+	.post_internal_cmd	= ata_bmdma_post_internal_cmd,
 
 	.irq_handler		= ata_interrupt,
 	.irq_clear		= ata_bmdma_irq_clear,
@@ -272,7 +274,9 @@ static const struct ata_port_operations 
 	.qc_prep		= ata_qc_prep,
 	.qc_issue		= ata_qc_issue_prot,
 
-	.eng_timeout		= ata_eng_timeout,
+	.freeze			= ata_bmdma_freeze,
+	.error_handler		= ata_bmdma_error_handler,
+	.post_internal_cmd	= ata_bmdma_post_internal_cmd,
 
 	.irq_handler		= ata_interrupt,
 	.irq_clear		= ata_bmdma_irq_clear,
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 07/14] libata-eh: implement ata_eh_report()
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (9 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 03/14] libata-eh: add per-dev ata_ering Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 12/14] sata_sil: convert to new EH Tejun Heo
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement EH helper function ata_eh_report().  This function reports
to user which command caused what kind of error on which device.
Detailed EH status is also printed to help tracking down the problem.
LLDDs may supply LLDD specific message to ata_eh_report().  It will
format them and print it together with other error information.

Using this function standardizes error messages over different drivers
helping both the users and developers.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    1 +
 drivers/scsi/libata-eh.c   |   50 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/libata.h     |    3 +++
 3 files changed, 54 insertions(+), 0 deletions(-)

90078477b8c496028712d0f55a955e4615d8e9d9
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 6b7f30d..e724a76 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5344,3 +5344,4 @@ EXPORT_SYMBOL_GPL(ata_eh_qc_complete);
 EXPORT_SYMBOL_GPL(ata_eh_qc_retry);
 EXPORT_SYMBOL_GPL(ata_eh_determine_qc);
 EXPORT_SYMBOL_GPL(ata_eh_autopsy);
+EXPORT_SYMBOL_GPL(ata_eh_report);
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index 103ef28..eebb165 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -868,3 +868,53 @@ unsigned int ata_eh_autopsy(struct ata_p
 	return action;
 }
 
+/**
+ *	ata_eh_report - report error handling to user
+ *	@ap: host port EH is going on
+ *	@qc: failed qc (could be NULL)
+ *	@tf: current taskfile register values
+ *	@serror: SError register value
+ *	@action: determined recovery action
+ *	@desc: extra description
+ *
+ *	Report EH to user.
+ *
+ *	LOCKING:
+ *	None.
+ */
+void ata_eh_report(struct ata_port *ap, struct ata_queued_cmd *qc,
+		   const struct ata_taskfile *tf, u32 serror,
+		   unsigned int action, const char *desc)
+{
+	const char *desc_head, *desc_tail;
+
+	if (desc && desc[0] != '\0') {
+		desc_head = "      (";
+		desc_tail = ")\n";
+	} else {
+		desc_head = "";
+		desc = "";
+		desc_tail = "";
+	}
+
+	if (!qc) {
+		printk(KERN_ERR
+		       "ata%u: stat 0x%x err 0x%x SError 0x%x action 0x%x\n"
+		       "%s%s%s",
+		       ap->id, tf->command, tf->feature, serror,
+		       action, desc_head, desc, desc_tail);
+		return;
+	}
+
+	if (!qc->err_mask)
+		return;
+
+	printk(KERN_ERR
+	       "ata%u: dev %u command 0x%x tag %u failed with %s\n"
+	       "      Emask 0x%x stat 0x%x err 0x%x SErr 0x%x action 0x%x\n"
+	       "%s%s%s",
+	       ap->id, qc->dev->devno, qc->tf.command, qc->tag,
+	       ata_err_string(qc->err_mask), qc->err_mask,
+	       tf->command, tf->feature, serror, action,
+	       desc_head, desc, desc_tail);
+}
diff --git a/include/linux/libata.h b/include/linux/libata.h
index d7a51f3..5efadab 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -693,6 +693,9 @@ extern struct ata_queued_cmd * ata_eh_de
 extern unsigned int ata_eh_autopsy(struct ata_port *ap,
 				   struct ata_queued_cmd *qc,
 				   const struct ata_taskfile *tf, u32 serror);
+extern void ata_eh_report(struct ata_port *ap, struct ata_queued_cmd *qc,
+			  const struct ata_taskfile *tf, u32 serror,
+			  unsigned int action, const char *desc);
 
 
 static inline int
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 06/14] libata-eh: implement ata_eh_autopsy()
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (7 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 05/14] libata-eh: implement ata_eh_determine_qc() Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 03/14] libata-eh: add per-dev ata_ering Tejun Heo
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement EH helper function ata_eh_autopsy().  This function analyzes
how the port and qc failed and determine what to do to recover from
the condition.

* Analyzes TF/SError
* Record the error and determine whether speeding down is necessary.
  If so, adjust relevant limits.
* Determine which action is required to recover - REVALIDATE,
  PORT_SOFTRESET or PORT_HARDRESET.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    1 
 drivers/scsi/libata-eh.c   |  286 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/libata.h     |    3 
 3 files changed, 290 insertions(+), 0 deletions(-)

3a04374a8696fcaed6d00511dee9b0b9d05adec8
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index a4456bd..6b7f30d 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5343,3 +5343,4 @@ EXPORT_SYMBOL_GPL(ata_eh_schedule_port);
 EXPORT_SYMBOL_GPL(ata_eh_qc_complete);
 EXPORT_SYMBOL_GPL(ata_eh_qc_retry);
 EXPORT_SYMBOL_GPL(ata_eh_determine_qc);
+EXPORT_SYMBOL_GPL(ata_eh_autopsy);
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index 8a1a4c7..103ef28 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -582,3 +582,289 @@ struct ata_queued_cmd * ata_eh_determine
 	return __ata_qc_from_tag(ap, ap->active_tag);
 }
 
+/**
+ *	ata_eh_analyze_tf - analyze taskfile of a failed qc
+ *	@qc: qc to analyze
+ *	@tf: Taskfile registers to analyze
+ *
+ *	Analyze taskfile of @qc and further determine cause of
+ *	failure.  This function also requests ATAPI sense data if
+ *	avaliable.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ *
+ *	RETURNS:
+ *	Determined recovery action
+ */
+static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc,
+				      const struct ata_taskfile *tf)
+{
+	unsigned int tmp, action = 0;
+	u8 stat = tf->command, err = tf->feature;
+
+	if ((stat & (ATA_BUSY | ATA_DRQ | ATA_DRDY)) != ATA_DRDY) {
+		qc->err_mask |= AC_ERR_HSM;
+		return ATA_PORT_SOFTRESET;
+	}
+
+	if (!(qc->err_mask & AC_ERR_DEV))
+		return 0;
+
+	switch (qc->dev->class) {
+	case ATA_DEV_ATA:
+		if (err & ATA_ICRC)
+			qc->err_mask |= AC_ERR_ATA_BUS;
+		if (err & ATA_UNC)
+			qc->err_mask |= AC_ERR_MEDIA;
+		if (err & ATA_IDNF)
+			qc->err_mask |= AC_ERR_INVALID;
+		break;
+
+	case ATA_DEV_ATAPI:
+		tmp = atapi_eh_request_sense(qc->ap, qc->dev,
+					     qc->scsicmd->sense_buffer);
+		if (!tmp) {
+			/*
+			 * ATA_QCFLAG_SENSE_VALID is used to tell
+			 * atapi_qc_complete() that sense data is
+			 * already valid.
+			 *
+			 * TODO: interpret sense data and set
+			 * appropriate err_mask.
+			 */
+			qc->err_mask &= ~AC_ERR_DEV;
+			qc->flags |= ATA_QCFLAG_SENSE_VALID;
+		} else
+			qc->err_mask |= tmp;
+	}
+
+	if (qc->err_mask) {
+		action |= ATA_PORT_REVALIDATE;
+		if (qc->err_mask &
+		    (AC_ERR_HSM | AC_ERR_TIMEOUT | AC_ERR_ATA_BUS))
+			action |= ATA_PORT_SOFTRESET;
+	}
+
+	return action;
+}
+
+/**
+ *	ata_eh_analyze_serror - analyze SError of a failed qc
+ *	@ap: ATA port to analyze SError for
+ *	@serror: SError to analyze
+ *	@p_err_mask: Resulting err_mask
+ *
+ *	Analyze SError if available and further determine cause of
+ *	failure.
+ *
+ *	LOCKING:
+ *	None.
+ *
+ *	RETURNS:
+ *	Determined recovery action
+ */
+static unsigned int ata_eh_analyze_serror(struct ata_port *ap, u32 serror,
+					  unsigned int *p_err_mask)
+{
+	unsigned int action = 0;
+
+	if (serror & SERR_PERSISTENT) {
+		*p_err_mask |= AC_ERR_ATA_BUS;
+		action |= ATA_PORT_HARDRESET;
+	}
+	if (serror &
+	    (SERR_DATA_RECOVERED | SERR_COMM_RECOVERED | SERR_DATA)) {
+		*p_err_mask |= AC_ERR_ATA_BUS;
+		action |= ATA_PORT_SOFTRESET;
+	}
+	if (serror & SERR_PROTOCOL) {
+		*p_err_mask |= AC_ERR_HSM;
+		action |= ATA_PORT_SOFTRESET;
+	}
+	if (serror & SERR_INTERNAL) {
+		*p_err_mask |= AC_ERR_SYSTEM;
+		action |= ATA_PORT_SOFTRESET;
+	}
+
+	return action;
+}
+
+static int ata_eh_categorize_ering_entry(struct ata_ering_entry *ent)
+{
+	if (ent->err_mask & (AC_ERR_ATA_BUS | AC_ERR_TIMEOUT))
+		return 1;
+
+	if (ent->is_io) {
+		if (ent->err_mask & AC_ERR_HSM)
+			return 1;
+		if ((ent->err_mask &
+		     (AC_ERR_DEV|AC_ERR_MEDIA|AC_ERR_INVALID)) == AC_ERR_DEV)
+			return 2;
+	}
+
+	return 0;
+}
+
+struct speed_down_needed_arg {
+	u64 since;
+	int nr_errors[3];
+};
+
+static int speed_down_needed_cb(struct ata_ering_entry *ent, void *void_arg)
+{
+	struct speed_down_needed_arg *arg = void_arg;
+
+	if (ent->timestamp < arg->since)
+		return -1;
+
+	arg->nr_errors[ata_eh_categorize_ering_entry(ent)]++;
+	return 0;
+}
+
+/**
+ *	ata_eh_speed_down_needed - Determine wheter speed down is necessary
+ *	@dev: Device of interest
+ *
+ *	This function examines error ring of @dev and determines
+ *	whether speed down is necessary.  Speed down is necessary if
+ *	there have been more than 3 of CAT-1 errors or 10 of Cat-2
+ *	errors during last 15 minutes.
+ *
+ *	Cat-1 errors are ATA_BUS, TIMEOUT for any command and HSM
+ *	violation for known supported commands.
+ *
+ *	Cat-2 errors are unclassified DEV error for known supported
+ *	command.
+ *
+ *	LOCKING:
+ *	Inherited from caller.
+ *
+ *	RETURNS:
+ *	1 if speed down is necessary, 0 otherwise
+ */
+static int ata_eh_speed_down_needed(struct ata_device *dev)
+{
+	const u64 interval = 15LLU * 60 * HZ;
+	static const int err_limits[3] = { -1, 3, 10 };
+	struct speed_down_needed_arg arg;
+	struct ata_ering_entry *ent;
+	int err_cat;
+	u64 j64;
+
+	ent = ata_ering_top(&dev->ering);
+	if (!ent)
+		return 0;
+
+	err_cat = ata_eh_categorize_ering_entry(ent);
+	if (err_cat == 0)
+		return 0;
+
+	memset(&arg, 0, sizeof(arg));
+
+	j64 = get_jiffies_64();
+	if (j64 >= interval)
+		arg.since = j64 - interval;
+	else
+		arg.since = 0;
+
+	ata_ering_map(&dev->ering, speed_down_needed_cb, &arg);
+
+	return arg.nr_errors[err_cat] > err_limits[err_cat];
+}
+
+/**
+ *	ata_eh_speed_down - record error and speed down if necessary
+ *	@ap: Host port failed device lives on
+ *	@dev: Failed device
+ *	@is_io: Did the device fail during normal IO?
+ *	@err_mask: err_mask of the error
+ *
+ *	Record error and examine error history to determine whether
+ *	adjusting transmission speed is necessary.  It also sets
+ *	transmission limits appropriately if such adjustment is
+ *	necessary.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ *
+ *	RETURNS:
+ *	0 on success, -errno otherwise
+ */
+static int ata_eh_speed_down(struct ata_port *ap, struct ata_device *dev,
+			     int is_io, unsigned int err_mask)
+{
+	if (!err_mask)
+		return 0;
+
+	/* record error and determine whether speed down is necessary */
+	ata_ering_record(&dev->ering, is_io, err_mask);
+
+	if (!ata_eh_speed_down_needed(dev))
+		return 0;
+
+	/* speed down SATA link speed if possible */
+	if (ata_down_sata_spd_limit(ap) == 0)
+		return ATA_PORT_HARDRESET;
+
+	/* lower transfer mode */
+	if (ata_down_xfermask_limit(ap, dev, 0) == 0)
+		return ATA_PORT_SOFTRESET;
+
+	printk(KERN_ERR "ata%u: dev %u speed down requested but no "
+	       "transfer mode left\n", ap->id, dev->devno);
+	return 0;
+}
+
+/**
+ *	ata_eh_autopsy - analyze error and determine recovery action
+ *	@ap: host port to perform autopsy on
+ *	@qc: failed command
+ *	@tf: taskfile registers to analyze
+ *	@serror: SError value to analyze
+ *
+ *	Analyze why @qc failed and determine which recovery action is
+ *	needed.  This function also sets more detailed AC_ERR_* values
+ *	and fills sense data for ATAPI CHECK SENSE.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ *
+ *	RETURNS:
+ *	Determined recovery action
+ */
+unsigned int ata_eh_autopsy(struct ata_port *ap, struct ata_queued_cmd *qc,
+			    const struct ata_taskfile *tf, u32 serror)
+{
+	unsigned int err_mask = 0, action = 0;
+
+	if (ap->flags & ATA_FLAG_FROZEN)
+		action |= ATA_PORT_SOFTRESET;
+
+	/* SError first */
+	action |= ata_eh_analyze_serror(ap, serror, &err_mask);
+
+	if (!qc)
+		return action;
+
+	/* we have qc, analyze TF, record and speed down */
+	qc->err_mask |= err_mask;
+
+	if (qc->err_mask & AC_ERR_TIMEOUT)
+		action |= ATA_PORT_SOFTRESET;
+
+	/* determine cause of failure. */
+	action |= ata_eh_analyze_tf(qc, tf);
+	action |= ata_eh_speed_down(ap, qc->dev, qc->flags & ATA_QCFLAG_IO,
+				    qc->err_mask);
+
+	/* DEV errors are probably spurious in case of ATA_BUS error */
+	if (qc->err_mask & AC_ERR_ATA_BUS)
+		qc->err_mask &= ~(AC_ERR_DEV | AC_ERR_MEDIA | AC_ERR_INVALID);
+
+	if (qc->err_mask)
+		action |= ATA_PORT_REVALIDATE;
+
+	return action;
+}
+
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 6376379..d7a51f3 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -690,6 +690,9 @@ extern void ata_eh_qc_complete(struct at
 extern void ata_eh_qc_retry(struct ata_queued_cmd *qc);
 extern struct ata_queued_cmd * ata_eh_determine_qc(struct ata_port *ap,
 						   struct ata_taskfile *tf);
+extern unsigned int ata_eh_autopsy(struct ata_port *ap,
+				   struct ata_queued_cmd *qc,
+				   const struct ata_taskfile *tf, u32 serror);
 
 
 static inline int
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 09/14] libata-eh: implement ata_eh_finish_qcs()
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 02/14] libata-eh: implement ata_ering Tejun Heo
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Implement EH helper function ata_eh_finish_qcs().  This function is
called after all EH actions are complete and finishes all the failed
qcs.  Depending on error status, a qc may be retried or completed.
This function is also responsible for loading qc->tf with resulting TF
values.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/libata-core.c |    1 +
 drivers/scsi/libata-eh.c   |   45 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/libata.h     |    2 ++
 3 files changed, 48 insertions(+), 0 deletions(-)

a47d957e6a718253b6ce1b6131da2dcbbbb52641
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index cb174cf..7605313 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5346,3 +5346,4 @@ EXPORT_SYMBOL_GPL(ata_eh_determine_qc);
 EXPORT_SYMBOL_GPL(ata_eh_autopsy);
 EXPORT_SYMBOL_GPL(ata_eh_report);
 EXPORT_SYMBOL_GPL(ata_eh_revive);
+EXPORT_SYMBOL_GPL(ata_eh_finish_qcs);
diff --git a/drivers/scsi/libata-eh.c b/drivers/scsi/libata-eh.c
index 1d25d55..0d8a9cb 100644
--- a/drivers/scsi/libata-eh.c
+++ b/drivers/scsi/libata-eh.c
@@ -1124,3 +1124,48 @@ int ata_eh_revive(struct ata_port *ap, u
 
 	return rc;
 }
+
+/**
+ *	ata_eh_finish_qcs - complete or retry commands
+ *	@ap: host port to finish qc's for
+ *	@qc: the failed qc (can be NULL)
+ *	@tf: taskfile register of the failed qc
+ *
+ *	Retry or complete failed qc's.
+ *
+ *	LOCKING:
+ *	None.
+ */
+void ata_eh_finish_qcs(struct ata_port *ap, struct ata_queued_cmd *qc,
+		       struct ata_taskfile *tf)
+{
+	struct ata_taskfile tmp_tf;
+
+	if (qc) {
+		/* prevent infinite retry loop */
+		if (!qc->err_mask && !(qc->flags & ATA_QCFLAG_SENSE_VALID)) {
+			printk(KERN_WARNING "ata%u: dev %u qc has no error "
+			       "flag set after EH, forcing AC_ERR_OTHER\n",
+			       ap->id, qc->dev->devno);
+			qc->err_mask |= AC_ERR_OTHER;
+		}
+
+		/* FIXME: qc->tf will be used by completion callbacks
+		 * to generate SCSI sense data.  This is to share
+		 * sense generation code with old-EH drivers.  Once EH
+		 * migration is complete, generate sense data in this
+		 * function, considering both err_mask and tf.
+		 */
+		tmp_tf = *tf;
+		tmp_tf.flags = qc->tf.flags;
+		tmp_tf.protocol = qc->tf.protocol;
+		tmp_tf.ctl = qc->tf.ctl;
+		qc->tf = tmp_tf;
+
+		if (qc->err_mask & AC_ERR_INVALID ||
+		    qc->flags & ATA_QCFLAG_SENSE_VALID)
+			ata_eh_qc_complete(qc);
+		else
+			ata_eh_qc_retry(qc);
+	}
+}
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 22472f6..7f626d2 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -700,6 +700,8 @@ extern void ata_eh_report(struct ata_por
 extern int ata_eh_revive(struct ata_port *ap, unsigned int action,
 			 ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
 			 ata_postreset_fn_t postreset);
+extern void ata_eh_finish_qcs(struct ata_port *ap, struct ata_queued_cmd *qc,
+			      struct ata_taskfile *tf);
 
 
 static inline int
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 13/14] ahci: convert to new EH
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (11 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 12/14] sata_sil: convert to new EH Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-20  6:01   ` zhao, forrest
  2006-04-20  9:26   ` zhao, forrest
  2006-04-11 13:48 ` [PATCH 14/14] sata_sil24: " Tejun Heo
  2006-04-27  9:16 ` [PATCHSET 6/9] new EH implementation, take 2 Jeff Garzik
  14 siblings, 2 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Convert AHCI to new EH.  Unfortunately, ICH7 AHCI reacts badly if IRQ
mask is diddled during operation.  So, freezing is implemented by
unconditionally clearing interrupt conditions while frozen.

* AHCI interrupt handler does not analyze any of error conditions.  It
  just records relevant status registers in driver private area and
  invoke EH.  EH is responsible for decoding all those information.

* Interrupts are categorized according to required action.
  e.g. Connection status or unknown FIS error requires freezing the
  port while TF or HBUS_DATA don't.

* Only CONNECT (reflects SErr.X) interrupt is taken into account not
  PHYRDY (SErr.N), as CONNECT is better cue for starting EH.

* AHCI may be invoked without any active command.  e.g. CONNECT irq
  occuring while no qc in progress still triggers EH and will reset
  the port and revalidate attached device.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/ahci.c |  248 ++++++++++++++++++++++++++++++++-------------------
 1 files changed, 154 insertions(+), 94 deletions(-)

29b703131da0ba4391f3e95bd49effc3b0b70d5d
diff --git a/drivers/scsi/ahci.c b/drivers/scsi/ahci.c
index 241ed5d..67950ee 100644
--- a/drivers/scsi/ahci.c
+++ b/drivers/scsi/ahci.c
@@ -71,6 +71,7 @@ enum {
 	AHCI_CMD_CLR_BUSY	= (1 << 10),
 
 	RX_FIS_D2H_REG		= 0x40,	/* offset of D2H Register FIS data */
+	RX_FIS_UNK		= 0x60, /* offset of Unknown FIS data */
 
 	board_ahci		= 0,
 
@@ -127,15 +128,16 @@ enum {
 	PORT_IRQ_PIOS_FIS	= (1 << 1), /* PIO Setup FIS rx'd */
 	PORT_IRQ_D2H_REG_FIS	= (1 << 0), /* D2H Register FIS rx'd */
 
-	PORT_IRQ_FATAL		= PORT_IRQ_TF_ERR |
-				  PORT_IRQ_HBUS_ERR |
-				  PORT_IRQ_HBUS_DATA_ERR |
-				  PORT_IRQ_IF_ERR,
-	DEF_PORT_IRQ		= PORT_IRQ_FATAL | PORT_IRQ_PHYRDY |
-				  PORT_IRQ_CONNECT | PORT_IRQ_SG_DONE |
-				  PORT_IRQ_UNK_FIS | PORT_IRQ_SDB_FIS |
-				  PORT_IRQ_DMAS_FIS | PORT_IRQ_PIOS_FIS |
-				  PORT_IRQ_D2H_REG_FIS,
+	PORT_IRQ_FREEZE		= PORT_IRQ_HBUS_ERR |
+				  PORT_IRQ_IF_ERR |
+				  PORT_IRQ_CONNECT |
+				  PORT_IRQ_UNK_FIS,
+	PORT_IRQ_ERROR		= PORT_IRQ_FREEZE |
+				  PORT_IRQ_TF_ERR |
+				  PORT_IRQ_HBUS_DATA_ERR,
+	DEF_PORT_IRQ		= PORT_IRQ_ERROR | PORT_IRQ_SG_DONE |
+				  PORT_IRQ_SDB_FIS | PORT_IRQ_DMAS_FIS |
+				  PORT_IRQ_PIOS_FIS | PORT_IRQ_D2H_REG_FIS,
 
 	/* PORT_CMD bits */
 	PORT_CMD_ATAPI		= (1 << 24), /* Device is ATAPI */
@@ -184,6 +186,9 @@ struct ahci_port_priv {
 	struct ahci_sg		*cmd_tbl_sg;
 	void			*rx_fis;
 	dma_addr_t		rx_fis_dma;
+	/* register values stored by interrupt handler for EH */
+	u32			eh_irq_stat;
+	u32			eh_serror;
 };
 
 static u32 ahci_scr_read (struct ata_port *ap, unsigned int sc_reg);
@@ -193,13 +198,13 @@ static unsigned int ahci_qc_issue(struct
 static irqreturn_t ahci_interrupt (int irq, void *dev_instance, struct pt_regs *regs);
 static int ahci_probe_reset(struct ata_port *ap, unsigned int *classes);
 static void ahci_irq_clear(struct ata_port *ap);
-static void ahci_eng_timeout(struct ata_port *ap);
+static void ahci_error_handler(struct ata_port *ap);
+static void ahci_post_internal_cmd(struct ata_queued_cmd *qc);
 static int ahci_port_start(struct ata_port *ap);
 static void ahci_port_stop(struct ata_port *ap);
 static void ahci_tf_read(struct ata_port *ap, struct ata_taskfile *tf);
 static void ahci_qc_prep(struct ata_queued_cmd *qc);
 static u8 ahci_check_status(struct ata_port *ap);
-static inline int ahci_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc);
 static void ahci_remove_one (struct pci_dev *pdev);
 
 static struct scsi_host_template ahci_sht = {
@@ -234,7 +239,8 @@ static const struct ata_port_operations 
 	.qc_prep		= ahci_qc_prep,
 	.qc_issue		= ahci_qc_issue,
 
-	.eng_timeout		= ahci_eng_timeout,
+	.error_handler		= ahci_error_handler,
+	.post_internal_cmd	= ahci_post_internal_cmd,
 
 	.irq_handler		= ahci_interrupt,
 	.irq_clear		= ahci_irq_clear,
@@ -757,109 +763,170 @@ static void ahci_qc_prep(struct ata_queu
 	ahci_fill_cmd_slot(pp, opts);
 }
 
-static void ahci_restart_port(struct ata_port *ap, u32 irq_stat)
+static unsigned int ahci_eh_autopsy(struct ata_port *ap, u32 irq_stat,
+				    unsigned int *r_err_mask,
+				    char *desc, size_t desc_sz)
 {
-	void __iomem *mmio = ap->host_set->mmio_base;
-	void __iomem *port_mmio = ahci_port_base(mmio, ap->port_no);
-	u32 tmp;
+	struct ahci_port_priv *pp = ap->private_data;
+	unsigned int err_mask = 0, action = 0;
+	int rc;
 
-	if ((ap->device[0].class != ATA_DEV_ATAPI) ||
-	    ((irq_stat & PORT_IRQ_TF_ERR) == 0))
-		printk(KERN_WARNING "ata%u: port reset, "
-		       "p_is %x is %x pis %x cmd %x tf %x ss %x se %x\n",
-			ap->id,
-			irq_stat,
-			readl(mmio + HOST_IRQ_STAT),
-			readl(port_mmio + PORT_IRQ_STAT),
-			readl(port_mmio + PORT_CMD),
-			readl(port_mmio + PORT_TFDATA),
-			readl(port_mmio + PORT_SCR_STAT),
-			readl(port_mmio + PORT_SCR_ERR));
+	rc = scnprintf(desc, desc_sz, "irq_stat 0x%08x", irq_stat);
+	desc += rc;
+	desc_sz -= rc;
 
-	/* stop DMA */
-	ahci_stop_engine(ap);
+	if (irq_stat & PORT_IRQ_TF_ERR)
+		err_mask |= AC_ERR_DEV;
 
-	/* clear SATA phy error, if any */
-	tmp = readl(port_mmio + PORT_SCR_ERR);
-	writel(tmp, port_mmio + PORT_SCR_ERR);
+	if (irq_stat & (PORT_IRQ_HBUS_ERR | PORT_IRQ_HBUS_DATA_ERR)) {
+		err_mask |= AC_ERR_HOST_BUS;
+		action |= ATA_PORT_SOFTRESET;
+	}
 
-	/* if DRQ/BSY is set, device needs to be reset.
-	 * if so, issue COMRESET
-	 */
-	tmp = readl(port_mmio + PORT_TFDATA);
-	if (tmp & (ATA_BUSY | ATA_DRQ)) {
-		writel(0x301, port_mmio + PORT_SCR_CTL);
-		readl(port_mmio + PORT_SCR_CTL); /* flush */
-		udelay(10);
-		writel(0x300, port_mmio + PORT_SCR_CTL);
-		readl(port_mmio + PORT_SCR_CTL); /* flush */
+	if (irq_stat & PORT_IRQ_IF_ERR) {
+		err_mask |= AC_ERR_ATA_BUS;
+		action |= ATA_PORT_SOFTRESET;
+		rc = scnprintf(desc, desc_sz, ", interface fatal error");
+		desc += rc;
+		desc_sz -= rc;
 	}
 
-	/* re-start DMA */
-	ahci_start_engine(ap);
+	if (irq_stat & PORT_IRQ_CONNECT) {
+		err_mask |= AC_ERR_ATA_BUS;
+		action |= ATA_PORT_SOFTRESET;
+		rc = scnprintf(desc, desc_sz, ", connection status changed");
+		desc += rc;
+		desc_sz -= rc;
+	}
+
+	if (irq_stat & PORT_IRQ_UNK_FIS) {
+		u32 *unk = (u32 *)(pp->rx_fis + RX_FIS_UNK);
+
+		err_mask |= AC_ERR_HSM;
+		action |= ATA_PORT_SOFTRESET;
+		rc = scnprintf(desc, desc_sz,
+			       ", unknown FIS %08x %08x %08x %08x",
+			       unk[0], unk[1], unk[2], unk[3]);
+		desc += rc;
+		desc_sz -= rc;
+	}
+
+	*r_err_mask |= err_mask;
+	return action;
 }
 
-static void ahci_eng_timeout(struct ata_port *ap)
+static void ahci_error_handler(struct ata_port *ap)
 {
-	struct ata_host_set *host_set = ap->host_set;
-	void __iomem *mmio = host_set->mmio_base;
-	void __iomem *port_mmio = ahci_port_base(mmio, ap->port_no);
+	struct ahci_port_priv *pp = ap->private_data;
+	unsigned int action = 0;
+	unsigned int err_mask = 0;
+	unsigned flags;
+	u32 irq_stat, serror;
+	struct ata_taskfile tf;
 	struct ata_queued_cmd *qc;
-	unsigned long flags;
+	char desc[70] = "";
+
+	/* fetch & clear error information from interrupt handler */
+	spin_lock_irqsave(&ap->host_set->lock, flags);
+
+	irq_stat = pp->eh_irq_stat;
+	serror = pp->eh_serror;
+	pp->eh_irq_stat = 0;
+	pp->eh_serror = 0;
 
-	printk(KERN_WARNING "ata%u: handling error/timeout\n", ap->id);
+	spin_unlock_irqrestore(&ap->host_set->lock, flags);
 
-	spin_lock_irqsave(&host_set->lock, flags);
+	if (!(ap->flags & ATA_FLAG_FROZEN)) {
+		/* restart engine */
+		ahci_stop_engine(ap);
+		ahci_start_engine(ap);
+	}
+
+	/* perform recovery */
+	action |= ahci_eh_autopsy(ap, irq_stat, &err_mask, desc, sizeof(desc));
+
+	qc = ata_eh_determine_qc(ap, &tf);
+	if (qc)
+		qc->err_mask |= err_mask;
+
+	action |= ata_eh_autopsy(ap, qc, &tf, serror);
+	ata_eh_report(ap, qc, &tf, serror, action, desc);
+	ata_eh_revive(ap, action,
+		      ahci_softreset, ahci_hardreset, ahci_postreset);
+	ata_eh_finish_qcs(ap, qc, &tf);
+}
 
-	ahci_restart_port(ap, readl(port_mmio + PORT_IRQ_STAT));
-	qc = ata_qc_from_tag(ap, ap->active_tag);
-	qc->err_mask |= AC_ERR_TIMEOUT;
+static void ahci_post_internal_cmd(struct ata_queued_cmd *qc)
+{
+	struct ata_port *ap = qc->ap;
 
-	spin_unlock_irqrestore(&host_set->lock, flags);
+	if (qc->flags & ATA_QCFLAG_FAILED)
+		qc->err_mask |= AC_ERR_OTHER;
 
-	ata_eh_qc_complete(qc);
+	if (qc->err_mask) {
+		/* make DMA engine forget about the failed command */
+		ahci_stop_engine(ap);
+		ahci_start_engine(ap);
+	}
 }
 
-static inline int ahci_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc)
+static inline void ahci_host_intr(struct ata_port *ap)
 {
+	struct ahci_port_priv *pp = ap->private_data;
 	void __iomem *mmio = ap->host_set->mmio_base;
 	void __iomem *port_mmio = ahci_port_base(mmio, ap->port_no);
-	u32 status, serr, ci;
-
-	serr = readl(port_mmio + PORT_SCR_ERR);
-	writel(serr, port_mmio + PORT_SCR_ERR);
+	u32 status, serror, ci;
+	unsigned int eh_flags;
 
 	status = readl(port_mmio + PORT_IRQ_STAT);
 	writel(status, port_mmio + PORT_IRQ_STAT);
 
-	ci = readl(port_mmio + PORT_CMD_ISSUE);
-	if (likely((ci & 0x1) == 0)) {
-		if (qc) {
-			WARN_ON(qc->err_mask);
-			ata_qc_complete(qc);
-			qc = NULL;
-		}
+	/* AHCI gets unhappy if IRQ mask is diddled with while the
+	 * port is active, so we cannot disable IRQ when freezing.
+	 * Clear IRQ conditions and hope screaming IRQs don't happen.
+	 */
+	if (ap->flags & ATA_FLAG_FROZEN) {
+		/* some AHCI errors hang the controller until SError
+		 * is cleared.  Store and clear it.
+		 */
+		serror = scr_read(ap, SCR_ERROR);
+		scr_write(ap, SCR_ERROR, serror);
+		pp->eh_irq_stat |= status;
+		pp->eh_serror |= serror;
+		return;
 	}
 
-	if (status & PORT_IRQ_FATAL) {
-		unsigned int err_mask;
-		if (status & PORT_IRQ_TF_ERR)
-			err_mask = AC_ERR_DEV;
-		else if (status & PORT_IRQ_IF_ERR)
-			err_mask = AC_ERR_ATA_BUS;
-		else
-			err_mask = AC_ERR_HOST_BUS;
-
-		/* command processing has stopped due to error; restart */
-		ahci_restart_port(ap, status);
-
-		if (qc) {
-			qc->err_mask |= err_mask;
-			ata_qc_complete(qc);
+	if (!(status & PORT_IRQ_ERROR)) {
+		struct ata_queued_cmd *qc;
+
+		if ((qc = ata_qc_from_tag(ap, ap->active_tag))) {
+			ci = readl(port_mmio + PORT_CMD_ISSUE);
+			if ((ci & 0x1) == 0) {
+				ata_qc_complete(qc);
+				return;
+			}
 		}
+
+		if (ata_ratelimit())
+			printk(KERN_INFO "ata%u: spurious interrupt "
+			       "(irq_stat 0x%x active_tag %d)\n",
+			       ap->id, status, ap->active_tag);
+
+		return;
 	}
 
-	return 1;
+	/* Something weird is going on.  Hand over to EH. */
+	serror = scr_read(ap, SCR_ERROR);
+	scr_write(ap, SCR_ERROR, serror);
+
+	pp->eh_irq_stat = status;
+	pp->eh_serror = serror;
+
+	eh_flags = ATA_EH_ABORT;
+	if (status & PORT_IRQ_FREEZE)
+		eh_flags |= ATA_EH_FREEZE;
+
+	ata_eh_schedule_port(ap, eh_flags);
 }
 
 static void ahci_irq_clear(struct ata_port *ap)
@@ -896,14 +963,7 @@ static irqreturn_t ahci_interrupt (int i
 
 		ap = host_set->ports[i];
 		if (ap) {
-			struct ata_queued_cmd *qc;
-			qc = ata_qc_from_tag(ap, ap->active_tag);
-			if (!ahci_host_intr(ap, qc))
-				if (ata_ratelimit())
-					dev_printk(KERN_WARNING, host_set->dev,
-					  "unhandled interrupt on port %u\n",
-					  i);
-
+			ahci_host_intr(ap);
 			VPRINTK("port %u\n", i);
 		} else {
 			VPRINTK("port %u (no irq)\n", i);
@@ -920,7 +980,7 @@ static irqreturn_t ahci_interrupt (int i
 		handled = 1;
 	}
 
-        spin_unlock(&host_set->lock);
+	spin_unlock(&host_set->lock);
 
 	VPRINTK("EXIT\n");
 
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 12/14] sata_sil: convert to new EH
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (10 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 07/14] libata-eh: implement ata_eh_report() Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-11 13:48 ` [PATCH 13/14] ahci: " Tejun Heo
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Convert sata_sil to new EH.  As these controllers have hardware
interrupt mask and are known to have screaming interrupts issues, use
hardware IRQ masking for freezing.  sil_freeze() masks interrupts for
the port and sil_postreset() thaws it.  As ports are automatically
frozen before probing reset, there is no need to initialize interrupt
masks sil_init_onde().  Remove related code.

Other than freezing, sata_sil uses stock BMDMA EH routines.a

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/sata_sil.c |   62 +++++++++++++++++++++++++++++++++++------------
 1 files changed, 46 insertions(+), 16 deletions(-)

32a0f7961c59f9de17e29356f06a072367c64c43
diff --git a/drivers/scsi/sata_sil.c b/drivers/scsi/sata_sil.c
index f29c3e7..4b9b2d1 100644
--- a/drivers/scsi/sata_sil.c
+++ b/drivers/scsi/sata_sil.c
@@ -95,7 +95,10 @@ static int sil_init_one (struct pci_dev 
 static void sil_dev_config(struct ata_port *ap, struct ata_device *dev);
 static u32 sil_scr_read (struct ata_port *ap, unsigned int sc_reg);
 static void sil_scr_write (struct ata_port *ap, unsigned int sc_reg, u32 val);
+static int sil_probe_reset(struct ata_port *ap, unsigned int *classes);
 static void sil_post_set_mode (struct ata_port *ap);
+static void sil_freeze(struct ata_port *ap);
+static void sil_error_handler(struct ata_port *ap);
 
 
 static const struct pci_device_id sil_pci_tbl[] = {
@@ -167,7 +170,7 @@ static const struct ata_port_operations 
 	.check_status		= ata_check_status,
 	.exec_command		= ata_exec_command,
 	.dev_select		= ata_std_dev_select,
-	.probe_reset		= ata_std_probe_reset,
+	.probe_reset		= sil_probe_reset,
 	.post_set_mode		= sil_post_set_mode,
 	.bmdma_setup            = ata_bmdma_setup,
 	.bmdma_start            = ata_bmdma_start,
@@ -175,7 +178,9 @@ static const struct ata_port_operations 
 	.bmdma_status		= ata_bmdma_status,
 	.qc_prep		= ata_qc_prep,
 	.qc_issue		= ata_qc_issue_prot,
-	.eng_timeout		= ata_eng_timeout,
+	.freeze			= sil_freeze,
+	.error_handler		= sil_error_handler,
+	.post_internal_cmd	= ata_bmdma_post_internal_cmd,
 	.irq_handler		= ata_interrupt,
 	.irq_clear		= ata_bmdma_irq_clear,
 	.scr_read		= sil_scr_read,
@@ -315,6 +320,44 @@ static void sil_scr_write (struct ata_po
 		writel(val, mmio);
 }
 
+static void sil_postreset(struct ata_port *ap, unsigned int *classes)
+{
+	void __iomem *mmio_base = ap->host_set->mmio_base;
+	u32 tmp;
+
+	ata_std_postreset(ap, classes);
+
+	/* everything is back to normal, turn on IRQ */
+	tmp = readl(mmio_base + SIL_SYSCFG);
+	tmp &= ~(SIL_MASK_IDE0_INT << ap->port_no);
+	writel(tmp, mmio_base + SIL_SYSCFG);
+}
+
+static int sil_probe_reset(struct ata_port *ap, unsigned int *classes)
+{
+	return ata_drive_probe_reset(ap, ata_std_probeinit,
+				     ata_std_softreset, sata_std_hardreset,
+				     sil_postreset, classes);
+}
+
+static void sil_freeze(struct ata_port *ap)
+{
+	void __iomem *mmio_base = ap->host_set->mmio_base;
+	u32 tmp;
+
+	/* plug IRQ */
+	tmp = readl(mmio_base + SIL_SYSCFG);
+	tmp |= SIL_MASK_IDE0_INT << ap->port_no;
+	writel(tmp, mmio_base + SIL_SYSCFG);
+	readl(mmio_base + SIL_SYSCFG);	/* flush */
+}
+
+static void sil_error_handler(struct ata_port *ap)
+{
+	ata_bmdma_drive_eh(ap, ata_std_softreset, sata_std_hardreset,
+			   sil_postreset);
+}
+
 /**
  *	sil_dev_config - Apply device/host-specific errata fixups
  *	@ap: Port containing device to be examined
@@ -385,7 +428,7 @@ static int sil_init_one (struct pci_dev 
 	int rc;
 	unsigned int i;
 	int pci_dev_busy = 0;
-	u32 tmp, irq_mask;
+	u32 tmp;
 	u8 cls;
 
 	if (!printed_version++)
@@ -475,24 +518,11 @@ static int sil_init_one (struct pci_dev 
 	}
 
 	if (ent->driver_data == sil_3114) {
-		irq_mask = SIL_MASK_4PORT;
-
 		/* flip the magic "make 4 ports work" bit */
 		tmp = readl(mmio_base + sil_port[2].bmdma);
 		if ((tmp & SIL_INTR_STEERING) == 0)
 			writel(tmp | SIL_INTR_STEERING,
 			       mmio_base + sil_port[2].bmdma);
-
-	} else {
-		irq_mask = SIL_MASK_2PORT;
-	}
-
-	/* make sure IDE0/1/2/3 interrupts are not masked */
-	tmp = readl(mmio_base + SIL_SYSCFG);
-	if (tmp & irq_mask) {
-		tmp &= ~irq_mask;
-		writel(tmp, mmio_base + SIL_SYSCFG);
-		readl(mmio_base + SIL_SYSCFG);	/* flush */
 	}
 
 	/* mask all SATA phy-related interrupts */
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 14/14] sata_sil24: convert to new EH
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (12 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 13/14] ahci: " Tejun Heo
@ 2006-04-11 13:48 ` Tejun Heo
  2006-04-27  9:16 ` [PATCHSET 6/9] new EH implementation, take 2 Jeff Garzik
  14 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-11 13:48 UTC (permalink / raw)
  To: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide; +Cc: Tejun Heo

Convert sata_sil24 to new EH.

* When port is frozen, IRQ for the port is masked.

* sil24_softreset() doesn't need to mangle with IRQ mask anymore.
  libata ensures that the port is frozen during reset.

* Only turn on interrupts which are handled by interrupt handler and
  EH.  As we don't handle SDB notify yet, turn it off. DEV_XCHG and
  UNK_FIS are handled by EH and thus turned on.

* sil24_softreset() usually fails to recover the port after DEV_XCHG.
  ATA_PORT_HARDRESET is used as recovery action for DEV_XCHG.

* sil24 may be invoked without any active command.  e.g. DEV_XCHG irq
  occuring while no qc in progress still triggers EH and will reset
  the port and revalidate attached device.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/sata_sil24.c |  330 +++++++++++++++++++++++++++------------------
 1 files changed, 195 insertions(+), 135 deletions(-)

0300dfe12d10341a2f82a93b3cc64436352088a0
diff --git a/drivers/scsi/sata_sil24.c b/drivers/scsi/sata_sil24.c
index 9320368..bbbc18a 100644
--- a/drivers/scsi/sata_sil24.c
+++ b/drivers/scsi/sata_sil24.c
@@ -156,6 +156,10 @@ enum {
 	PORT_IRQ_HANDSHAKE	= (1 << 10), /* handshake error threshold */
 	PORT_IRQ_SDB_NOTIFY	= (1 << 11), /* SDB notify received */
 
+	PORT_IRQ_FREEZE		= PORT_IRQ_DEV_XCHG | PORT_IRQ_UNK_FIS,
+	DEF_PORT_IRQ		= PORT_IRQ_FREEZE |
+				  PORT_IRQ_COMPLETE | PORT_IRQ_ERROR,
+
 	/* bits[27:16] are unmasked (raw) */
 	PORT_IRQ_RAW_SHIFT	= 16,
 	PORT_IRQ_MASKED_MASK	= 0x7ff,
@@ -242,6 +246,58 @@ union sil24_cmd_block {
 	struct sil24_atapi_block atapi;
 };
 
+static struct sil24_cerr_info {
+	unsigned int err_mask, action;
+	const char *desc;
+} sil24_cerr_db[] = {
+	[0]			= { AC_ERR_DEV, 0,	/* covers ATAPI CC */
+				    "device error" },
+	[PORT_CERR_DEV]		= { AC_ERR_DEV, ATA_PORT_REVALIDATE,
+				    "device error via D2H FIS" },
+	[PORT_CERR_SDB]		= { AC_ERR_DEV, ATA_PORT_REVALIDATE,
+				    "device error via SDB FIS" },
+	[PORT_CERR_DATA]	= { AC_ERR_ATA_BUS, ATA_PORT_SOFTRESET,
+				    "error in data FIS" },
+	[PORT_CERR_SEND]	= { AC_ERR_ATA_BUS, ATA_PORT_SOFTRESET,
+				    "failed to transmit command FIS" },
+	[PORT_CERR_INCONSISTENT] = { AC_ERR_HSM, ATA_PORT_SOFTRESET,
+				     "protocol mismatch" },
+	[PORT_CERR_DIRECTION]	= { AC_ERR_HSM, ATA_PORT_SOFTRESET,
+				    "data directon mismatch" },
+	[PORT_CERR_UNDERRUN]	= { AC_ERR_HSM, ATA_PORT_SOFTRESET,
+				    "ran out of SGEs while writing" },
+	[PORT_CERR_OVERRUN]	= { AC_ERR_HSM, ATA_PORT_SOFTRESET,
+				    "ran out of SGEs while reading" },
+	[PORT_CERR_PKT_PROT]	= { AC_ERR_HSM, ATA_PORT_SOFTRESET,
+				    "invalid data directon for ATAPI CDB" },
+	[PORT_CERR_SGT_BOUNDARY] = { AC_ERR_SYSTEM, ATA_PORT_SOFTRESET,
+				     "SGT no on qword boundary" },
+	[PORT_CERR_SGT_TGTABRT]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI target abort while fetching SGT" },
+	[PORT_CERR_SGT_MSTABRT]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI master abort while fetching SGT" },
+	[PORT_CERR_SGT_PCIPERR]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI parity error while fetching SGT" },
+	[PORT_CERR_CMD_BOUNDARY] = { AC_ERR_SYSTEM, ATA_PORT_SOFTRESET,
+				     "PRB not on qword boundary" },
+	[PORT_CERR_CMD_TGTABRT]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI target abort while fetching PRB" },
+	[PORT_CERR_CMD_MSTABRT]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI master abort while fetching PRB" },
+	[PORT_CERR_CMD_PCIPERR]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI parity error while fetching PRB" },
+	[PORT_CERR_XFR_UNDEF]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "undefined error while transferring data" },
+	[PORT_CERR_XFR_TGTABRT]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI target abort while transferring data" },
+	[PORT_CERR_XFR_MSTABRT]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI master abort while transferring data" },
+	[PORT_CERR_XFR_PCIPERR]	= { AC_ERR_HOST_BUS, ATA_PORT_SOFTRESET,
+				    "PCI parity error while transferring data" },
+	[PORT_CERR_SENDSERVICE]	= { AC_ERR_HSM, ATA_PORT_SOFTRESET,
+				    "FIS received while sending service FIS" },
+};
+
 /*
  * ap->private_data
  *
@@ -251,7 +307,8 @@ union sil24_cmd_block {
 struct sil24_port_priv {
 	union sil24_cmd_block *cmd_block;	/* 32 cmd blocks */
 	dma_addr_t cmd_block_dma;		/* DMA base addr for them */
-	struct ata_taskfile tf;			/* Cached taskfile registers */
+	struct ata_taskfile tf;			/* cached taskfile registers */
+	u32 eh_irq_stat;			/* saved irq_stat for EH */
 };
 
 /* ap->host_set->private_data */
@@ -269,7 +326,9 @@ static int sil24_probe_reset(struct ata_
 static void sil24_qc_prep(struct ata_queued_cmd *qc);
 static unsigned int sil24_qc_issue(struct ata_queued_cmd *qc);
 static void sil24_irq_clear(struct ata_port *ap);
-static void sil24_eng_timeout(struct ata_port *ap);
+static void sil24_freeze(struct ata_port *ap);
+static void sil24_error_handler(struct ata_port *ap);
+static void sil24_post_internal_cmd(struct ata_queued_cmd *qc);
 static irqreturn_t sil24_interrupt(int irq, void *dev_instance, struct pt_regs *regs);
 static int sil24_port_start(struct ata_port *ap);
 static void sil24_port_stop(struct ata_port *ap);
@@ -326,7 +385,9 @@ static const struct ata_port_operations 
 	.qc_prep		= sil24_qc_prep,
 	.qc_issue		= sil24_qc_issue,
 
-	.eng_timeout		= sil24_eng_timeout,
+	.freeze			= sil24_freeze,
+	.error_handler		= sil24_error_handler,
+	.post_internal_cmd	= sil24_post_internal_cmd,
 
 	.irq_handler		= sil24_interrupt,
 	.irq_clear		= sil24_irq_clear,
@@ -460,7 +521,7 @@ static int sil24_softreset(struct ata_po
 	struct sil24_port_priv *pp = ap->private_data;
 	struct sil24_prb *prb = &pp->cmd_block[0].ata.prb;
 	dma_addr_t paddr = pp->cmd_block_dma;
-	u32 mask, irq_enable, irq_stat;
+	u32 mask, irq_stat;
 	const char *reason;
 
 	DPRINTK("ENTER\n");
@@ -471,10 +532,6 @@ static int sil24_softreset(struct ata_po
 		goto out;
 	}
 
-	/* temporarily turn off IRQs during SRST */
-	irq_enable = readl(port + PORT_IRQ_ENABLE_SET);
-	writel(irq_enable, port + PORT_IRQ_ENABLE_CLR);
-
 	/* put the port into known state */
 	if (sil24_init_port(ap)) {
 		reason ="port not ready";
@@ -495,9 +552,6 @@ static int sil24_softreset(struct ata_po
 	writel(irq_stat, port + PORT_IRQ_STAT); /* clear IRQs */
 	irq_stat >>= PORT_IRQ_RAW_SHIFT;
 
-	/* restore IRQs */
-	writel(irq_enable, port + PORT_IRQ_ENABLE_SET);
-
 	if (!(irq_stat & PORT_IRQ_COMPLETE)) {
 		if (irq_stat & PORT_IRQ_ERROR)
 			reason = "SRST command error";
@@ -566,11 +620,27 @@ static int sil24_hardreset(struct ata_po
 	return -EIO;
 }
 
+static void sil24_postreset(struct ata_port *ap, unsigned int *classes)
+{
+	void __iomem *port = (void __iomem *)ap->ioaddr.cmd_addr;
+	u32 tmp;
+
+	/* clear IRQ */
+	tmp = readl(port + PORT_IRQ_STAT);
+	writel(tmp, port + PORT_IRQ_STAT);
+
+	/* turn IRQ back on */
+	writel(DEF_PORT_IRQ, port + PORT_IRQ_ENABLE_SET);
+
+	/* do the standard stuff */
+	ata_std_postreset(ap, classes);
+}
+
 static int sil24_probe_reset(struct ata_port *ap, unsigned int *classes)
 {
 	return ata_drive_probe_reset(ap, ata_std_probeinit,
 				     sil24_softreset, sil24_hardreset,
-				     ata_std_postreset, classes);
+				     sil24_postreset, classes);
 }
 
 static inline void sil24_fill_sg(struct ata_queued_cmd *qc,
@@ -656,152 +726,138 @@ static void sil24_irq_clear(struct ata_p
 	/* unused */
 }
 
-static int __sil24_restart_controller(void __iomem *port)
+static void sil24_freeze(struct ata_port *ap)
 {
-	u32 tmp;
-	int cnt;
-
-	writel(PORT_CS_INIT, port + PORT_CTRL_STAT);
-
-	/* Max ~10ms */
-	for (cnt = 0; cnt < 10000; cnt++) {
-		tmp = readl(port + PORT_CTRL_STAT);
-		if (tmp & PORT_CS_RDY)
-			return 0;
-		udelay(1);
-	}
-
-	return -1;
-}
+	void __iomem *port = (void __iomem *)ap->ioaddr.cmd_addr;
 
-static void sil24_restart_controller(struct ata_port *ap)
-{
-	if (__sil24_restart_controller((void __iomem *)ap->ioaddr.cmd_addr))
-		printk(KERN_ERR DRV_NAME
-		       " ata%u: failed to restart controller\n", ap->id);
+	/* Port-wide IRQ mask in HOST_CTRL doesn't really work, clear
+	 * PORT_IRQ_ENABLE instead.
+	 */
+	writel(0xffff, port + PORT_IRQ_ENABLE_CLR);
 }
 
-static int __sil24_reset_controller(void __iomem *port)
+static unsigned int sil24_eh_autopsy(struct ata_port *ap, u32 irq_stat,
+				     unsigned int *r_err_mask,
+				     char *desc, size_t desc_sz)
 {
-	int cnt;
-	u32 tmp;
-
-	/* Reset controller state.  Is this correct? */
-	writel(PORT_CS_DEV_RST, port + PORT_CTRL_STAT);
-	readl(port + PORT_CTRL_STAT);	/* sync */
+	void __iomem *port = (void __iomem *)ap->ioaddr.cmd_addr;
+	unsigned int err_mask = 0, action = 0;
+	int rc;
 
-	/* Max ~100ms */
-	for (cnt = 0; cnt < 1000; cnt++) {
-		udelay(100);
-		tmp = readl(port + PORT_CTRL_STAT);
-		if (!(tmp & PORT_CS_DEV_RST))
-			break;
+	rc = scnprintf(desc, desc_sz, "irq_stat 0x%08x", irq_stat);
+	desc += rc;
+	desc_sz -= rc;
+
+	if (irq_stat & PORT_IRQ_DEV_XCHG) {
+		err_mask |= AC_ERR_ATA_BUS;
+		/* sil24 doesn't recover very well from phy
+		 * disconnection with a softreset.  Force hardreset.
+		 */
+		action |= ATA_PORT_HARDRESET;
+		rc = scnprintf(desc, desc_sz, ", device exchanged");
+		desc += rc;
+		desc_sz -= rc;
 	}
 
-	if (tmp & PORT_CS_DEV_RST)
-		return -1;
-
-	if (tmp & PORT_CS_RDY)
-		return 0;
+	if (irq_stat & PORT_IRQ_UNK_FIS) {
+		err_mask |= AC_ERR_HSM;
+		action |= ATA_PORT_SOFTRESET;
+		rc = scnprintf(desc, desc_sz, ", unknown FIS");
+		desc += rc;
+		desc_sz -= rc;
+	}
 
-	return __sil24_restart_controller(port);
-}
+	if (irq_stat & PORT_IRQ_ERROR) {
+		struct sil24_cerr_info *ci = NULL;
+		u32 cerr;
+
+		cerr = readl(port + PORT_CMD_ERR);
+		if (cerr < ARRAY_SIZE(sil24_cerr_db))
+			ci = &sil24_cerr_db[cerr];
+
+		if (ci && ci->desc) {
+			err_mask |= ci->err_mask;
+			action |= ci->action;
+			rc = scnprintf(desc, desc_sz, ", %s", ci->desc);
+		} else {
+			err_mask |= AC_ERR_OTHER;
+			action |= ATA_PORT_SOFTRESET;
+			rc = scnprintf(desc, desc_sz,
+				       ", unknown command error %d", cerr);
+		}
+		desc += rc;
+		desc_sz -= rc;
+	}
 
-static void sil24_reset_controller(struct ata_port *ap)
-{
-	printk(KERN_NOTICE DRV_NAME
-	       " ata%u: resetting controller...\n", ap->id);
-	if (__sil24_reset_controller((void __iomem *)ap->ioaddr.cmd_addr))
-                printk(KERN_ERR DRV_NAME
-                       " ata%u: failed to reset controller\n", ap->id);
+	*r_err_mask |= err_mask;
+	return action;
 }
 
-static void sil24_eng_timeout(struct ata_port *ap)
+static void sil24_error_handler(struct ata_port *ap)
 {
+	struct sil24_port_priv *pp = ap->private_data;
+	unsigned int action = 0;
+	unsigned int err_mask = 0;
+	unsigned long flags;
+	u32 irq_stat, serror;
+	struct ata_taskfile tf;
 	struct ata_queued_cmd *qc;
+	char desc[70] = "";
 
-	qc = ata_qc_from_tag(ap, ap->active_tag);
+	/* fetch & clear error information */
+	spin_lock_irqsave(&ap->host_set->lock, flags);
+	irq_stat = pp->eh_irq_stat;
+	pp->eh_irq_stat = 0;
+	spin_unlock_irqrestore(&ap->host_set->lock, flags);
+
+	serror = scr_read(ap, SCR_ERROR);
+	scr_write(ap, SCR_ERROR, serror);
+
+	/* if not frozen, resume the port.  if fail, freeze */
+	if (!(ap->flags & ATA_FLAG_FROZEN))
+		if (sil24_init_port(ap))
+			ata_eh_schedule_port(ap, ATA_EH_FREEZE);
 
-	printk(KERN_ERR "ata%u: command timeout\n", ap->id);
-	qc->err_mask |= AC_ERR_TIMEOUT;
-	ata_eh_qc_complete(qc);
+	/* perform recovery */
+	action |= sil24_eh_autopsy(ap, irq_stat, &err_mask, desc, sizeof(desc));
+
+	qc = ata_eh_determine_qc(ap, &tf);
+	if (qc)
+		qc->err_mask |= err_mask;
 
-	sil24_reset_controller(ap);
+	action |= ata_eh_autopsy(ap, qc, &tf, serror);
+	ata_eh_report(ap, qc, &tf, serror, action, desc);
+	ata_eh_revive(ap, action,
+		      sil24_softreset, sil24_hardreset, sil24_postreset);
+	ata_eh_finish_qcs(ap, qc, &tf);
 }
 
-static void sil24_error_intr(struct ata_port *ap, u32 slot_stat)
+static void sil24_post_internal_cmd(struct ata_queued_cmd *qc)
 {
-	struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
-	struct sil24_port_priv *pp = ap->private_data;
-	void __iomem *port = (void __iomem *)ap->ioaddr.cmd_addr;
-	u32 irq_stat, cmd_err, sstatus, serror;
-	unsigned int err_mask;
-
-	irq_stat = readl(port + PORT_IRQ_STAT);
-	writel(irq_stat, port + PORT_IRQ_STAT);		/* clear irq */
-
-	if (!(irq_stat & PORT_IRQ_ERROR)) {
-		/* ignore non-completion, non-error irqs for now */
-		printk(KERN_WARNING DRV_NAME
-		       "ata%u: non-error exception irq (irq_stat %x)\n",
-		       ap->id, irq_stat);
-		return;
-	}
+	struct ata_port *ap = qc->ap;
 
-	cmd_err = readl(port + PORT_CMD_ERR);
-	sstatus = readl(port + PORT_SSTATUS);
-	serror = readl(port + PORT_SERROR);
-	if (serror)
-		writel(serror, port + PORT_SERROR);
+	if (qc->flags & ATA_QCFLAG_FAILED)
+		qc->err_mask |= AC_ERR_OTHER;
 
-	/*
-	 * Don't log ATAPI device errors.  They're supposed to happen
-	 * and any serious errors will be logged using sense data by
-	 * the SCSI layer.
-	 */
-	if (ap->device[0].class != ATA_DEV_ATAPI || cmd_err > PORT_CERR_SDB)
-		printk("ata%u: error interrupt on port%d\n"
-		       "  stat=0x%x irq=0x%x cmd_err=%d sstatus=0x%x serror=0x%x\n",
-		       ap->id, ap->port_no, slot_stat, irq_stat, cmd_err, sstatus, serror);
-
-	if (cmd_err == PORT_CERR_DEV || cmd_err == PORT_CERR_SDB) {
-		/*
-		 * Device is reporting error, tf registers are valid.
-		 */
-		sil24_update_tf(ap);
-		err_mask = ac_err_mask(pp->tf.command);
-		sil24_restart_controller(ap);
-	} else {
-		/*
-		 * Other errors.  libata currently doesn't have any
-		 * mechanism to report these errors.  Just turn on
-		 * ATA_ERR.
-		 */
-		err_mask = AC_ERR_OTHER;
-		sil24_reset_controller(ap);
-	}
-
-	if (qc) {
-		qc->err_mask |= err_mask;
-		ata_qc_complete(qc);
-	}
+	/* make DMA engine forget about the failed command */
+	if (qc->err_mask)
+		sil24_init_port(ap);
 }
 
 static inline void sil24_host_intr(struct ata_port *ap)
 {
 	struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
+	struct sil24_port_priv *pp = ap->private_data;
 	void __iomem *port = (void __iomem *)ap->ioaddr.cmd_addr;
-	u32 slot_stat;
+	u32 slot_stat, irq_stat;
+	unsigned int eh_flags;
 
 	slot_stat = readl(port + PORT_SLOT_STAT);
 	if (!(slot_stat & HOST_SSTAT_ATTN)) {
-		struct sil24_port_priv *pp = ap->private_data;
-
 		if (ap->flags & SIL24_FLAG_PCIX_IRQ_WOC)
 			writel(PORT_IRQ_COMPLETE, port + PORT_IRQ_STAT);
 
-		/*
-		 * !HOST_SSAT_ATTN guarantees successful completion,
+		/* !HOST_SSAT_ATTN guarantees successful completion,
 		 * so reading back tf registers is unnecessary for
 		 * most commands.  TODO: read tf registers for
 		 * commands which require these values on successful
@@ -814,8 +870,21 @@ static inline void sil24_host_intr(struc
 			qc->err_mask |= ac_err_mask(pp->tf.command);
 			ata_qc_complete(qc);
 		}
-	} else
-		sil24_error_intr(ap, slot_stat);
+
+		return;
+	}
+
+	/* something weird is going on, pass it to EH */
+	irq_stat = readl(port + PORT_IRQ_STAT);
+	writel(irq_stat, port + PORT_IRQ_STAT);
+
+	pp->eh_irq_stat = irq_stat;
+
+	eh_flags = ATA_EH_ABORT;
+	if (irq_stat & PORT_IRQ_FREEZE)
+		eh_flags |= ATA_EH_FREEZE;
+
+	ata_eh_schedule_port(ap, eh_flags);
 }
 
 static irqreturn_t sil24_interrupt(int irq, void *dev_instance, struct pt_regs *regs)
@@ -1067,15 +1136,6 @@ static int sil24_init_one(struct pci_dev
 		/* Always use 64bit activation */
 		writel(PORT_CS_32BIT_ACTV, port + PORT_CTRL_CLR);
 
-		/* Configure interrupts */
-		writel(0xffff, port + PORT_IRQ_ENABLE_CLR);
-		writel(PORT_IRQ_COMPLETE | PORT_IRQ_ERROR |
-		       PORT_IRQ_SDB_NOTIFY, port + PORT_IRQ_ENABLE_SET);
-
-		/* Clear interrupts */
-		writel(0x0fff0fff, port + PORT_IRQ_STAT);
-		writel(PORT_CS_IRQ_WOC, port + PORT_CTRL_CLR);
-
 		/* Clear port multiplier enable and resume bits */
 		writel(PORT_CS_PM_EN | PORT_CS_RESUME, port + PORT_CTRL_CLR);
 	}
-- 
1.2.4



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 08/14] libata-eh: implement ata_eh_revive()
  2006-04-11 13:48 ` [PATCH 08/14] libata-eh: implement ata_eh_revive() Tejun Heo
@ 2006-04-19  9:08   ` zhao, forrest
  2006-04-19 10:33     ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: zhao, forrest @ 2006-04-19  9:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide

On Tue, 2006-04-11 at 22:48 +0900, Tejun Heo wrote:
> +
> +/**
> + *	ata_eh_revive - revive host port after error
> + *	@ap: host port to revive
> + *	@action: action to perform to revive @ap
> + *	@softreset: softreset method (can be NULL)
> + *	@hardreset: hardreset method (can be NULL)
> + *	@postreset: postreset method (can be NULL)
> + *
> + *	Perform action specified by @action to revive host port @ap
> + *	after error.
> + *
> + *	LOCKING:
> + *	Kernel thread context (may sleep).
> + */
> +int ata_eh_revive(struct ata_port *ap, unsigned int action,
> +		  ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
> +		  ata_postreset_fn_t postreset)
> +{
> +	int scr_valid = ap->cbl == ATA_CBL_SATA && ap->ops->scr_read;
> +	unsigned int classes[ATA_MAX_DEVICES];
> +	int tries[ATA_MAX_DEVICES], reset_tries, nr_enabled;
> +	struct ata_device *dev;
> +	ata_reset_fn_t reset;
> +	int i, down_xfermask, rc = 0;
> +
> +	if (!action)
> +		goto out;
> +
> +	reset_tries = ATA_EH_MAX_TRIES;
> +	nr_enabled = 0;
> +	for (i = 0; i < ATA_MAX_DEVICES; i++) {
> +		tries[i] = ATA_EH_MAX_TRIES;
> +		if (ata_dev_enabled(&ap->device[i]))
> +			nr_enabled++;
> +	}
> +
> +	/* revalidate */
> +	if (action == ATA_PORT_REVALIDATE) {
> +		for (i = 0; i < ATA_MAX_DEVICES; i++) {
> +			struct ata_device *dev = &ap->device[i];
> +			if (!ata_dev_enabled(dev) ||
> +			    !(dev->flags & ATA_DFLAG_FAILED))
> +				continue;
> +			if (ata_dev_revalidate(ap, dev, 0))
> +				break;
> +		}
> +		if (i == ATA_MAX_DEVICES) {
> +			rc = 0;
> +			goto out;
> +		}
> +
> +		action |= ATA_PORT_SOFTRESET;
> +	}
> +	action &= ~ATA_PORT_REVALIDATE;
> +
> +	/* Skip reset if possible. */
> +	if (!nr_enabled && !(ap->flags & ATA_FLAG_FROZEN))
> +		goto out;
> +
> +	/* give devices some time to breath */
> +	ata_eh_wait_before_reset(ap);
> +
> +	if (softreset && (!hardreset || (!ata_set_sata_spd_needed(ap) &&
> +					 action == ATA_PORT_SOFTRESET)))
> +		reset = softreset;
> +	else
> +		reset = hardreset;
> +
> + retry:
> +	down_xfermask = 0;
> +
> +	/* reset.  postreset is responsible for thawing the port. */
> +	printk("ata%u: %s resetting channel for error handling\n",
> +	       ap->id, reset == softreset ? "soft" : "hard");
> +
> +	rc = ata_do_reset(ap, reset, postreset, classes);
> +	if (rc)
> +		goto fail_reset;
> +
> +	/* revalidate and reconfigure devices */
> +	for (i = 0; i < ATA_MAX_DEVICES; i++) {
> +		dev = &ap->device[i];
> +
> +		if (!ata_dev_enabled(dev))
> +			continue;
> +
> +		if (scr_valid && !sata_dev_present(ap)) {
> +			rc = -EIO;
> +			goto fail;
> +		}
> +
> +		rc = ata_dev_revalidate(ap, dev, 1);
> +		if (rc)
> +			goto fail;
> +	}
> +
> +	/* configure transfer mode */
> +	if (ap->ops->set_mode) {
> +		/* FIXME: make ->set_mode handle no device case and
> +		 * return error code and failing device on failure as
> +		 * ata_set_mode() does.
> +		 */
> +		for (i = 0; i < ATA_MAX_DEVICES; i++)
> +			if (ata_dev_enabled(&ap->device[i])) {
> +				ap->ops->set_mode(ap);
> +				break;
> +			}
> +		rc = 0;
> +	} else
> +		rc = ata_set_mode(ap, &dev);
> +
> +	if (rc) {
> +		down_xfermask = 1;
> +		goto fail;
> +	}
> +
> +	goto out;
> +
> + fail_reset:
> +	if (!--reset_tries) {
> +		printk(KERN_ERR "ata%u: EH reset failed, giving up\n", ap->id);
> +		goto out;
> +	}
> +	if (reset == hardreset)
> +		ata_down_sata_spd_limit(ap);
> +	if (hardreset)
> +		reset = hardreset;
> +
> +	printk(KERN_WARNING "ata%u: EH reset failed, will retry in 5 secs\n",
> +	       ap->id);
> +	ssleep(5);
> +	goto retry;
> +
> + fail:
> +	switch (rc) {
> +	case -EINVAL:
> +	case -ENODEV:
> +		tries[dev->devno] = 0;
> +		break;
> +	case -EIO:
> +		ata_down_sata_spd_limit(ap);
> +	default:
> +		tries[dev->devno]--;
> +		if (down_xfermask &&
> +		    ata_down_xfermask_limit(ap, dev, tries[dev->devno] == 1))
> +			tries[dev->devno] = 0;
> +	}
> +
> +	if (!tries[dev->devno]) {
> +		ata_dev_disable(ap, dev);
> +		nr_enabled--;
> +	}
> +
> +	if (nr_enabled) {
> +		printk(KERN_WARNING "ata%u: some devices seem to be offline, "
> +		       "will retry in 5 secs\n", ap->id);
> +		ssleep(5);
> +	} else {
> +		/* no device left, repeat fast */
> +		msleep(500);
> +	}
> +
> +	if (hardreset)
> +		reset = hardreset;
> +	goto retry;
> +
> + out:
> +	for (i = 0; i < ATA_MAX_DEVICES; i++) {
> +		dev = &ap->device[i];
> +		dev->flags &= ~ATA_DFLAG_FAILED;
> +		if (rc)
> +			ata_dev_disable(ap, dev);
> +	}
> +
> +	return rc;
> +}
Tejun,

I did some initial hotplug test in our lab today. And
this is the printk output in dmesg when I unplug the
SATA disk at port 2:

ata2: stat 0x50 err 0x0 SError 0x90402 action 0x2
      (irq_stat 0x04400000, PHY RDY changed)
ata2: soft resetting channel for error handling
ata2: SATA link down (SStatus 21 SControl 300)
ata2: limiting SATA link speed to 1.5 Gbps
ata2: some devices seem to be offline, will retry in 5 secs
ata2: hard resetting channel for error handling
ata2: SATA link down (SStatus 0 SControl 310)
ata2: some devices seem to be offline, will retry in 5 secs
ata2: hard resetting channel for error handling
ata2: SATA link down (SStatus 0 SControl 310)
ata2: dev 0 disabled
// why hard reset again??
ata2: hard resetting channel for error handling
ata2: SATA link down (SStatus 0 SControl 310)
// ??
ata2: dev 0 detaching (SCSI 1:0:0:0)


My question is: why hard-reset is executed again after
dev0 at ata2 has been disabled?

Could you help me understand the idea behind this logic?

Thanks,
Forrest




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 08/14] libata-eh: implement ata_eh_revive()
  2006-04-19  9:08   ` zhao, forrest
@ 2006-04-19 10:33     ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-19 10:33 UTC (permalink / raw)
  To: zhao, forrest; +Cc: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide

On Wed, Apr 19, 2006 at 05:08:41PM +0800, zhao, forrest wrote:
> Tejun,
> 
> I did some initial hotplug test in our lab today. And
> this is the printk output in dmesg when I unplug the
> SATA disk at port 2:
> 
> ata2: stat 0x50 err 0x0 SError 0x90402 action 0x2
>       (irq_stat 0x04400000, PHY RDY changed)
> ata2: soft resetting channel for error handling
> ata2: SATA link down (SStatus 21 SControl 300)
> ata2: limiting SATA link speed to 1.5 Gbps
> ata2: some devices seem to be offline, will retry in 5 secs
> ata2: hard resetting channel for error handling
> ata2: SATA link down (SStatus 0 SControl 310)
> ata2: some devices seem to be offline, will retry in 5 secs
> ata2: hard resetting channel for error handling
> ata2: SATA link down (SStatus 0 SControl 310)
> ata2: dev 0 disabled
> // why hard reset again??
> ata2: hard resetting channel for error handling
> ata2: SATA link down (SStatus 0 SControl 310)
> // ??
> ata2: dev 0 detaching (SCSI 1:0:0:0)
> 
> 
> My question is: why hard-reset is executed again after
> dev0 at ata2 has been disabled?
> 
> Could you help me understand the idea behind this logic?
> 

Hello, Zhao.

The port could be frozen due to the error which caused the attached
device to fail.  So, EH is trying to put the port into known state
before finishing EH such that we can deal with later events (hotplug).
I've added logics to avoid unnecessary resets in my working tree such
that unnecessary resets are avoided if..

* the port is not frozen
* the port is frozen but hotplug is scheduled

Hope it helped.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 13/14] ahci: convert to new EH
  2006-04-11 13:48 ` [PATCH 13/14] ahci: " Tejun Heo
@ 2006-04-20  6:01   ` zhao, forrest
  2006-04-20  7:11     ` Tejun Heo
  2006-04-20  9:26   ` zhao, forrest
  1 sibling, 1 reply; 24+ messages in thread
From: zhao, forrest @ 2006-04-20  6:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide

On Tue, 2006-04-11 at 22:48 +0900, Tejun Heo wrote:
> Convert AHCI to new EH.  Unfortunately, ICH7 AHCI reacts badly if IRQ
> mask is diddled during operation.  So, freezing is implemented by
> unconditionally clearing interrupt conditions while frozen.
>  
> -static inline int ahci_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc)
> +static inline void ahci_host_intr(struct ata_port *ap)
>  {
> +	struct ahci_port_priv *pp = ap->private_data;
>  	void __iomem *mmio = ap->host_set->mmio_base;
>  	void __iomem *port_mmio = ahci_port_base(mmio, ap->port_no);
> -	u32 status, serr, ci;
> -
> -	serr = readl(port_mmio + PORT_SCR_ERR);
> -	writel(serr, port_mmio + PORT_SCR_ERR);
> +	u32 status, serror, ci;
> +	unsigned int eh_flags;
>  
>  	status = readl(port_mmio + PORT_IRQ_STAT);
>  	writel(status, port_mmio + PORT_IRQ_STAT);
>  
> -	ci = readl(port_mmio + PORT_CMD_ISSUE);
> -	if (likely((ci & 0x1) == 0)) {
> -		if (qc) {
> -			WARN_ON(qc->err_mask);
> -			ata_qc_complete(qc);
> -			qc = NULL;
> -		}
> +	/* AHCI gets unhappy if IRQ mask is diddled with while the
> +	 * port is active, so we cannot disable IRQ when freezing.
> +	 * Clear IRQ conditions and hope screaming IRQs don't happen.
> +	 */
> +	if (ap->flags & ATA_FLAG_FROZEN) {
> +		/* some AHCI errors hang the controller until SError
> +		 * is cleared.  Store and clear it.
> +		 */
> +		serror = scr_read(ap, SCR_ERROR);
> +		scr_write(ap, SCR_ERROR, serror);
> +		pp->eh_irq_stat |= status;
> +		pp->eh_serror |= serror;
> +		return;
>  	}
>  
> -	if (status & PORT_IRQ_FATAL) {
> -		unsigned int err_mask;
> -		if (status & PORT_IRQ_TF_ERR)
> -			err_mask = AC_ERR_DEV;
> -		else if (status & PORT_IRQ_IF_ERR)
> -			err_mask = AC_ERR_ATA_BUS;
> -		else
> -			err_mask = AC_ERR_HOST_BUS;
> -
> -		/* command processing has stopped due to error; restart */
> -		ahci_restart_port(ap, status);
> -
> -		if (qc) {
> -			qc->err_mask |= err_mask;
> -			ata_qc_complete(qc);
> +	if (!(status & PORT_IRQ_ERROR)) {
> +		struct ata_queued_cmd *qc;
> +
> +		if ((qc = ata_qc_from_tag(ap, ap->active_tag))) {
> +			ci = readl(port_mmio + PORT_CMD_ISSUE);
> +			if ((ci & 0x1) == 0) {
> +				ata_qc_complete(qc);
> +				return;
> +			}
>  		}
> +
> +		if (ata_ratelimit())
> +			printk(KERN_INFO "ata%u: spurious interrupt "
> +			       "(irq_stat 0x%x active_tag %d)\n",
> +			       ap->id, status, ap->active_tag);
> +
> +		return;
>  	}
>  
> -	return 1;
> +	/* Something weird is going on.  Hand over to EH. */
> +	serror = scr_read(ap, SCR_ERROR);
> +	scr_write(ap, SCR_ERROR, serror);
> +
> +	pp->eh_irq_stat = status;
> +	pp->eh_serror = serror;
> +
> +	eh_flags = ATA_EH_ABORT;
> +	if (status & PORT_IRQ_FREEZE)
> +		eh_flags |= ATA_EH_FREEZE;
> +
> +	ata_eh_schedule_port(ap, eh_flags);
>  }

Hi, Tejun

When testing hotplug and reading your patches, I thought an interrupt
lost might occur on AHCI in the following case:

1 system boot up with SATA disk A attached to port 1 and disk B attached
to port 2
2 disk B at port 2 is hot-unplugged
3 ata_eh_revive() will execute several round of soft-reset/hard-reset as
we observed in dmesg
4 now imagine ata_eh_revive() start to execute the last round of
hard-reset, so the code path comes into ata_do_reset(), then into
ahci_hardreset()
5 disk B is hot-plugged to port 2, and an interrupt is triggered
6 CPU respond to this interrupt when code path execute between
ahci_start_engine(); in ahci_hardreset() and
ap->flags &= ~ATA_FLAG_FROZEN; in ata_do_reset();
7 then this interrupt is lost since no EH is scheduled to handle it.

I think invoking ata_eh_schedule_port() in ahci_postreset() can fix
the problem, is it right?

Thanks,
Forrest


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 13/14] ahci: convert to new EH
  2006-04-20  6:01   ` zhao, forrest
@ 2006-04-20  7:11     ` Tejun Heo
  2006-04-20  7:44       ` Jeff Garzik
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2006-04-20  7:11 UTC (permalink / raw)
  To: zhao, forrest; +Cc: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide

On Thu, Apr 20, 2006 at 02:01:12PM +0800, zhao, forrest wrote:
> 
> Hi, Tejun
> 
> When testing hotplug and reading your patches, I thought an interrupt
> lost might occur on AHCI in the following case:
> 
> 1 system boot up with SATA disk A attached to port 1 and disk B attached
> to port 2
> 2 disk B at port 2 is hot-unplugged
> 3 ata_eh_revive() will execute several round of soft-reset/hard-reset as
> we observed in dmesg
> 4 now imagine ata_eh_revive() start to execute the last round of
> hard-reset, so the code path comes into ata_do_reset(), then into
> ahci_hardreset()
> 5 disk B is hot-plugged to port 2, and an interrupt is triggered
> 6 CPU respond to this interrupt when code path execute between
> ahci_start_engine(); in ahci_hardreset() and
> ap->flags &= ~ATA_FLAG_FROZEN; in ata_do_reset();
> 7 then this interrupt is lost since no EH is scheduled to handle it.
> 
> I think invoking ata_eh_schedule_port() in ahci_postreset() can fix
> the problem, is it right?

Hello, Forrest.

Yes, you're right.  The problem is that we cannot tell whether such
interrupts are due to the reset or some other events.  The goal was to
make sure existing devices are okay on EH completion.  If new devices
get connected during EH, we might lose the event, which IMHO is okay.

Maybe this can be solved by merging EH and probe into one.  Probing
and EH revive are pretty similar in the first place.  I'll think about
that.  But I still think it's okay to lose hotplug interrupt during
EH.  All the user has to do is simply replug the device or issue
manual scan.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 13/14] ahci: convert to new EH
  2006-04-20  7:11     ` Tejun Heo
@ 2006-04-20  7:44       ` Jeff Garzik
  2006-04-21  1:34         ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Garzik @ 2006-04-20  7:44 UTC (permalink / raw)
  To: Tejun Heo; +Cc: zhao, forrest, alan, axboe, albertcc, lkosewsk, linux-ide

Tejun Heo wrote:
> On Thu, Apr 20, 2006 at 02:01:12PM +0800, zhao, forrest wrote:
>> Hi, Tejun
>>
>> When testing hotplug and reading your patches, I thought an interrupt
>> lost might occur on AHCI in the following case:
>>
>> 1 system boot up with SATA disk A attached to port 1 and disk B attached
>> to port 2
>> 2 disk B at port 2 is hot-unplugged
>> 3 ata_eh_revive() will execute several round of soft-reset/hard-reset as
>> we observed in dmesg
>> 4 now imagine ata_eh_revive() start to execute the last round of
>> hard-reset, so the code path comes into ata_do_reset(), then into
>> ahci_hardreset()
>> 5 disk B is hot-plugged to port 2, and an interrupt is triggered
>> 6 CPU respond to this interrupt when code path execute between
>> ahci_start_engine(); in ahci_hardreset() and
>> ap->flags &= ~ATA_FLAG_FROZEN; in ata_do_reset();
>> 7 then this interrupt is lost since no EH is scheduled to handle it.
>>
>> I think invoking ata_eh_schedule_port() in ahci_postreset() can fix
>> the problem, is it right?
> 
> Hello, Forrest.
> 
> Yes, you're right.  The problem is that we cannot tell whether such
> interrupts are due to the reset or some other events.  The goal was to
> make sure existing devices are okay on EH completion.  If new devices
> get connected during EH, we might lose the event, which IMHO is okay.
> 
> Maybe this can be solved by merging EH and probe into one.  Probing
> and EH revive are pretty similar in the first place.  I'll think about

Speaking to hotplug specifically, on hardware with plug irqs, it needs 
to do something like

	* receive hotplug interrupt
	* hang out for a while, eating hotplug interrupt events
	  (debounce)
	* revalidate device
	* issue unplug and/or plug to SCSI layer


> that.  But I still think it's okay to lose hotplug interrupt during
> EH.  All the user has to do is simply replug the device or issue
> manual scan.

If losing the hotplug interrupt requires the user to do that, no that's 
definitely not OK...  for a hotplug interrupt during EH, you want to 
stop what you're doing at the nearest opportunity, and start all over 
again revalidating the device.  If its a different device, all the 
accumulated state is flushed.

	Jeff




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 13/14] ahci: convert to new EH
  2006-04-11 13:48 ` [PATCH 13/14] ahci: " Tejun Heo
  2006-04-20  6:01   ` zhao, forrest
@ 2006-04-20  9:26   ` zhao, forrest
  2006-04-21  1:20     ` Tejun Heo
  1 sibling, 1 reply; 24+ messages in thread
From: zhao, forrest @ 2006-04-20  9:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide

On Tue, 2006-04-11 at 22:48 +0900, Tejun Heo wrote:
> Convert AHCI to new EH.  Unfortunately, ICH7 AHCI reacts badly if IRQ
> mask is diddled during operation.  So, freezing is implemented by
> unconditionally clearing interrupt conditions while frozen.

>  
> -static inline int ahci_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc)
> +static inline void ahci_host_intr(struct ata_port *ap)
>  {
> +	struct ahci_port_priv *pp = ap->private_data;
>  	void __iomem *mmio = ap->host_set->mmio_base;
>  	void __iomem *port_mmio = ahci_port_base(mmio, ap->port_no);
> -	u32 status, serr, ci;
> -
> -	serr = readl(port_mmio + PORT_SCR_ERR);
> -	writel(serr, port_mmio + PORT_SCR_ERR);
> +	u32 status, serror, ci;
> +	unsigned int eh_flags;
>  
>  	status = readl(port_mmio + PORT_IRQ_STAT);
>  	writel(status, port_mmio + PORT_IRQ_STAT);
>  
> -	ci = readl(port_mmio + PORT_CMD_ISSUE);
> -	if (likely((ci & 0x1) == 0)) {
> -		if (qc) {
> -			WARN_ON(qc->err_mask);
> -			ata_qc_complete(qc);
> -			qc = NULL;
> -		}
> +	/* AHCI gets unhappy if IRQ mask is diddled with while the
> +	 * port is active, so we cannot disable IRQ when freezing.
> +	 * Clear IRQ conditions and hope screaming IRQs don't happen.
> +	 */
> +	if (ap->flags & ATA_FLAG_FROZEN) {
> +		/* some AHCI errors hang the controller until SError
> +		 * is cleared.  Store and clear it.
> +		 */
> +		serror = scr_read(ap, SCR_ERROR);
> +		scr_write(ap, SCR_ERROR, serror);
> +		pp->eh_irq_stat |= status;
> +		pp->eh_serror |= serror;
> +		return;
>  	}
>  
> -	if (status & PORT_IRQ_FATAL) {
> -		unsigned int err_mask;
> -		if (status & PORT_IRQ_TF_ERR)
> -			err_mask = AC_ERR_DEV;
> -		else if (status & PORT_IRQ_IF_ERR)
> -			err_mask = AC_ERR_ATA_BUS;
> -		else
> -			err_mask = AC_ERR_HOST_BUS;
> -
> -		/* command processing has stopped due to error; restart */
> -		ahci_restart_port(ap, status);
> -
> -		if (qc) {
> -			qc->err_mask |= err_mask;
> -			ata_qc_complete(qc);
> +	if (!(status & PORT_IRQ_ERROR)) {
> +		struct ata_queued_cmd *qc;
> +
> +		if ((qc = ata_qc_from_tag(ap, ap->active_tag))) {
> +			ci = readl(port_mmio + PORT_CMD_ISSUE);
> +			if ((ci & 0x1) == 0) {
> +				ata_qc_complete(qc);
> +				return;
> +			}
>  		}
> +
> +		if (ata_ratelimit())
> +			printk(KERN_INFO "ata%u: spurious interrupt "
> +			       "(irq_stat 0x%x active_tag %d)\n",
> +			       ap->id, status, ap->active_tag);
> +
> +		return;
>  	}
>  
> -	return 1;
> +	/* Something weird is going on.  Hand over to EH. */
> +	serror = scr_read(ap, SCR_ERROR);
> +	scr_write(ap, SCR_ERROR, serror);
> +
> +	pp->eh_irq_stat = status;
> +	pp->eh_serror = serror;
> +
> +	eh_flags = ATA_EH_ABORT;
> +	if (status & PORT_IRQ_FREEZE)
> +		eh_flags |= ATA_EH_FREEZE;
> +
> +	ata_eh_schedule_port(ap, eh_flags);
>  }
>  

Anther problem with this patch is that interrupts happened
when port is in ATA_FLAG_FROZEN state are recorded in pp->eh_irq_stat.
So a later user-initiated warm-unplug operation might trigger EH
to handle the error conditions previously recorded in pp->eh_irq_stat.

I observed this problem in our lab
1 execute many hot-plug/unplug operations, and finally disk B is
at port 2

2 echo "1" > /sys/class/scsi_host/host1/device/target1\:0\:0/1\:0\:0\:0/
delete

3 then the output in dmesg is:
ata2: dev 0 disabled
ata2: stat 0x50 err 0x0 SError 0x40d0000 action 0x2
      (irq_stat 0x00400040, connection status changed)
ata2: probing bus for new devices
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2: dev 0 cfg 49:2f00 82:7469 83:7f01 84:4023 85:7469 86:3c01 87:4023
88:407f
ata2: dev 0 ATA-7, max UDMA/133, 156301488 sectors: LBA48
ata2: dev 0 configured for UDMA/133
  Vendor: ATA       Model: WDC WD800JD-22LS  Rev: 06.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
 sdb: unknown partition table
sd 1:0:0:0: Attached scsi disk sdb

4 I execute "echo "1" > /sys/class/scsi_host/host1/device/target1\:0
\:0/1\:0\:0\:0/delete" again, then the output in dmesg is:
ata2: dev 0 disabled
ata2: stat 0x50 err 0x0 SError 0x0 action 0x2
      (irq_stat 0x00000000)
ata2: soft resetting channel for error handling
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)


Thanks,
Forrest

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 13/14] ahci: convert to new EH
  2006-04-20  9:26   ` zhao, forrest
@ 2006-04-21  1:20     ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-21  1:20 UTC (permalink / raw)
  To: zhao, forrest; +Cc: jgarzik, alan, axboe, albertcc, lkosewsk, linux-ide

zhao, forrest wrote:
> On Tue, 2006-04-11 at 22:48 +0900, Tejun Heo wrote:
>> Convert AHCI to new EH.  Unfortunately, ICH7 AHCI reacts badly if IRQ
>> mask is diddled during operation.  So, freezing is implemented by
>> unconditionally clearing interrupt conditions while frozen.
> 
>>  
>> -static inline int ahci_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc)
>> +static inline void ahci_host_intr(struct ata_port *ap)
>>  {
>> +	struct ahci_port_priv *pp = ap->private_data;
>>  	void __iomem *mmio = ap->host_set->mmio_base;
>>  	void __iomem *port_mmio = ahci_port_base(mmio, ap->port_no);
>> -	u32 status, serr, ci;
>> -
>> -	serr = readl(port_mmio + PORT_SCR_ERR);
>> -	writel(serr, port_mmio + PORT_SCR_ERR);
>> +	u32 status, serror, ci;
>> +	unsigned int eh_flags;
>>  
>>  	status = readl(port_mmio + PORT_IRQ_STAT);
>>  	writel(status, port_mmio + PORT_IRQ_STAT);
>>  
>> -	ci = readl(port_mmio + PORT_CMD_ISSUE);
>> -	if (likely((ci & 0x1) == 0)) {
>> -		if (qc) {
>> -			WARN_ON(qc->err_mask);
>> -			ata_qc_complete(qc);
>> -			qc = NULL;
>> -		}
>> +	/* AHCI gets unhappy if IRQ mask is diddled with while the
>> +	 * port is active, so we cannot disable IRQ when freezing.
>> +	 * Clear IRQ conditions and hope screaming IRQs don't happen.
>> +	 */
>> +	if (ap->flags & ATA_FLAG_FROZEN) {
>> +		/* some AHCI errors hang the controller until SError
>> +		 * is cleared.  Store and clear it.
>> +		 */
>> +		serror = scr_read(ap, SCR_ERROR);
>> +		scr_write(ap, SCR_ERROR, serror);
>> +		pp->eh_irq_stat |= status;
>> +		pp->eh_serror |= serror;
>> +		return;
>>  	}
>>  
>> -	if (status & PORT_IRQ_FATAL) {
>> -		unsigned int err_mask;
>> -		if (status & PORT_IRQ_TF_ERR)
>> -			err_mask = AC_ERR_DEV;
>> -		else if (status & PORT_IRQ_IF_ERR)
>> -			err_mask = AC_ERR_ATA_BUS;
>> -		else
>> -			err_mask = AC_ERR_HOST_BUS;
>> -
>> -		/* command processing has stopped due to error; restart */
>> -		ahci_restart_port(ap, status);
>> -
>> -		if (qc) {
>> -			qc->err_mask |= err_mask;
>> -			ata_qc_complete(qc);
>> +	if (!(status & PORT_IRQ_ERROR)) {
>> +		struct ata_queued_cmd *qc;
>> +
>> +		if ((qc = ata_qc_from_tag(ap, ap->active_tag))) {
>> +			ci = readl(port_mmio + PORT_CMD_ISSUE);
>> +			if ((ci & 0x1) == 0) {
>> +				ata_qc_complete(qc);
>> +				return;
>> +			}
>>  		}
>> +
>> +		if (ata_ratelimit())
>> +			printk(KERN_INFO "ata%u: spurious interrupt "
>> +			       "(irq_stat 0x%x active_tag %d)\n",
>> +			       ap->id, status, ap->active_tag);
>> +
>> +		return;
>>  	}
>>  
>> -	return 1;
>> +	/* Something weird is going on.  Hand over to EH. */
>> +	serror = scr_read(ap, SCR_ERROR);
>> +	scr_write(ap, SCR_ERROR, serror);
>> +
>> +	pp->eh_irq_stat = status;
>> +	pp->eh_serror = serror;
>> +
>> +	eh_flags = ATA_EH_ABORT;
>> +	if (status & PORT_IRQ_FREEZE)
>> +		eh_flags |= ATA_EH_FREEZE;
>> +
>> +	ata_eh_schedule_port(ap, eh_flags);
>>  }
>>  
> 
> Anther problem with this patch is that interrupts happened
> when port is in ATA_FLAG_FROZEN state are recorded in pp->eh_irq_stat.
> So a later user-initiated warm-unplug operation might trigger EH
> to handle the error conditions previously recorded in pp->eh_irq_stat.
> 

Yeap, correct again.  The problem is fixed in my repository but saving
stuff to pp and analyze later idea didn't work very well for PM support
(sil24 can push multiple commands to multiple devices and multiple of
them can fail) so I'm stripping that out and pushing it to interrupt
handler.

Thanks for experimenting with this.  Oh.. and one more thing.  Have you
noticed that the first hotplug fails to attach the SCSI device?  This is
because setting SCS_HOTPLUG_PLUG is moved to the end of
ata_eh_scsi_hotplug() from ata_eh_hotplug() by quilt (I changed
ata_eh_hotplug() and while pushing, 'patch' resolved it by moving the
call...).

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 13/14] ahci: convert to new EH
  2006-04-20  7:44       ` Jeff Garzik
@ 2006-04-21  1:34         ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2006-04-21  1:34 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: zhao, forrest, alan, axboe, albertcc, lkosewsk, linux-ide

Jeff Garzik wrote:
> Tejun Heo wrote:
>> On Thu, Apr 20, 2006 at 02:01:12PM +0800, zhao, forrest wrote:
>>> Hi, Tejun
>>>
>>> When testing hotplug and reading your patches, I thought an interrupt
>>> lost might occur on AHCI in the following case:
>>>
>>> 1 system boot up with SATA disk A attached to port 1 and disk B attached
>>> to port 2
>>> 2 disk B at port 2 is hot-unplugged
>>> 3 ata_eh_revive() will execute several round of soft-reset/hard-reset as
>>> we observed in dmesg
>>> 4 now imagine ata_eh_revive() start to execute the last round of
>>> hard-reset, so the code path comes into ata_do_reset(), then into
>>> ahci_hardreset()
>>> 5 disk B is hot-plugged to port 2, and an interrupt is triggered
>>> 6 CPU respond to this interrupt when code path execute between
>>> ahci_start_engine(); in ahci_hardreset() and
>>> ap->flags &= ~ATA_FLAG_FROZEN; in ata_do_reset();
>>> 7 then this interrupt is lost since no EH is scheduled to handle it.
>>>
>>> I think invoking ata_eh_schedule_port() in ahci_postreset() can fix
>>> the problem, is it right?
>>
>> Hello, Forrest.
>>
>> Yes, you're right.  The problem is that we cannot tell whether such
>> interrupts are due to the reset or some other events.  The goal was to
>> make sure existing devices are okay on EH completion.  If new devices
>> get connected during EH, we might lose the event, which IMHO is okay.
>>
>> Maybe this can be solved by merging EH and probe into one.  Probing
>> and EH revive are pretty similar in the first place.  I'll think about
> 
> Speaking to hotplug specifically, on hardware with plug irqs, it needs 
> to do something like
> 
>     * receive hotplug interrupt
>     * hang out for a while, eating hotplug interrupt events
>       (debounce)
>     * revalidate device
>     * issue unplug and/or plug to SCSI layer

I see.  I'll pay more attention to the debouncing.

>> that.  But I still think it's okay to lose hotplug interrupt during
>> EH.  All the user has to do is simply replug the device or issue
>> manual scan.
> 
> If losing the hotplug interrupt requires the user to do that, no that's 
> definitely not OK...  for a hotplug interrupt during EH, you want to 
> stop what you're doing at the nearest opportunity, and start all over 
> again revalidating the device.  If its a different device, all the 
> accumulated state is flushed.
> 

Hmmm... Well, I initially thought that's a tradeoff libata can take. 
It's a quite small window.  Such events are lost iff the user plugs a 
new device inbetween autopsy completion and reset completion.  ie. while 
EH is actively spitting out messages.

I've been thinking about this since yesterday (except for the time I've 
played HOMM5 demo), and it seems that achieving completely reliable 
device detection can be achieved relatively easily by combining EH 
revive and probing.  And with SError.X bit check at the end, PM should 
be able to do reliable detection, too.

PM is requiring more changes than I initially thought and merging 
probing and EH reviving would take some time too.  And, of course, HOMM5 
demo is out.  So, I don't think I can make it this week.  But on the 
bright side, SCSI part of EH seems to be agreed on and although EH and 
hotplug are a little bit flakey, libata generic PM support really works 
on my working tree!

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHSET 6/9] new EH implementation, take 2
  2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
                   ` (13 preceding siblings ...)
  2006-04-11 13:48 ` [PATCH 14/14] sata_sil24: " Tejun Heo
@ 2006-04-27  9:16 ` Jeff Garzik
  14 siblings, 0 replies; 24+ messages in thread
From: Jeff Garzik @ 2006-04-27  9:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: alan, axboe, albertcc, lkosewsk, linux-ide

General comment:

My general feeling as I read these patchsets is that the patches are a 
bit /too/ split up.

I know its largely a judgement call, but don't be afraid to submit 
larger patches, particularly for new features like NCQ.  The several 
patches are highly interdependent, and they are introducing new code, 
might as well just lump them together.  Following the "one logical 
change per patch" rule doesn't have to mean "add one new function per patch"

	Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2006-04-27  9:16 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-11 13:48 [PATCHSET 6/9] new EH implementation, take 2 Tejun Heo
2006-04-11 13:48 ` [PATCH 09/14] libata-eh: implement ata_eh_finish_qcs() Tejun Heo
2006-04-11 13:48 ` [PATCH 02/14] libata-eh: implement ata_ering Tejun Heo
2006-04-11 13:48 ` [PATCH 01/14] libata-eh: add constants and flags to be used by EH Tejun Heo
2006-04-11 13:48 ` [PATCH 04/14] libata-eh: implement EH utility functions Tejun Heo
2006-04-11 13:48 ` [PATCH 10/14] libata-eh: implement EH methods for BMDMA controllers Tejun Heo
2006-04-11 13:48 ` [PATCH 08/14] libata-eh: implement ata_eh_revive() Tejun Heo
2006-04-19  9:08   ` zhao, forrest
2006-04-19 10:33     ` Tejun Heo
2006-04-11 13:48 ` [PATCH 11/14] ata_piix: convert to new EH Tejun Heo
2006-04-11 13:48 ` [PATCH 05/14] libata-eh: implement ata_eh_determine_qc() Tejun Heo
2006-04-11 13:48 ` [PATCH 06/14] libata-eh: implement ata_eh_autopsy() Tejun Heo
2006-04-11 13:48 ` [PATCH 03/14] libata-eh: add per-dev ata_ering Tejun Heo
2006-04-11 13:48 ` [PATCH 07/14] libata-eh: implement ata_eh_report() Tejun Heo
2006-04-11 13:48 ` [PATCH 12/14] sata_sil: convert to new EH Tejun Heo
2006-04-11 13:48 ` [PATCH 13/14] ahci: " Tejun Heo
2006-04-20  6:01   ` zhao, forrest
2006-04-20  7:11     ` Tejun Heo
2006-04-20  7:44       ` Jeff Garzik
2006-04-21  1:34         ` Tejun Heo
2006-04-20  9:26   ` zhao, forrest
2006-04-21  1:20     ` Tejun Heo
2006-04-11 13:48 ` [PATCH 14/14] sata_sil24: " Tejun Heo
2006-04-27  9:16 ` [PATCHSET 6/9] new EH implementation, take 2 Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).