[PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems

* [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems
@ 2008-05-18 16:15 Tejun Heo
  2008-05-18 16:15 ` [PATCH 01/10] libata: fix sata_link_hardreset() @online out parameter handling Tejun Heo
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml

Hello,

This patchset started from investigating sil24 - mv4140 PMP detection
/ NCQ problems.  It soon turned out that there are quite a number of
issues lurking in the current PMP implementation.

* We still have a hole during reset where hotplug can go unnoticed.

* PMP r/w timeout is too short.  It's currently 250ms.  This value
  comes from SIMG PMPs and was mainly meant to avoid long sequences of
  back-to-back timeouts when accessing PMP regs via polled commands,
  which is no longer the case.  mv 4140 and SIMG4726 occasionally need
  more time.

* SDB Notify is not masked during fan-out port resets if PMP hasn't
  been reset in that pass.  This causes PMP register access failure
  because SDB Notify is received while PMP register write is in
  progress for fan-out reset.  I missed this because most of my
  testing was before PMP register access was converted to IRQ driven.

* ata_lpm_schedule() is called during probe right after SCSI scan is
  complete, which schedules EH immediately and reliably triggers the
  above two problems on certain hardware combinations.  This by itself
  should be okay but makes other problems more painful.

* sil3124/32 and mv4140 combination for some reason can't do NCQ
  reliably.  I don't know why.  I'll blacklist it for the time being
  and contact both companies about this.

* recovered errors shouldn't trigger resets.

* SIMG4726 config device still is a real pain in the ass.

This patchset contains ten patches addressing all the above issues.
Although this patchset is rather large, it's basically bug fixes.

The patchset fixes all hotplug problems I can reproduce including the
hotplug problems on inic162x and JMB ahcis.  The only remaining issue
is that sometimes device detection gets delayed by 30sec IDENTIFY
timeout.  I'll prep a patchset to make EH command timeouts more
intelligent (for #upstream).

The sad part is that with PMP in the mix and the current host
controller designs, we have an inherent race condition during reset
(there is no reliable way to wait for the initial D2H FIS after
hardresetting a fan-out port, so SRST races with the D2H FIS) and this
sometimes seems to lead to problems during or after reset.  We have to
resort to intelligently timed retries to make it work bearably.

Thanks.

 drivers/ata/libata-core.c |   40 ++++----
 drivers/ata/libata-eh.c   |  207 +++++++++++++++++++++++++---------------------
 drivers/ata/libata-pmp.c  |   44 ++++-----
 drivers/ata/libata-scsi.c |    6 -
 drivers/ata/sata_sil24.c  |   11 ++
 include/linux/libata.h    |    4
 6 files changed, 170 insertions(+), 142 deletions(-)

--
tejun


* [PATCH 01/10] libata: fix sata_link_hardreset() @online out parameter handling
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-19 21:53   ` Jeff Garzik
  2008-05-18 16:15 ` [PATCH 02/10] libata: reorganize ata_eh_reset() no reset method path Tejun Heo
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

The @online out parameter is supposed to be set to true iff the link
is online and reset succeeded, as advertised in the function
description, and callers are coded expecting that.  However,
sata_link_hardreset() didn't behave this way on device readiness test
failure.  Fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-core.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 927b692..c6c316f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3653,9 +3653,13 @@ int sata_link_hardreset(struct ata_link *link, const unsigned long *timing,
 	if (check_ready)
 		rc = ata_wait_ready(link, deadline, check_ready);
  out:
-	if (rc && rc != -EAGAIN)
+	if (rc && rc != -EAGAIN) {
+		/* online is set iff link is online && reset succeeded */
+		if (online)
+			*online = false;
 		ata_link_printk(link, KERN_ERR,
 				"COMRESET failed (errno=%d)\n", rc);
+	}
 	DPRINTK("EXIT, rc=%d\n", rc);
 	return rc;
 }
-- 
1.5.2.4
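
For readers following along, here is a minimal standalone sketch of the
@online contract this patch enforces.  It is not the kernel code:
hardreset_sketch() and its parameters are hypothetical stand-ins for the
tail of sata_link_hardreset().  Only the rule itself (online is true iff
the link is online and reset succeeded, with -EAGAIN not treated as a
failure) is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the end of sata_link_hardreset().
 *
 * link_is_online: what the PHY reported after COMRESET
 * ready_rc:       result of the readiness wait (0, -EAGAIN or an error)
 * online:         optional out parameter, may be NULL
 */
static int hardreset_sketch(bool link_is_online, int ready_rc, bool *online)
{
	int rc = 0;

	if (online)
		*online = link_is_online;

	/* the readiness wait only runs when the link came up */
	if (link_is_online)
		rc = ready_rc;

	/* online is set iff link is online && reset succeeded */
	if (rc && rc != -EAGAIN && online)
		*online = false;

	return rc;
}
```

A caller that sees a hard failure can therefore trust that *online is
false, while -EAGAIN (follow-up SRST needed) leaves it true.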



* [PATCH 02/10] libata: reorganize ata_eh_reset() no reset method path
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
  2008-05-18 16:15 ` [PATCH 01/10] libata: fix sata_link_hardreset() @online out parameter handling Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 16:15 ` [PATCH 03/10] libata: move reset freeze/thaw handling into ata_eh_reset() Tejun Heo
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

Reorganize ata_eh_reset() such that @prereset() is called even when no
reset method is available and an if block is used instead of goto to
skip the actual reset.  This makes the no-reset case behave better
(readiness wait) and makes future changes easier.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-eh.c |  102 ++++++++++++++++++++++++-----------------------
 1 files changed, 52 insertions(+), 50 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 62e0331..a34adc2 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2098,7 +2098,9 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	u32 sstatus;
 	int rc;
 
-	/* about to reset */
+	/*
+	 * Prepare to reset
+	 */
 	spin_lock_irqsave(ap->lock, flags);
 	ap->pflags |= ATA_PFLAG_RESETTING;
 	spin_unlock_irqrestore(ap->lock, flags);
@@ -2124,16 +2126,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
 			ap->ops->set_piomode(ap, dev);
 	}
 
-	if (!softreset && !hardreset) {
-		if (verbose)
-			ata_link_printk(link, KERN_INFO, "no reset method "
-					"available, skipping reset\n");
-		if (!(lflags & ATA_LFLAG_ASSUME_CLASS))
-			lflags |= ATA_LFLAG_ASSUME_ATA;
-		goto done;
-	}
-
 	/* prefer hardreset */
+	reset = NULL;
 	ehc->i.action &= ~ATA_EH_RESET;
 	if (hardreset) {
 		reset = hardreset;
@@ -2141,11 +2135,6 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	} else if (softreset) {
 		reset = softreset;
 		ehc->i.action = ATA_EH_SOFTRESET;
-	} else {
-		ata_link_printk(link, KERN_ERR, "BUG: no reset method, "
-				"please report to linux-ide@vger.kernel.org\n");
-		dump_stack();
-		return -EINVAL;
 	}
 
 	if (prereset) {
@@ -2165,55 +2154,68 @@ int ata_eh_reset(struct ata_link *link, int classify,
 					"prereset failed (errno=%d)\n", rc);
 			goto out;
 		}
-	}
 
-	/* prereset() might have cleared ATA_EH_RESET */
-	if (!(ehc->i.action & ATA_EH_RESET)) {
-		/* prereset told us not to reset, bang classes and return */
-		ata_link_for_each_dev(dev, link)
-			classes[dev->devno] = ATA_DEV_NONE;
-		rc = 0;
-		goto out;
+		/* prereset() might have cleared ATA_EH_RESET.  If so,
+		 * bang classes and return.
+		 */
+		if (reset && !(ehc->i.action & ATA_EH_RESET)) {
+			ata_link_for_each_dev(dev, link)
+				classes[dev->devno] = ATA_DEV_NONE;
+			rc = 0;
+			goto out;
+		}
 	}
 
  retry:
+	/*
+	 * Perform reset
+	 */
 	deadline = jiffies + ata_eh_reset_timeouts[try++];
 
-	/* shut up during boot probing */
-	if (verbose)
-		ata_link_printk(link, KERN_INFO, "%s resetting link\n",
-				reset == softreset ? "soft" : "hard");
+	if (reset) {
+		if (verbose)
+			ata_link_printk(link, KERN_INFO, "%s resetting link\n",
+					reset == softreset ? "soft" : "hard");
 
-	/* mark that this EH session started with reset */
-	if (reset == hardreset)
-		ehc->i.flags |= ATA_EHI_DID_HARDRESET;
-	else
-		ehc->i.flags |= ATA_EHI_DID_SOFTRESET;
+		/* mark that this EH session started with reset */
+		if (reset == hardreset)
+			ehc->i.flags |= ATA_EHI_DID_HARDRESET;
+		else
+			ehc->i.flags |= ATA_EHI_DID_SOFTRESET;
 
-	rc = ata_do_reset(link, reset, classes, deadline);
+		rc = ata_do_reset(link, reset, classes, deadline);
 
-	if (reset == hardreset &&
-	    ata_eh_followup_srst_needed(link, rc, classify, classes)) {
-		/* okay, let's do follow-up softreset */
-		reset = softreset;
+		if (reset == hardreset &&
+		    ata_eh_followup_srst_needed(link, rc, classify, classes)) {
+			/* okay, let's do follow-up softreset */
+			reset = softreset;
 
-		if (!reset) {
-			ata_link_printk(link, KERN_ERR,
-					"follow-up softreset required "
-					"but no softreset avaliable\n");
-			rc = -EINVAL;
-			goto fail;
+			if (!reset) {
+				ata_link_printk(link, KERN_ERR,
+						"follow-up softreset required "
+						"but no softreset avaliable\n");
+				rc = -EINVAL;
+				goto fail;
+			}
+
+			ata_eh_about_to_do(link, NULL, ATA_EH_RESET);
+			rc = ata_do_reset(link, reset, classes, deadline);
 		}
 
-		ata_eh_about_to_do(link, NULL, ATA_EH_RESET);
-		rc = ata_do_reset(link, reset, classes, deadline);
+		/* -EAGAIN can happen if we skipped followup SRST */
+		if (rc && rc != -EAGAIN)
+			goto fail;
+	} else {
+		if (verbose)
+			ata_link_printk(link, KERN_INFO, "no reset method "
+					"available, skipping reset\n");
+		if (!(lflags & ATA_LFLAG_ASSUME_CLASS))
+			lflags |= ATA_LFLAG_ASSUME_ATA;
 	}
 
-	/* -EAGAIN can happen if we skipped followup SRST */
-	if (rc && rc != -EAGAIN)
-		goto fail;
-
- done:
+	/*
+	 * Post-reset processing
+	 */
 	ata_link_for_each_dev(dev, link) {
 		/* After the reset, the device state is PIO 0 and the
 		 * controller state is undefined.  Reset also wakes up
-- 
1.5.2.4



* [PATCH 03/10] libata: move reset freeze/thaw handling into ata_eh_reset()
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
  2008-05-18 16:15 ` [PATCH 01/10] libata: fix sata_link_hardreset() @online out parameter handling Tejun Heo
  2008-05-18 16:15 ` [PATCH 02/10] libata: reorganize ata_eh_reset() no reset method path Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 16:15 ` [PATCH 04/10] libata: kill hotplug related race condition Tejun Heo
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

Previously reset freeze/thaw handling lived outside of ata_eh_reset()
mainly because the original PMP reset code needed the port frozen
while resetting all the fan-out ports, which is no longer the case.

This patch moves freeze/thaw handling into ata_eh_reset().
@prereset() and @postreset() are now called w/o freezing the port,
although @prereset() can be called frozen if the port is frozen prior
to entering ata_eh_reset().

This makes the code simpler and will help remove hotplug event related
races.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-eh.c  |   46 ++++++++++++++++++----------------------------
 drivers/ata/libata-pmp.c |    4 ----
 2 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index a34adc2..06a92c5 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2170,6 +2170,9 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	/*
 	 * Perform reset
 	 */
+	if (ata_is_host_link(link))
+		ata_eh_freeze_port(ap);
+
 	deadline = jiffies + ata_eh_reset_timeouts[try++];
 
 	if (reset) {
@@ -2238,6 +2241,10 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	if (sata_scr_read(link, SCR_STATUS, &sstatus) == 0)
 		link->sata_spd = (sstatus >> 4) & 0xf;
 
+	/* thaw the port */
+	if (ata_is_host_link(link))
+		ata_eh_thaw_port(ap);
+
 	if (postreset)
 		postreset(link, classes);
 
@@ -2589,7 +2596,7 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 	struct ata_link *link;
 	struct ata_device *dev;
 	int nr_failed_devs, nr_disabled_devs;
-	int reset, rc;
+	int rc;
 	unsigned long flags;
 
 	DPRINTK("ENTER\n");
@@ -2632,7 +2639,6 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 	rc = 0;
 	nr_failed_devs = 0;
 	nr_disabled_devs = 0;
-	reset = 0;
 
 	/* if UNLOADING, finish immediately */
 	if (ap->pflags & ATA_PFLAG_UNLOADING)
@@ -2646,40 +2652,24 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 		if (ata_eh_skip_recovery(link))
 			ehc->i.action = 0;
 
-		/* do we need to reset? */
-		if (ehc->i.action & ATA_EH_RESET)
-			reset = 1;
-
 		ata_link_for_each_dev(dev, link)
 			ehc->classes[dev->devno] = ATA_DEV_UNKNOWN;
 	}
 
 	/* reset */
-	if (reset) {
-		/* if PMP is attached, this function only deals with
-		 * downstream links, port should stay thawed.
-		 */
-		if (!sata_pmp_attached(ap))
-			ata_eh_freeze_port(ap);
-
-		ata_port_for_each_link(link, ap) {
-			struct ata_eh_context *ehc = &link->eh_context;
+	ata_port_for_each_link(link, ap) {
+		struct ata_eh_context *ehc = &link->eh_context;
 
-			if (!(ehc->i.action & ATA_EH_RESET))
-				continue;
+		if (!(ehc->i.action & ATA_EH_RESET))
+			continue;
 
-			rc = ata_eh_reset(link, ata_link_nr_vacant(link),
-					  prereset, softreset, hardreset,
-					  postreset);
-			if (rc) {
-				ata_link_printk(link, KERN_ERR,
-						"reset failed, giving up\n");
-				goto out;
-			}
+		rc = ata_eh_reset(link, ata_link_nr_vacant(link),
+				  prereset, softreset, hardreset, postreset);
+		if (rc) {
+			ata_link_printk(link, KERN_ERR,
+					"reset failed, giving up\n");
+			goto out;
 		}
-
-		if (!sata_pmp_attached(ap))
-			ata_eh_thaw_port(ap);
 	}
 
 	/* the rest */
diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index ff1822a..f3ad024 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -700,8 +700,6 @@ static int sata_pmp_eh_recover_pmp(struct ata_port *ap,
 	if (ehc->i.action & ATA_EH_RESET) {
 		struct ata_link *tlink;
 
-		ata_eh_freeze_port(ap);
-
 		/* reset */
 		rc = ata_eh_reset(link, 0, prereset, softreset, hardreset,
 				  postreset);
@@ -711,8 +709,6 @@ static int sata_pmp_eh_recover_pmp(struct ata_port *ap,
 			goto fail;
 		}
 
-		ata_eh_thaw_port(ap);
-
 		/* PMP is reset, SErrors cannot be trusted, scan all */
 		ata_port_for_each_link(tlink, ap) {
 			struct ata_eh_context *ehc = &tlink->eh_context;
-- 
1.5.2.4



* [PATCH 04/10] libata: kill hotplug related race condition
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (2 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 03/10] libata: move reset freeze/thaw handling into ata_eh_reset() Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 16:15 ` [PATCH 05/10] libata: ignore recovered PHY errors Tejun Heo
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

Originally, the whole reset processing was done while the port was
frozen and SError was cleared during @postreset().  This had two race
conditions.  1: hotplug could occur after reset but before SError is
cleared and libata wouldn't know about it.  2: hotplug could occur
after all the reset is complete but before the port is thawed.  As all
events are cleared on thaw, the hotplug event would be lost.

Commit ac371987a81c61c2efbd6931245cdcaf43baad89 kills the first race
by clearing SError during link resume but before link onlineness test.
However, this doesn't fix race #2 and in some cases clearing SError
after SRST is a good idea.

This patch solves this problem by cross checking link onlineness with
classification result after SError is cleared and port is thawed.
Reset is retried if link is online but all devices attached to the
link are unknown.  As all devices will be revalidated, this one-way
check is enough to ensure that all devices are detected and
revalidated reliably.

This, luckily, also fixes the cases where the host controller returns
bogus status while a hard drive is spinning up after hotplug, which
makes classification run before the device sends the first FIS and
thus causes misdetection.

Low level drivers can bypass the logic by setting class explicitly to
ATA_DEV_NONE if ever necessary (currently none requires this).

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-core.c |   21 +++++++-----------
 drivers/ata/libata-eh.c   |   52 ++++++++++++++++++++++++++++++++++++--------
 2 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c6c316f..ffc689d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3490,22 +3490,11 @@ int sata_link_resume(struct ata_link *link, const unsigned long *params,
 	if ((rc = sata_link_debounce(link, params, deadline)))
 		return rc;
 
-	/* Clear SError.  PMP and some host PHYs require this to
-	 * operate and clearing should be done before checking PHY
-	 * online status to avoid race condition (hotplugging between
-	 * link resume and status check).
-	 */
+	/* clear SError, some PHYs require this even for SRST to work */
 	if (!(rc = sata_scr_read(link, SCR_ERROR, &serror)))
 		rc = sata_scr_write(link, SCR_ERROR, serror);
-	if (rc == 0 || rc == -EINVAL) {
-		unsigned long flags;
 
-		spin_lock_irqsave(link->ap->lock, flags);
-		link->eh_info.serror = 0;
-		spin_unlock_irqrestore(link->ap->lock, flags);
-		rc = 0;
-	}
-	return rc;
+	return rc != -EINVAL ? rc : 0;
 }
 
 /**
@@ -3704,8 +3693,14 @@ int sata_std_hardreset(struct ata_link *link, unsigned int *class,
  */
 void ata_std_postreset(struct ata_link *link, unsigned int *classes)
 {
+	u32 serror;
+
 	DPRINTK("ENTER\n");
 
+	/* reset complete, clear SError */
+	if (!sata_scr_read(link, SCR_ERROR, &serror))
+		sata_scr_write(link, SCR_ERROR, serror);
+
 	/* print link status */
 	sata_print_link_status(link);
 
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 06a92c5..751dad0 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2047,19 +2047,11 @@ static int ata_do_reset(struct ata_link *link, ata_reset_fn_t reset,
 			unsigned int *classes, unsigned long deadline)
 {
 	struct ata_device *dev;
-	int rc;
 
 	ata_link_for_each_dev(dev, link)
 		classes[dev->devno] = ATA_DEV_UNKNOWN;
 
-	rc = reset(link, classes, deadline);
-
-	/* convert all ATA_DEV_UNKNOWN to ATA_DEV_NONE */
-	ata_link_for_each_dev(dev, link)
-		if (classes[dev->devno] == ATA_DEV_UNKNOWN)
-			classes[dev->devno] = ATA_DEV_NONE;
-
-	return rc;
+	return reset(link, classes, deadline);
 }
 
 static int ata_eh_followup_srst_needed(struct ata_link *link,
@@ -2096,7 +2088,7 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	ata_reset_fn_t reset;
 	unsigned long flags;
 	u32 sstatus;
-	int rc;
+	int nr_known, rc;
 
 	/*
 	 * Prepare to reset
@@ -2245,9 +2237,49 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	if (ata_is_host_link(link))
 		ata_eh_thaw_port(ap);
 
+	/* postreset() should clear hardware SError.  Although SError
+	 * is cleared during link resume, clearing SError here is
+	 * necessary as some PHYs raise hotplug events after SRST.
+	 * This introduces race condition where hotplug occurs between
+	 * reset and here.  This race is mediated by cross checking
+	 * link onlineness and classification result later.
+	 */
 	if (postreset)
 		postreset(link, classes);
 
+	/* clear cached SError */
+	spin_lock_irqsave(link->ap->lock, flags);
+	link->eh_info.serror = 0;
+	spin_unlock_irqrestore(link->ap->lock, flags);
+
+	/* Make sure onlineness and classification result correspond.
+	 * Hotplug could have happened during reset and some
+	 * controllers fail to wait while a drive is spinning up after
+	 * being hotplugged causing misdetection.  By cross checking
+	 * link onlineness and classification result, those conditions
+	 * can be reliably detected and retried.
+	 */
+	nr_known = 0;
+	ata_link_for_each_dev(dev, link) {
+		/* convert all ATA_DEV_UNKNOWN to ATA_DEV_NONE */
+		if (classes[dev->devno] == ATA_DEV_UNKNOWN)
+			classes[dev->devno] = ATA_DEV_NONE;
+		else
+			nr_known++;
+	}
+
+	if (classify && !nr_known && ata_link_online(link)) {
+		if (try < max_tries) {
+			ata_link_printk(link, KERN_WARNING, "link online but "
+				       "device misclassified, retrying\n");
+			rc = -EAGAIN;
+			goto fail;
+		}
+		ata_link_printk(link, KERN_WARNING,
+			       "link online but device misclassified, "
+			       "device detection might fail\n");
+	}
+
 	/* reset successful, schedule revalidation */
 	ata_eh_done(link, NULL, ATA_EH_RESET);
 	ehc->i.action |= ATA_EH_REVALIDATE;
-- 
1.5.2.4
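
The cross check added to ata_eh_reset() can be sketched outside the
kernel as follows.  This is a simplified illustration, not the patch
itself: the dev_class enum, crosscheck_sketch() and its parameters are
hypothetical names; the logic (convert UNKNOWN to NONE, count everything
else as known, retry when the link is online but nothing was classified)
follows the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* hypothetical mirror of ATA_DEV_UNKNOWN / ATA_DEV_NONE / ATA_DEV_ATA */
enum dev_class { DEV_UNKNOWN, DEV_NONE, DEV_ATA };

/*
 * Returns 0 if the reset result is accepted, -EAGAIN if the reset
 * should be retried.  UNKNOWN entries are converted to NONE in place.
 */
static int crosscheck_sketch(enum dev_class *classes, int nr_devs,
			     bool classify, bool link_online,
			     int tries_done, int max_tries)
{
	int nr_known = 0;
	int i;

	for (i = 0; i < nr_devs; i++) {
		if (classes[i] == DEV_UNKNOWN)
			classes[i] = DEV_NONE;
		else
			nr_known++;	/* an explicit DEV_NONE counts too */
	}

	/* link is up but nothing was classified: hotplug or spin-up race */
	if (classify && !nr_known && link_online && tries_done < max_tries)
		return -EAGAIN;

	/* out of tries or nothing suspicious: accept what we have */
	return 0;
}
```

Because an explicitly set DEV_NONE is counted as known, a low level
driver that reports it bypasses the retry, as noted in the description.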



* [PATCH 05/10] libata: ignore recovered PHY errors
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (3 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 04/10] libata: kill hotplug related race condition Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-19 21:50   ` Jeff Garzik
  2008-05-18 16:15 ` [PATCH 06/10] libata: increase PMP register access timeout to 3s Tejun Heo
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

No reason to get overzealous about recovered comm and data errors.
Some PHYs habitually set them for no good reason and being draconian
about these soft error conditions doesn't seem to help anybody.

If the need ever arises, we might add a soft PHY error condition, say
AC_ERR_MAYBE_ATA_BUS, and use it only to determine whether speed down
is necessary, but I don't think that's very likely to happen.  It's far
more likely we'll get timeouts or fatal transmission errors if
recovered errors are so prominent that they hamper operation.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-eh.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 751dad0..7894d83 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1308,12 +1308,7 @@ static void ata_eh_analyze_serror(struct ata_link *link)
 	unsigned int err_mask = 0, action = 0;
 	u32 hotplug_mask;
 
-	if (serror & SERR_PERSISTENT) {
-		err_mask |= AC_ERR_ATA_BUS;
-		action |= ATA_EH_RESET;
-	}
-	if (serror &
-	    (SERR_DATA_RECOVERED | SERR_COMM_RECOVERED | SERR_DATA)) {
+	if (serror & (SERR_PERSISTENT | SERR_DATA)) {
 		err_mask |= AC_ERR_ATA_BUS;
 		action |= ATA_EH_RESET;
 	}
-- 
1.5.2.4
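
To make the behavior change concrete, the old and new SError tests can
be compared side by side in a small standalone sketch.  The two helper
functions are hypothetical, and the SERR_* bit positions are assumed to
match the definitions in include/linux/libata.h; check the header before
relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* assumed to match the SERR_* values in include/linux/libata.h */
#define SERR_DATA_RECOVERED	(1u << 0)	/* recovered data error */
#define SERR_COMM_RECOVERED	(1u << 1)	/* recovered comm failure */
#define SERR_DATA		(1u << 8)	/* unrecovered data error */
#define SERR_PERSISTENT		(1u << 9)	/* persistent data/comm error */

/* before the patch: recovered errors also forced a reset */
static bool serror_wants_reset_old(uint32_t serror)
{
	return (serror & (SERR_PERSISTENT | SERR_DATA_RECOVERED |
			  SERR_COMM_RECOVERED | SERR_DATA)) != 0;
}

/* after the patch: only persistent or unrecovered data errors do */
static bool serror_wants_reset_new(uint32_t serror)
{
	return (serror & (SERR_PERSISTENT | SERR_DATA)) != 0;
}
```

With only SERR_COMM_RECOVERED set, the old test asks for a reset and the
new one does not; SERR_PERSISTENT and SERR_DATA are handled the same way
by both.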



* [PATCH 06/10] libata: increase PMP register access timeout to 3s
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (4 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 05/10] libata: ignore recovered PHY errors Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 16:15 ` [PATCH 07/10] libata: make sure PMP notification is turned off during recovery Tejun Heo
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

This timeout was set low because previously PMP register access was
done via polling and register access timeouts could stack up.  This is
no longer the case.  One timeout will make all following accesses fail
immediately.

In rare cases both Marvell and SIMG PMPs need almost a second.  Bump
it to 3s.

While at it, rename it to SATA_PMP_RW_TIMEOUT.  It's not specific to
SCR access.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-pmp.c |    4 ++--
 include/linux/libata.h   |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index f3ad024..04a486a 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -48,7 +48,7 @@ static unsigned int sata_pmp_read(struct ata_link *link, int reg, u32 *r_val)
 	tf.device = link->pmp;
 
 	err_mask = ata_exec_internal(pmp_dev, &tf, NULL, DMA_NONE, NULL, 0,
-				     SATA_PMP_SCR_TIMEOUT);
+				     SATA_PMP_RW_TIMEOUT);
 	if (err_mask)
 		return err_mask;
 
@@ -88,7 +88,7 @@ static unsigned int sata_pmp_write(struct ata_link *link, int reg, u32 val)
 	tf.lbah = (val >> 24) & 0xff;
 
 	return ata_exec_internal(pmp_dev, &tf, NULL, DMA_NONE, NULL, 0,
-				 SATA_PMP_SCR_TIMEOUT);
+				 SATA_PMP_RW_TIMEOUT);
 }
 
 /**
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 0f17643..95e4169 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -341,7 +341,7 @@ enum {
 	ATA_EH_PMP_TRIES	= 5,
 	ATA_EH_PMP_LINK_TRIES	= 3,
 
-	SATA_PMP_SCR_TIMEOUT	= 250,
+	SATA_PMP_RW_TIMEOUT	= 3000,		/* PMP read/write timeout */
 
 	/* Horkage types. May be set by libata or controller on drives
 	   (some horkage may be drive/controller pair dependant */
-- 
1.5.2.4
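
The reasoning about stacked timeouts is simple arithmetic, sketched
below.  worst_case_ms() is a hypothetical helper and the access count
used in the note that follows is made up for illustration; only the
250ms and 3000ms values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Worst-case time, in ms, spent on nr_accesses PMP register accesses
 * when the PMP never answers.
 *
 * timeouts_stack == true:  every access waits out its own timeout
 *                          (the old polled behavior)
 * timeouts_stack == false: the first timeout makes all following
 *                          accesses fail immediately
 */
static unsigned long worst_case_ms(unsigned int nr_accesses,
				   unsigned long timeout_ms,
				   bool timeouts_stack)
{
	if (nr_accesses == 0)
		return 0;

	return timeouts_stack ? nr_accesses * timeout_ms : timeout_ms;
}
```

Assuming, say, 20 register accesses in one recovery pass, stacked 250ms
timeouts add up to 5000ms, while a single 3000ms timeout that fails the
rest immediately is bounded at 3000ms.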



* [PATCH 07/10] libata: make sure PMP notification is turned off during recovery
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (5 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 06/10] libata: increase PMP register access timeout to 3s Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 16:15 ` [PATCH 08/10] libata: don't schedule LPM action seperately during probing Tejun Heo
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

PMP notification during reset can make some controllers fail reset
processing and needs to be turned off during resets.  The PMP attach
and full-revalidation paths did this via sata_pmp_configure() but the
quick revalidation path didn't.  Move the notification disable code
right above fan-out port recovery so that it's always turned off.

This fixes obscure reset failures.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-pmp.c |   36 ++++++++++++++++++++----------------
 1 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index 04a486a..0f9386d 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -257,19 +257,6 @@ static int sata_pmp_configure(struct ata_device *dev, int print_info)
 		goto fail;
 	}
 
-	/* turn off notification till fan-out ports are reset and configured */
-	if (gscr[SATA_PMP_GSCR_FEAT_EN] & SATA_PMP_FEAT_NOTIFY) {
-		gscr[SATA_PMP_GSCR_FEAT_EN] &= ~SATA_PMP_FEAT_NOTIFY;
-
-		err_mask = sata_pmp_write(dev->link, SATA_PMP_GSCR_FEAT_EN,
-					  gscr[SATA_PMP_GSCR_FEAT_EN]);
-		if (err_mask) {
-			rc = -EIO;
-			reason = "failed to write GSCR_FEAT_EN";
-			goto fail;
-		}
-	}
-
 	if (print_info) {
 		ata_dev_printk(dev, KERN_INFO, "Port Multiplier %s, "
 			       "0x%04x:0x%04x r%d, %d ports, feat 0x%x/0x%x\n",
@@ -860,6 +847,7 @@ static int sata_pmp_eh_recover(struct ata_port *ap)
 	struct ata_link *pmp_link = &ap->link;
 	struct ata_device *pmp_dev = pmp_link->device;
 	struct ata_eh_context *pmp_ehc = &pmp_link->eh_context;
+	u32 *gscr = pmp_dev->gscr;
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned int err_mask;
@@ -897,6 +885,22 @@ static int sata_pmp_eh_recover(struct ata_port *ap)
 	if (rc)
 		goto pmp_fail;
 
+	/* PHY event notification can disturb reset and other recovery
+	 * operations.  Turn it off.
+	 */
+	if (gscr[SATA_PMP_GSCR_FEAT_EN] & SATA_PMP_FEAT_NOTIFY) {
+		gscr[SATA_PMP_GSCR_FEAT_EN] &= ~SATA_PMP_FEAT_NOTIFY;
+
+		err_mask = sata_pmp_write(pmp_link, SATA_PMP_GSCR_FEAT_EN,
+					  gscr[SATA_PMP_GSCR_FEAT_EN]);
+		if (err_mask) {
+			ata_link_printk(pmp_link, KERN_WARNING,
+				"failed to disable NOTIFY (err_mask=0x%x)\n",
+				err_mask);
+			goto pmp_fail;
+		}
+	}
+
 	/* handle disabled links */
 	rc = sata_pmp_eh_handle_disabled_links(ap);
 	if (rc)
@@ -919,10 +923,10 @@ static int sata_pmp_eh_recover(struct ata_port *ap)
 
 	/* enable notification */
 	if (pmp_dev->flags & ATA_DFLAG_AN) {
-		pmp_dev->gscr[SATA_PMP_GSCR_FEAT_EN] |= SATA_PMP_FEAT_NOTIFY;
+		gscr[SATA_PMP_GSCR_FEAT_EN] |= SATA_PMP_FEAT_NOTIFY;
 
-		err_mask = sata_pmp_write(pmp_dev->link, SATA_PMP_GSCR_FEAT_EN,
-					  pmp_dev->gscr[SATA_PMP_GSCR_FEAT_EN]);
+		err_mask = sata_pmp_write(pmp_link, SATA_PMP_GSCR_FEAT_EN,
+					  gscr[SATA_PMP_GSCR_FEAT_EN]);
 		if (err_mask) {
 			ata_dev_printk(pmp_dev, KERN_ERR, "failed to write "
 				       "PMP_FEAT_EN (Emask=0x%x)\n", err_mask);
-- 
1.5.2.4
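
The "turn NOTIFY off before touching fan-out ports" step can be sketched
as a standalone function.  Everything here is illustrative:
disable_notify_sketch(), the pmp_write_fn callback type and the two fake
writers are hypothetical, and the register index and bit value are
assumed to match SATA_PMP_GSCR_FEAT_EN and SATA_PMP_FEAT_NOTIFY in
include/linux/ata.h.  Returning -EIO on write failure is a choice of
this sketch; the patch itself jumps to pmp_fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GSCR_FEAT_EN	96		/* assumed SATA_PMP_GSCR_FEAT_EN */
#define FEAT_NOTIFY	(1u << 3)	/* assumed SATA_PMP_FEAT_NOTIFY */

/* hypothetical register writer, returns non-zero err_mask on failure */
typedef unsigned int (*pmp_write_fn)(int reg, uint32_t val);

static int nr_writes;			/* bookkeeping for the fake writers */
static uint32_t last_val;

static unsigned int fake_write_ok(int reg, uint32_t val)
{
	(void)reg;
	nr_writes++;
	last_val = val;
	return 0;
}

static unsigned int fake_write_fail(int reg, uint32_t val)
{
	(void)reg;
	(void)val;
	return 1;
}

/* clear NOTIFY in the cached GSCR and push it to the PMP if it was set */
static int disable_notify_sketch(uint32_t *gscr, pmp_write_fn write)
{
	if (!(gscr[GSCR_FEAT_EN] & FEAT_NOTIFY))
		return 0;		/* already off, skip the write */

	gscr[GSCR_FEAT_EN] &= ~FEAT_NOTIFY;

	if (write(GSCR_FEAT_EN, gscr[GSCR_FEAT_EN]))
		return -EIO;

	return 0;
}
```

The register write is skipped when the cached bit is already clear, so
the common case costs no extra PMP access.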



* [PATCH 08/10] libata: don't schedule LPM action seperately during probing
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (6 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 07/10] libata: make sure PMP notification is turned off during recovery Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 16:15 ` [PATCH 09/10] sata_sil24: don't use NCQ if marvell 4140 PMP is attached Tejun Heo
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

There's no reason to schedule LPM action after probing is complete,
which causes another EH iteration.  Just schedule it together with
probing itself.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-core.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index ffc689d..a12a27e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5615,7 +5615,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
 			spin_lock_irqsave(ap->lock, flags);
 
 			ehi->probe_mask |= ATA_ALL_DEVICES;
-			ehi->action |= ATA_EH_RESET;
+			ehi->action |= ATA_EH_RESET | ATA_EH_LPM;
 			ehi->flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
 
 			ap->pflags &= ~ATA_PFLAG_INITIALIZING;
@@ -5648,7 +5648,6 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
 		struct ata_port *ap = host->ports[i];
 
 		ata_scsi_scan_host(ap, 1);
-		ata_lpm_schedule(ap, ap->pm_policy);
 	}
 
 	return 0;
-- 
1.5.2.4



* [PATCH 09/10] sata_sil24: don't use NCQ if marvell 4140 PMP is attached
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (7 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 08/10] libata: don't schedule LPM action seperately during probing Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 21:14   ` Mark Lord
  2008-05-18 16:15 ` [PATCH 10/10] libata: ignore SIMG4726 config pseudo device Tejun Heo
  2008-05-18 16:29 ` [PATCHSET #upstream-fixes] git tree available Tejun Heo
  10 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

When a 4140 PMP is attached to sil24, NCQ commands to fan-out ports 1
and 2 (0 based) often stall if commands are in progress to other
ports.  I've tried a number of things but can't tell what's going on.
It never happens w/ ahci and reportedly sata_mv, which can issue NCQ
commands to multiple devices simultaneously like sil24 does.

Disable NCQ for devices behind 4140 PMP for the time being.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Mark Lord <liml@rtr.ca>
---
 drivers/ata/sata_sil24.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
index 27a1101..8ee6b5b 100644
--- a/drivers/ata/sata_sil24.c
+++ b/drivers/ata/sata_sil24.c
@@ -899,14 +899,25 @@ static bool sil24_qc_fill_rtf(struct ata_queued_cmd *qc)
 
 static void sil24_pmp_attach(struct ata_port *ap)
 {
+	u32 *gscr = ap->link.device->gscr;
+
 	sil24_config_pmp(ap, 1);
 	sil24_init_port(ap);
+
+	if (sata_pmp_gscr_vendor(gscr) == 0x11ab &&
+	    sata_pmp_gscr_devid(gscr) == 0x4140) {
+		ata_port_printk(ap, KERN_INFO,
+			"disabling NCQ support due to sil24-mv4140 quirk\n");
+		ap->flags &= ~ATA_FLAG_NCQ;
+	}
 }
 
 static void sil24_pmp_detach(struct ata_port *ap)
 {
 	sil24_init_port(ap);
 	sil24_config_pmp(ap, 0);
+
+	ap->flags |= ATA_FLAG_NCQ;
 }
 
 static int sil24_pmp_hardreset(struct ata_link *link, unsigned int *class,
-- 
1.5.2.4
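
The ID check in the hunk above can be tried outside the kernel with a
minimal standalone sketch.  It assumes the PMP product ID lives in GSCR
word 0, with the vendor ID in the low 16 bits and the device ID in the
high 16 bits, which is how sata_pmp_gscr_vendor() and
sata_pmp_gscr_devid() appear to work.  The pmp_* names below are made up
for illustration and are not libata API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GSCR word 0 is the product ID register. */
#define PMP_GSCR_PROD_ID	0

/* Hypothetical stand-in for sata_pmp_gscr_vendor(): low 16 bits. */
static inline uint16_t pmp_vendor(const uint32_t *gscr)
{
	assert(gscr != NULL);
	return gscr[PMP_GSCR_PROD_ID] & 0xffff;
}

/* Hypothetical stand-in for sata_pmp_gscr_devid(): high 16 bits. */
static inline uint16_t pmp_devid(const uint32_t *gscr)
{
	assert(gscr != NULL);
	return gscr[PMP_GSCR_PROD_ID] >> 16;
}

/* Nonzero if the IDs match the Marvell 4140 pair tested in the patch. */
static inline int pmp_is_mv4140(const uint32_t *gscr)
{
	return pmp_vendor(gscr) == 0x11ab && pmp_devid(gscr) == 0x4140;
}
```

With this layout a product ID word of 0x414011ab matches the quirk, while
0x37261095 (a sil3726) does not.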



* [PATCH 10/10] libata: ignore SIMG4726 config pseudo device
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (8 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 09/10] sata_sil24: don't use NCQ if marvell 4140 PMP is attached Tejun Heo
@ 2008-05-18 16:15 ` Tejun Heo
  2008-05-18 16:29 ` [PATCHSET #upstream-fixes] git tree available Tejun Heo
  10 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:15 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Tejun Heo

I was hoping ATA_HORKAGE_NODMA | ATA_HORKAGE_SKIP_PM could keep it
happy, but no, even this doesn't work under certain configurations, and
it's not like we can do anything useful with the config device anyway.
Replace ATA_HORKAGE_SKIP_PM with ATA_HORKAGE_DISABLE and use it for
the config device.  This makes the device completely ignored by
libata.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 drivers/ata/libata-core.c |   10 ++++++++--
 drivers/ata/libata-scsi.c |    6 ------
 include/linux/libata.h    |    2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a12a27e..3c89f20 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2126,6 +2126,13 @@ int ata_dev_configure(struct ata_device *dev)
 	dev->horkage |= ata_dev_blacklisted(dev);
 	ata_force_horkage(dev);
 
+	if (dev->horkage & ATA_HORKAGE_DISABLE) {
+		ata_dev_printk(dev, KERN_INFO,
+			       "unsupported device, disabling\n");
+		ata_dev_disable(dev);
+		return 0;
+	}
+
 	/* let ACPI work its magic */
 	rc = ata_acpi_on_devcfg(dev);
 	if (rc)
@@ -3893,8 +3900,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	{ "SAMSUNG CD-ROM SN-124", "N001",	ATA_HORKAGE_NODMA },
 	{ "Seagate STT20000A", NULL,		ATA_HORKAGE_NODMA },
 	/* Odd clown on sil3726/4726 PMPs */
-	{ "Config  Disk",	NULL,		ATA_HORKAGE_NODMA |
-						ATA_HORKAGE_SKIP_PM },
+	{ "Config  Disk",	NULL,		ATA_HORKAGE_DISABLE },
 
 	/* Weird ATAPI devices */
 	{ "TORiSAN DVD-ROM DRD-N216", NULL,	ATA_HORKAGE_MAX_SEC_128 },
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3ce4392..aeb6e01 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1082,12 +1082,6 @@ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc)
 	if (((cdb[4] >> 4) & 0xf) != 0)
 		goto invalid_fld;       /* power conditions not supported */
 
-	if (qc->dev->horkage & ATA_HORKAGE_SKIP_PM) {
-		/* the device lacks PM support, finish without doing anything */
-		scmd->result = SAM_STAT_GOOD;
-		return 1;
-	}
-
 	if (cdb[4] & 0x1) {
 		tf->nsect = 1;	/* 1 sector, lba=0 */
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 95e4169..95a2000 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -351,7 +351,7 @@ enum {
 	ATA_HORKAGE_NONCQ	= (1 << 2),	/* Don't use NCQ */
 	ATA_HORKAGE_MAX_SEC_128	= (1 << 3),	/* Limit max sects to 128 */
 	ATA_HORKAGE_BROKEN_HPA	= (1 << 4),	/* Broken HPA */
-	ATA_HORKAGE_SKIP_PM	= (1 << 5),	/* Skip PM operations */
+	ATA_HORKAGE_DISABLE	= (1 << 5),	/* Disable it */
 	ATA_HORKAGE_HPA_SIZE	= (1 << 6),	/* native size off by one */
 	ATA_HORKAGE_IPM		= (1 << 7),	/* Link PM problems */
 	ATA_HORKAGE_IVB		= (1 << 8),	/* cbl det validity bit bugs */
-- 
1.5.2.4
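
The effect of the blacklist change above can be modelled in a few lines
of standalone C.  This is only a sketch under simplifying assumptions:
the real lookup in libata also considers the firmware revision and does
more than an exact string compare, and every name below (HORKAGE_*,
lookup_horkage(), should_configure()) is hypothetical.  Only the
"Config  Disk" model string and the (1 << 5) bit value come from the
patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
	HORKAGE_NODMA	= 1 << 1,	/* illustrative value */
	HORKAGE_DISABLE	= 1 << 5,	/* bit reused from SKIP_PM in the patch */
};

struct blacklist_entry {
	const char *model;
	unsigned int horkage;
};

/* Tiny excerpt-style table; terminated by a NULL model. */
static const struct blacklist_entry blacklist[] = {
	{ "Seagate STT20000A",	HORKAGE_NODMA },
	{ "Config  Disk",	HORKAGE_DISABLE },
	{ NULL,			0 }
};

/* Exact-match lookup; returns 0 when the model is not listed. */
static unsigned int lookup_horkage(const char *model)
{
	const struct blacklist_entry *e;

	assert(model != NULL);
	for (e = blacklist; e->model; e++)
		if (strcmp(e->model, model) == 0)
			return e->horkage;
	return 0;
}

/* Returns 1 if the device should be configured, 0 if it is ignored. */
static int should_configure(const char *model)
{
	return !(lookup_horkage(model) & HORKAGE_DISABLE);
}
```

The point of the design is that the decision is made once, early in
device configuration, so no later code path needs a special case for
the config device.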



* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
                   ` (9 preceding siblings ...)
  2008-05-18 16:15 ` [PATCH 10/10] libata: ignore SIMG4726 config pseudo device Tejun Heo
@ 2008-05-18 16:29 ` Tejun Heo
  2008-05-20  1:35   ` Brian & Chamaigne Scamman
  10 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-18 16:29 UTC (permalink / raw)
  To: jeff, linux-ide; +Cc: liml, Brian & Chamaigne Scamman

Git tree available at...

 http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=hotplug-fixes
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata-dev.git hotplug-fixes

Brian, the original thread is

 http://thread.gmane.org/gmane.linux.ide/31572

and it might fix your device detection problem.  Please test.
If you don't use git, a patch against v2.6.26-rc2 is available at...

 http://master.kernel.org/~tj/v2.6.26-rc2-to-hotplug-fixes.patch

Thanks.

-- 
tejun


* Re: [PATCH 09/10] sata_sil24: don't use NCQ if marvell 4140 PMP is attached
  2008-05-18 16:15 ` [PATCH 09/10] sata_sil24: don't use NCQ if marvell 4140 PMP is attached Tejun Heo
@ 2008-05-18 21:14   ` Mark Lord
  0 siblings, 0 replies; 30+ messages in thread
From: Mark Lord @ 2008-05-18 21:14 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jeff, linux-ide

Tejun Heo wrote:
> When 4140 PMP is attached to sil24, NCQ commands to fan out port 1 and
> 2 (0 based) often stall if commands are in progress to other ports.
> I've tried a number of things but can't tell what's going on.  It
> never happens w/ ahci and reportedly sata_mv which can issue NCQ
> commands to multiple devices simultaneously like sil24 does.
..

Mmm.. I wonder if sil24 is somehow triggering the "proprietary switching"
mode of the 4140 -- this mode allows command-based-switching controllers
to actually do something similar to FIS-based switching with the 4140.

But I don't know how the 4140 decides to enter that mode..

I'll ping the Marvell folks again on that.

Cheers


* Re: [PATCH 05/10] libata: ignore recovered PHY errors
  2008-05-18 16:15 ` [PATCH 05/10] libata: ignore recovered PHY errors Tejun Heo
@ 2008-05-19 21:50   ` Jeff Garzik
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Garzik @ 2008-05-19 21:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, liml

Tejun Heo wrote:
> No reason to get overzealous about recovered comm and data errors.
> Some PHYs habitually sets them w/o no good reason and being draconian
> about these soft error conditions doesn't seem to help anybody.
> 
> If need ever rises, we might need to add soft PHY error condition, say
> AC_ERR_MAYBE_ATA_BUS and use it only to determine whether speed down
> is necessary but I don't think that's very likely to happen.  It's far
> more likely we'll get timeouts or fatal transmission errors if
> recovered errors are so prominent that they hamper operation.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>
> ---
>  drivers/ata/libata-eh.c |    7 +------
>  1 files changed, 1 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 751dad0..7894d83 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1308,12 +1308,7 @@ static void ata_eh_analyze_serror(struct ata_link *link)
>  	unsigned int err_mask = 0, action = 0;
>  	u32 hotplug_mask;
>  
> -	if (serror & SERR_PERSISTENT) {
> -		err_mask |= AC_ERR_ATA_BUS;
> -		action |= ATA_EH_RESET;
> -	}
> -	if (serror &
> -	    (SERR_DATA_RECOVERED | SERR_COMM_RECOVERED | SERR_DATA)) {
> +	if (serror & (SERR_PERSISTENT | SERR_DATA)) {
>  		err_mask |= AC_ERR_ATA_BUS;
>  		action |= ATA_EH_RESET;

We should keep track of these events, though, a la struct net_device_stats

	Jeff
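
One way the counting idea could look, as a rough standalone sketch
rather than a proposal for actual libata code: recovered errors only
bump counters, while persistent or unrecovered data errors still ask for
a reset, matching the hunk quoted above.  The struct and function names
are hypothetical; the SERR_* bit positions are assumed to follow the
standard SATA SError layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SError bits, assumed to follow the standard SATA SError layout. */
enum {
	SERR_DATA_RECOVERED	= 1 << 0,	/* recovered data error */
	SERR_COMM_RECOVERED	= 1 << 1,	/* recovered comm failure */
	SERR_DATA		= 1 << 8,	/* unrecovered data error */
	SERR_PERSISTENT		= 1 << 9,	/* persistent data or comm error */
};

/* Hypothetical per-link counters, in the spirit of net_device_stats. */
struct link_err_stats {
	unsigned long data_recovered;
	unsigned long comm_recovered;
	unsigned long hard_errors;
};

/* Account one SError value; returns nonzero if a reset is warranted. */
static int link_account_serror(struct link_err_stats *st, uint32_t serror)
{
	assert(st != NULL);

	if (serror & SERR_DATA_RECOVERED)
		st->data_recovered++;
	if (serror & SERR_COMM_RECOVERED)
		st->comm_recovered++;

	if (serror & (SERR_PERSISTENT | SERR_DATA)) {
		st->hard_errors++;
		return 1;
	}
	return 0;
}
```

Counters like these would leave the soft events visible for later
diagnosis without making EH react to them.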





* Re: [PATCH 01/10] libata: fix sata_link_hardreset() @online out parameter handling
  2008-05-18 16:15 ` [PATCH 01/10] libata: fix sata_link_hardreset() @online out parameter handling Tejun Heo
@ 2008-05-19 21:53   ` Jeff Garzik
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Garzik @ 2008-05-19 21:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, liml

Tejun Heo wrote:
> The @online out parameter is supposed to set to true iff link is
> online and reset succeeded as advertised in the function description
> and callers are coded expecting that.  However, sata_link_reset()
> didn't behave this way on device readiness test failure.  Fix it.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>
> ---
>  drivers/ata/libata-core.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)

applied 1-10




* RE: [PATCHSET #upstream-fixes] git tree available
  2008-05-18 16:29 ` [PATCHSET #upstream-fixes] git tree available Tejun Heo
@ 2008-05-20  1:35   ` Brian & Chamaigne Scamman
  2008-05-20  2:58     ` Mark Lord
  0 siblings, 1 reply; 30+ messages in thread
From: Brian & Chamaigne Scamman @ 2008-05-20  1:35 UTC (permalink / raw)
  To: 'Tejun Heo', jeff, linux-ide; +Cc: liml

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

Tejun-

The hotplug fixes didn't solve the problem. After adding some "logic
monitoring" statements, I found that the successful discovery of the drives
depends on the response from sil24_exec_polled_cmd.

If the call from ata_wait_register is 327680, the drives are recognized; if
the response is 262144 the drives have timed out.
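
The two decimal values are easier to read in hex: 327680 is 0x50000 and
262144 is 0x40000.  The following is a minimal sketch for decoding them,
assuming the logged number is the raw port IRQ status returned by
ata_wait_register(), that the raw (unmasked) status copy starts at bit
16, and that within it bit 0 means command complete, bit 1 error and
bit 2 port ready change.  The IRQ_* names are shortened stand-ins for
the driver's constants and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit layout of the raw status; see the lead-in above. */
enum {
	IRQ_COMPLETE	= 1 << 0,	/* command(s) completed */
	IRQ_ERROR	= 1 << 1,	/* command execution error */
	IRQ_PORTRDY_CHG	= 1 << 2,	/* port ready change */
	IRQ_RAW_SHIFT	= 16,		/* raw copy starts at this bit */
};

/* Extract the 12 raw status bits from a logged "IRQ port" value. */
static unsigned int irq_raw_bits(uint32_t irq_stat)
{
	return (irq_stat >> IRQ_RAW_SHIFT) & 0xfff;
}

/* Nonzero if the raw status reports the polled command as completed. */
static int irq_cmd_completed(uint32_t irq_stat)
{
	return (irq_raw_bits(irq_stat) & IRQ_COMPLETE) != 0;
}

/* Write a short human-readable description of the raw bits into buf. */
static void irq_describe(uint32_t irq_stat, char *buf, size_t len)
{
	unsigned int raw = irq_raw_bits(irq_stat);

	assert(buf != NULL && len > 0);
	snprintf(buf, len, "raw=0x%x%s%s%s", raw,
		 (raw & IRQ_COMPLETE) ? " complete" : "",
		 (raw & IRQ_ERROR) ? " error" : "",
		 (raw & IRQ_PORTRDY_CHG) ? " portrdy_chg" : "");
}
```

Under those assumptions the only difference between the two values is
the command-complete bit: it is set in 327680 and clear in 262144.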

I've also noticed that the drives respond to the EH entering with DevExch
vs. PHY RDY changed.

Bottom line, the drives seem to be recognized almost every time I attach
them to the PMP after the EH has completed processing the empty PMP.  If the
drives are attached to the PMP before the PMP is attached to the controller,
the drives are almost never recognized (EH times out).

I've attached some more dmesg output (with my logic monitoring statements
included).

-Brian

-----Original Message-----
From: Tejun Heo [mailto:htejun@gmail.com] 
Sent: Sunday, May 18, 2008 12:29 PM
To: jeff@garzik.org; linux-ide@vger.kernel.org
Cc: liml@rtr.ca; Brian & Chamaigne Scamman
Subject: Re: [PATCHSET #upstream-fixes] git tree available

Git tree available at...

http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=hotplug-fixes
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata-dev.git hotplug-fixes

Brian, the original thread is

 http://thread.gmane.org/gmane.linux.ide/31572

and it might fix your device detection problem.  Please test.
If you don't use git, a patch against v2.6.26-rc2 is available at...

 http://master.kernel.org/~tj/v2.6.26-rc2-to-hotplug-fixes.patch

Thanks.

-- 
tejun

[-- Attachment #2: ssd_not_working.txt --]
[-- Type: text/plain, Size: 8841 bytes --]

ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
ata3: irq_stat 0x00b40090, PHY RDY changed
ata3: BJS: Prep for recovery 
ata3: BJS: dev 0 
ata3: BJS: Prep for EH 
ata3: BJS: Reset 
ata3: hard resetting link 
ata3: BJS: follow-up softreset required 
ata3: BJS: orig timeout (7812) 
ata3: BJS: IRQ port (11862016) 
ata3: BJS: rc=0 
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0) 
ata3: BJS: The Rest... 
ata3.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9 
ata3.00: BJS: Prep for recovery 
ata3.00: BJS: dev 0 
ata3.01: BJS: Prep for recovery 
ata3.01: BJS: dev 0 
ata3.02: BJS: Prep for recovery 
ata3.02: BJS: dev 0 
ata3.03: BJS: Prep for recovery 
ata3.03: BJS: dev 0 
ata3.04: BJS: Prep for recovery 
ata3.04: BJS: dev 0 
ata3.05: BJS: Prep for recovery 
ata3.05: BJS: dev 0 
ata3.00: BJS: Prep for EH 
ata3.01: BJS: Prep for EH 
ata3.02: BJS: Prep for EH 
ata3.03: BJS: Prep for EH 
ata3.04: BJS: Prep for EH 
ata3.05: BJS: Prep for EH 
ata3.00: BJS: Reset 
ata3.00: hard resetting link 
ata3.00: BJS: follow-up softreset required 
ata3.00: BJS: orig timeout (9680) 
ata3: BJS: IRQ port (262144) 
ata3: BJS: IRQ !complete 
ata3.00: softreset failed (timeout) 
ata3.00: BJS: rc=-5 
ata3.15: qc timeout (cmd 0xe4) 
ata3.00: failed to read SCR 0 (Emask=0x4) 
ata3.00: reset failed, giving up 
ata3.15: hard resetting link 
ata3.15: BJS: follow-up softreset required 
ata3.15: BJS: orig timeout (7816) 
ata3: BJS: IRQ port (11862016) 
ata3.15: BJS: rc=0 
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) 
ata3.00: BJS: Prep for recovery 
ata3.00: BJS: dev 0 
ata3.01: BJS: Prep for recovery 
ata3.01: BJS: dev 0 
ata3.02: BJS: Prep for recovery 
ata3.02: BJS: dev 0 
ata3.03: BJS: Prep for recovery 
ata3.03: BJS: dev 0 
ata3.04: BJS: Prep for recovery 
ata3.04: BJS: dev 0 
ata3.05: BJS: Prep for recovery 
ata3.05: BJS: dev 0 
ata3.00: BJS: Prep for EH 
ata3.01: BJS: Prep for EH 
ata3.02: BJS: Prep for EH 
ata3.03: BJS: Prep for EH 
ata3.04: BJS: Prep for EH 
ata3.05: BJS: Prep for EH 
ata3.00: BJS: Reset 
ata3.00: hard resetting link 
ata3.00: BJS: follow-up softreset required 
ata3.00: BJS: orig timeout (9680) 
ata3: BJS: IRQ port (262144) 
ata3: BJS: IRQ complete 
ata3.00: softreset failed (timeout) 
ata3.00: BJS: rc=-5 
ata3.15: qc timeout (cmd 0xe4) 
ata3.00: failed to read SCR 0 (Emask=0x4) 
ata3.00: reset failed, giving up 
ata3.15: hard resetting link 
ata3.15: BJS: follow-up softreset required 
ata3.15: BJS: orig timeout (7816) 
ata3: BJS: IRQ port (11862016) 
ata3.15: BJS: rc=0 
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) 
ata3.00: BJS: Prep for recovery 
ata3.00: BJS: dev 0 
ata3.01: BJS: Prep for recovery 
ata3.01: BJS: dev 0 
ata3.02: BJS: Prep for recovery 
ata3.02: BJS: dev 0 
ata3.03: BJS: Prep for recovery 
ata3.03: BJS: dev 0 
ata3.04: BJS: Prep for recovery 
ata3.04: BJS: dev 0 
ata3.05: BJS: Prep for recovery 
ata3.05: BJS: dev 0 
ata3.00: BJS: Prep for EH 
ata3.01: BJS: Prep for EH 
ata3.02: BJS: Prep for EH 
ata3.03: BJS: Prep for EH 
ata3.04: BJS: Prep for EH 
ata3.05: BJS: Prep for EH 
ata3.00: BJS: Reset 
ata3.00: hard resetting link 
ata3.00: BJS: follow-up softreset required 
ata3.00: BJS: orig timeout (9680) 
ata3: BJS: IRQ port (262144) 
ata3: BJS: IRQ !complete 
ata3.00: softreset failed (timeout) 
ata3.00: BJS: rc=-5 
ata3.15: qc timeout (cmd 0xe4) 
ata3.00: failed to read SCR 0 (Emask=0x4) 
ata3.00: reset failed, giving up 
ata3.00: failed to recover link after 3 tries, disabling 
ata3: failed to recover PMP, retrying in 5 secs 
ata3.15: hard resetting link 
ata3.15: BJS: follow-up softreset required 
ata3.15: BJS: orig timeout (7816) 
ata3: BJS: IRQ port (11862016) 
ata3.15: BJS: rc=0 
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) 
ata3.00: BJS: Prep for recovery 
ata3.00: BJS: dev 0 
ata3.01: BJS: Prep for recovery 
ata3.01: BJS: dev 0 
ata3.02: BJS: Prep for recovery 
ata3.02: BJS: dev 0 
ata3.03: BJS: Prep for recovery 
ata3.03: BJS: dev 0 
ata3.04: BJS: Prep for recovery 
ata3.04: BJS: dev 0 
ata3.05: BJS: Prep for recovery 
ata3.05: BJS: dev 0 
ata3.00: BJS: Prep for EH
ata3.01: BJS: Prep for EH
ata3.02: BJS: Prep for EH
ata3.03: BJS: Prep for EH
ata3.04: BJS: Prep for EH
ata3.05: BJS: Prep for EH
ata3.00: BJS: Reset
ata3.01: BJS: Reset
ata3.01: hard resetting link
ata3.01: SATA link down (SStatus 0 SControl 320)
ata3.02: BJS: Reset
ata3.02: hard resetting link
ata3.02: SATA link down (SStatus 0 SControl 320)
ata3.03: BJS: Reset
ata3.03: hard resetting link
ata3.03: SATA link down (SStatus 0 SControl 320)
ata3.04: BJS: Reset
ata3.04: hard resetting link
ata3.04: SATA link down (SStatus 0 SControl 320)
ata3.05: BJS: Reset
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: BJS: The Rest...
ata3.01: BJS: The Rest...
ata3.02: BJS: The Rest...
ata3.03: BJS: The Rest...
ata3.04: BJS: The Rest...
ata3.05: BJS: The Rest...
ata3.00: BJS: Prep for recovery
ata3.00: BJS: dev 0
ata3.01: BJS: Prep for recovery
ata3.01: BJS: dev 0
ata3.02: BJS: Prep for recovery
ata3.02: BJS: dev 0
ata3.03: BJS: Prep for recovery
ata3.03: BJS: dev 0
ata3.04: BJS: Prep for recovery
ata3.04: BJS: dev 0
ata3.05: BJS: Prep for recovery
ata3.05: BJS: dev 0
ata3.00: BJS: Prep for EH
ata3.01: BJS: Prep for EH
ata3.02: BJS: Prep for EH
ata3.03: BJS: Prep for EH
ata3.04: BJS: Prep for EH
ata3.05: BJS: Prep for EH
ata3.00: BJS: Reset
ata3.01: BJS: Reset
ata3.02: BJS: Reset
ata3.03: BJS: Reset
ata3.04: BJS: Reset
ata3.05: BJS: Reset
ata3.00: BJS: The Rest...
ata3.01: BJS: The Rest...
ata3.02: BJS: The Rest...
ata3.03: BJS: The Rest...
ata3.04: BJS: The Rest...
ata3.05: BJS: The Rest...
ata3: EH complete 
ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xf
ata3: SError: { PHYRdyChg DevExch } 
ata3.00: BJS: Prep for recovery 
ata3.00: BJS: dev 0 
ata3.01: BJS: Prep for recovery 
ata3.01: BJS: dev 0 
ata3.02: BJS: Prep for recovery 
ata3.02: BJS: dev 0 
ata3.03: BJS: Prep for recovery 
ata3.03: BJS: dev 0 
ata3.04: BJS: Prep for recovery
ata3.04: BJS: dev 0 
ata3.05: BJS: Prep for recovery
ata3.05: BJS: dev 0 
ata3.00: BJS: Prep for EH
ata3.01: BJS: Prep for EH
ata3.02: BJS: Prep for EH
ata3.03: BJS: Prep for EH
ata3.04: BJS: Prep for EH
ata3.05: BJS: Prep for EH
ata3.00: BJS: Reset 
ata3.00: hard resetting link 
ata3.00: SATA link down (SStatus 0 SControl 320) 
ata3.01: BJS: Reset 
ata3.02: BJS: Reset 
ata3.03: BJS: Reset 
ata3.04: BJS: Reset 
ata3.05: BJS: Reset 
ata3.00: BJS: The Rest... 
ata3.01: BJS: The Rest... 
ata3.02: BJS: The Rest... 
ata3.03: BJS: The Rest... 
ata3.04: BJS: The Rest... 
ata3.05: BJS: The Rest... 
ata3: EH complete 
ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xf
ata3: SError: { CommWake DevExch }
ata3.00: BJS: Prep for recovery
ata3.00: BJS: dev 0
ata3.01: BJS: Prep for recovery
ata3.01: BJS: dev 0
ata3.02: BJS: Prep for recovery
ata3.02: BJS: dev 0
ata3.03: BJS: Prep for recovery
ata3.03: BJS: dev 0
ata3.04: BJS: Prep for recovery
ata3.04: BJS: dev 0
ata3.05: BJS: Prep for recovery
ata3.05: BJS: dev 0
ata3.00: BJS: Prep for EH
ata3.01: BJS: Prep for EH
ata3.02: BJS: Prep for EH
ata3.03: BJS: Prep for EH
ata3.04: BJS: Prep for EH
ata3.05: BJS: Prep for EH
ata3.00: BJS: Reset
ata3.00: hard resetting link
ata3.00: BJS: follow-up softreset required
ata3.00: BJS: orig timeout (9276)
ata3: BJS: IRQ port (327680) 
ata3.00: BJS: rc=0 
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320) 
ata3.01: BJS: Reset 
ata3.02: BJS: Reset 
ata3.03: BJS: Reset 
ata3.04: BJS: Reset 
ata3.05: BJS: Reset
ata3.00: BJS: The Rest...
ata3.00: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.00: 127923200 sectors, multi 1: LBA
ata3.00: applying bridge limits
ata3.00: configured for UDMA/100
ata3.01: BJS: The Rest...
ata3.02: BJS: The Rest...
ata3.03: BJS: The Rest...
ata3.04: BJS: The Rest...
ata3.05: BJS: The Rest... 
ata3: EH complete 
scsi 2:0:0:0: Direct-Access     ATA      Super Talent Tec Rev PQ: 0 ANSI: 5 
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:0:0:0: [sda] Write Protect is off 
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00 
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:0:0:0: [sda] Write Protect is off 
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00 
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 
sd 2:0:0:0: [sda] Attached SCSI disk 
sd 2:0:0:0: Attached scsi generic sg0 type 0 

[-- Attachment #3: ssd_working.txt --]
[-- Type: text/plain, Size: 2825 bytes --]

ata3: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
ata3: irq_stat 0x00a40080, device exchanged 
ata3: SError: { PHYRdyChg } 
ata3: BJS: Prep for recovery 
ata3: BJS: dev 0 
ata3: BJS: Prep for EH 
ata3: BJS: Reset 
ata3: hard resetting link 
ata3: BJS: follow-up softreset required 
ata3: BJS: orig timeout (7816) 
ata3: BJS: IRQ port (11862016) 
ata3: BJS: rc=0 
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0) 
ata3: BJS: The Rest... 
ata3.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9 
ata3.00: BJS: Prep for recovery 
ata3.00: BJS: dev 0 
ata3.01: BJS: Prep for recovery 
ata3.01: BJS: dev 0 
ata3.02: BJS: Prep for recovery 
ata3.02: BJS: dev 0 
ata3.03: BJS: Prep for recovery 
ata3.03: BJS: dev 0 
ata3.04: BJS: Prep for recovery 
ata3.04: BJS: dev 0 
ata3.05: BJS: Prep for recovery 
ata3.05: BJS: dev 0 
ata3.00: BJS: Prep for EH 
ata3.01: BJS: Prep for EH 
ata3.02: BJS: Prep for EH 
ata3.03: BJS: Prep for EH 
ata3.04: BJS: Prep for EH 
ata3.05: BJS: Prep for EH 
ata3.00: BJS: Reset 
ata3.00: hard resetting link 
ata3.00: BJS: follow-up softreset required 
ata3.00: BJS: orig timeout (9680) 
ata3: BJS: IRQ port (327680) 
ata3.00: BJS: rc=0
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.01: BJS: Reset
ata3.01: hard resetting link
ata3.01: SATA link down (SStatus 0 SControl 320)
ata3.02: BJS: Reset
ata3.02: hard resetting link
ata3.02: SATA link down (SStatus 0 SControl 320)
ata3.03: BJS: Reset
ata3.03: hard resetting link
ata3.03: SATA link down (SStatus 0 SControl 320)
ata3.04: BJS: Reset
ata3.04: hard resetting link
ata3.04: SATA link down (SStatus 0 SControl 320)
ata3.05: BJS: Reset
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: BJS: The Rest...
ata3.00: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.00: 127923200 sectors, multi 1: LBA 
ata3.00: applying bridge limits 
ata3.00: configured for UDMA/100 
ata3.01: BJS: The Rest... 
ata3.02: BJS: The Rest...
ata3.03: BJS: The Rest... 
ata3.04: BJS: The Rest...
ata3.05: BJS: The Rest... 
ata3: EH complete 
scsi 2:0:0:0: Direct-Access          ATA       Super Talent Tec Rev PQ: 0 ANSI: 5 
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 2:0:0:0: [sda] Attached SCSI disk
sd 2:0:0:0: Attached scsi generic sg0 type 0


* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-20  1:35   ` Brian & Chamaigne Scamman
@ 2008-05-20  2:58     ` Mark Lord
  2008-05-20  4:28       ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Lord @ 2008-05-20  2:58 UTC (permalink / raw)
  To: Brian & Chamaigne Scamman; +Cc: 'Tejun Heo', jeff, linux-ide

Brian & Chamaigne Scamman wrote:
> Tejun-
> 
> The hotplug fixes didn't solve the problem. After adding some "logic
> monitoring" statements, I found that the successful discovery of the drives
> depends on the response from sil24_exec_polled_cmd.
> 
> If the call from ata_wait_register is 327680, the drives are recognized; if
> the response is 262144 the drives have timed out.
> 
> I've also noticed that the drives respond to the EH entering with DevExch
> vs. PHY RDY changed.
> 
> Bottom line, the drives seem to be recognized almost every time I attach
> them to the PMP after the EH has completed processing the empty PMP.  If the
> drives are attached to the PMP before the PMP is attached to the controller,
> the drives are almost never recognized (EH times out).
..

hp-polling is beginning to look better and better..  again.  :)


* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-20  2:58     ` Mark Lord
@ 2008-05-20  4:28       ` Tejun Heo
  2008-05-20  4:43         ` Tejun Heo
  2008-05-20 12:08         ` Brian & Chamaigne Scamman
  0 siblings, 2 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-20  4:28 UTC (permalink / raw)
  To: Mark Lord; +Cc: Brian & Chamaigne Scamman, jeff, linux-ide

Mark Lord wrote:
> Brian & Chamaigne Scamman wrote:
>> Tejun-
>>
>> The hotplug fixes didn't solve the problem. After adding some "logic
>> monitoring" statements, I found that the successful discovery of the 
>> drives
>> depends on the response from sil24_exec_polled_cmd.
>>
>> If the call from ata_wait_register is 327680, the drives are 
>> recognized; if
>> the response is 262144 the drives have timed out.
>>
>> I've also noticed that the drives respond to the EH entering with DevExch
>> vs. PHY RDY changed.
>>
>> Bottom line, the drives seem to be recognized almost every time I attach
>> them to the PMP after the EH has completed processing the empty PMP.  
>> If the
>> drives are attached to the PMP before the PMP is attached to the 
>> controller,
>> the drives are almost never recognized (EH times out).

Can you please describe what you did exactly?  sil3726/4726 has some 
problems when its first fan out port goes online and offline while it's 
powered up.  You want to keep it occupied at all times.

Also, does putting ssleep(5) right before followup-SRST help?

> hp-polling is beginning to look better and better..  again.  :)

hp-polling wouldn't really help here.  What's failing is not hotplug 
event detection; the reset protocol itself is failing.

-- 
tejun


* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-20  4:28       ` Tejun Heo
@ 2008-05-20  4:43         ` Tejun Heo
  2008-05-21  1:32           ` Brian & Chamaigne Scamman
  2008-05-20 12:08         ` Brian & Chamaigne Scamman
  1 sibling, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-20  4:43 UTC (permalink / raw)
  To: Mark Lord; +Cc: Brian & Chamaigne Scamman, jeff, linux-ide

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

Tejun Heo wrote:
> Mark Lord wrote:
>> Brian & Chamaigne Scamman wrote:
>>> Tejun-
>>>
>>> The hotplug fixes didn't solve the problem. After adding some "logic
>>> monitoring" statements, I found that the successful discovery of the 
>>> drives
>>> depends on the response from sil24_exec_polled_cmd.
>>>
>>> If the call from ata_wait_register is 327680, the drives are 
>>> recognized; if
>>> the response is 262144 the drives have timed out.
>>>
>>> I've also noticed that the drives respond to the EH entering with 
>>> DevExch
>>> vs. PHY RDY changed.
>>>
>>> Bottom line, the drives seem to be recognized almost every time I attach
>>> them to the PMP after the EH has completed processing the empty PMP.  
>>> If the
>>> drives are attached to the PMP before the PMP is attached to the 
>>> controller,
>>> the drives are almost never recognized (EH times out).
> 
> Can you please describe what you did exactly?  sil3726/4726 has some 
> problems when its first fan out port goes online and offline while it's 
> powered up.  You want to keep it occupied at all times.
> 
> Also, does putting ssleep(5) right before followup-SRST help?

Can you please test whether the attached patch fixes the detection problem?

-- 
tejun

[-- Attachment #2: simg3726-debug.patch --]
[-- Type: text/x-patch, Size: 706 bytes --]

diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index 3374ec5..b65db30 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -322,9 +322,12 @@ static void sata_pmp_quirks(struct ata_port *ap)
 	if (vendor == 0x1095 && devid == 0x3726) {
 		/* sil3726 quirks */
 		ata_port_for_each_link(link, ap) {
-			/* class code report is unreliable */
+			/* Class code report is unreliable and SRST
+			 * times out under certain configurations.
+			 */
 			if (link->pmp < 5)
-				link->flags |= ATA_LFLAG_ASSUME_ATA;
+				link->flags |= ATA_LFLAG_NO_SRST |
+					       ATA_LFLAG_ASSUME_ATA;
 
 			/* port 5 is for SEMB device and it doesn't like SRST */
 			if (link->pmp == 5)


* RE: [PATCHSET #upstream-fixes] git tree available
  2008-05-20  4:28       ` Tejun Heo
  2008-05-20  4:43         ` Tejun Heo
@ 2008-05-20 12:08         ` Brian & Chamaigne Scamman
  2008-05-20 14:50           ` Tejun Heo
  1 sibling, 1 reply; 30+ messages in thread
From: Brian & Chamaigne Scamman @ 2008-05-20 12:08 UTC (permalink / raw)
  To: 'Tejun Heo', 'Mark Lord'; +Cc: jeff, linux-ide

Tejun Heo wrote:
>Can you please describe what you did exactly?  sil3726/4726 has some 
>problems when its first fan out port goes online and offline while it's 
>powered up.  You want to keep it occupied at all times.
>
Here's the current setup:
- PMP powered via computer's power supply; on with computer
- SSD and bridges powered via computer's power supply; on with computer
- FC9 with new 2.6.26-rc2-hotplug-patches installed and booted
- Once computer is booted, login, open a terminal and dmesg -c
For the case which fails:
- plug SSD drive/bridge into PMP port 0
- plug eSata cable into PMP
- plug eSata cable into eSata port on controller
- watch the softreset timeout
For the case which works:
- plug eSata cable into PMP
- plug eSata cable into eSata port on controller
- watch the PMP become recognized
- plug SSD drive/bridge into PMP port 0
- watch the SSD drive become recognized

Steps above are repeatable. Previously I was applying power to the PMP while
the drives were attached (PMP, bridges and drives all receiving power at the
same time) and then plugging in the eSata cable, but this case always failed
as well.

>Also, does putting ssleep(5) right before followup-SRST help?
>
I'll look at adding the ssleep line and the new patch (independently).

-Brian



* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-20 12:08         ` Brian & Chamaigne Scamman
@ 2008-05-20 14:50           ` Tejun Heo
  0 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-20 14:50 UTC (permalink / raw)
  To: Brian & Chamaigne Scamman; +Cc: 'Mark Lord', jeff, linux-ide

Brian & Chamaigne Scamman wrote:
> Tejun Heo wrote:
>> Can you please describe what you did exactly?  sil3726/4726 has some 
>> problems when its first fan out port goes online and offline while it's 
>> powered up.  You want to keep it occupied at all times.
>>
> Here's the current setup:
> - PMP powered via computer's power supply; on with computer
> - SSD and bridges powered via computer's power supply; on with computer
> - FC9 with new 2.6.26-rc2-hotplug-patches installed and booted
> - Once computer is booted, login, open a terminal and dmesg -c
> For the case which fails:
> - plug SSD drive/bridge into PMP port 0
> - plug eSata cable into PMP
> - plug eSata cable into eSata port on controller
> - watch the softreset timeout
> For the case which works:
> - plug eSata cable into PMP
> - plug eSata cable into eSata port on controller
> - watch the PMP become recognized
> - plug SSD drive/bridge into PMP port 0
> - watch the SSD drive become recognized
> 
> Steps above are repeatable. Previously I was applying power to the PMP while
> the drives were attached (PMP, bridges and drives all receiving power at the
> same time) and then plugging in the eSata cable, but this case always failed
> as well.
> 
>> Also, does putting ssleep(5) right before followup-SRST help?
>>
> I'll look at adding the ssleep line and the new patch (independently).

Please try the patch.  I think that should work.

-- 
tejun


* RE: [PATCHSET #upstream-fixes] git tree available
  2008-05-20  4:43         ` Tejun Heo
@ 2008-05-21  1:32           ` Brian & Chamaigne Scamman
  2008-05-21  4:59             ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Brian & Chamaigne Scamman @ 2008-05-21  1:32 UTC (permalink / raw)
  To: 'Tejun Heo', 'Mark Lord'; +Cc: jeff, linux-ide

[-- Attachment #1: Type: text/plain, Size: 793 bytes --]

Tejun Heo wrote:
>Can you please test whether the attached patch fixes the detection problem?

Tejun-
The patch worked great; see dmesg log below...
I didn't test the ssleep(5) command since the patch worked; do you still need
me to test it?

One remaining issue is the "failed to IDENTIFY (I/O error, err_mask=0x11)"
message. If the PMP, bridges and drives are powered up before I insert the
eSATA cable into the controller, I don't receive the message. The message
appears only when I have the eSATA cable attached to the controller and
then simultaneously power on the PMP and drives. In the attached dmesg log,
the drive is eventually recognized; however there have been a couple of
times when the drives aren't recognized. An eSATA cable remove/re-insert
fixes the problem.

-Brian


[-- Attachment #2: dmesg_pmp_success.txt --]
[-- Type: text/plain, Size: 4546 bytes --]

ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
ata3: irq_stat 0x00b40090, PHY RDY changed 
ata3: hard resetting link 
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0) 
ata3.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata3.00: hard resetting link 
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320) 
ata3.01: hard resetting link
ata3.01: SATA link down (SStatus 0 SControl 320)
ata3.02: hard resetting link
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.03: hard resetting link
ata3.03: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.04: hard resetting link
ata3.04: SATA link down (SStatus 0 SControl 320)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.15: hard resetting link 
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) 
ata3.00: hard resetting link 
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320) 
ata3.02: hard resetting link 
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 
ata3.03: hard resetting link 
ata3.03: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 
ata3.05: hard resetting link 
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) 
ata3.00: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.00: 127923200 sectors, multi 1: LBA 
ata3.00: applying bridge limits 
ata3.00: configured for UDMA/100 
ata3.02: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.02: 127923200 sectors, multi 1: LBA 
ata3.02: applying bridge limits 
ata3.02: configured for UDMA/100 
ata3.03: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.15: hard resetting link 
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.00: hard resetting link 
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.01: hard resetting link 
ata3.01: SATA link down (SStatus 0 SControl 320) 
ata3.02: hard resetting link 
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.03: hard resetting link 
ata3.03: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.04: hard resetting link 
ata3.04: SATA link down (SStatus 0 SControl 320) 
ata3.05: hard resetting link 
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: configured for UDMA/100 
ata3.02: configured for UDMA/100
ata3.03: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.03: 127923200 sectors, multi 1: LBA
ata3.03: applying bridge limits
ata3.03: configured for UDMA/100
ata3: EH complete 
scsi 2:0:0:0: Direct-Access     ATA      Super Talent Tec Rev  PQ: 0 ANSI: 5 
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:0:0:0: [sda] Write Protect is off 
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00 
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:0:0:0: [sda] Write Protect is off 
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00 
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 
sd 2:0:0:0: [sda] Attached SCSI disk 
sd 2:0:0:0: Attached scsi generic sg0 type 0 
scsi 2:2:0:0: Direct-Access     ATA      Super Talent Tec Rev  PQ: 0 ANSI: 5
sd 2:2:0:0: [sdb] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:2:0:0: [sdb] Write Protect is off 
sd 2:2:0:0: [sdb] Mode Sense: 00 3a 00 00 
sd 2:2:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:2:0:0: [sdb] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:2:0:0: [sdb] Write Protect is off 
sd 2:2:0:0: [sdb] Mode Sense: 00 3a 00 00 
sd 2:2:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 
sd 2:2:0:0: [sdb] Attached SCSI disk 
sd 2:2:0:0: Attached scsi generic sg1 type 0 
scsi 2:3:0:0: Direct-Access     ATA      Super Talent Tec Rev  PQ: 0 ANSI: 5
sd 2:3:0:0: [sdc] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:3:0:0: [sdc] Write Protect is off 
sd 2:3:0:0: [sdc] Mode Sense: 00 3a 00 00 
sd 2:3:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:3:0:0: [sdc] 127923200 512-byte hardware sectors (65497 MB) 
sd 2:3:0:0: [sdc] Write Protect is off 
sd 2:3:0:0: [sdc] Mode Sense: 00 3a 00 00 
sd 2:3:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1 
sd 2:3:0:0: [sdc] Attached SCSI disk 
sd 2:3:0:0: Attached scsi generic sg2 type 0 



* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-21  1:32           ` Brian & Chamaigne Scamman
@ 2008-05-21  4:59             ` Tejun Heo
  2008-05-21 11:14               ` Brian & Chamaigne Scamman
  2008-05-21 19:42               ` Brian & Chamaigne Scamman
  0 siblings, 2 replies; 30+ messages in thread
From: Tejun Heo @ 2008-05-21  4:59 UTC (permalink / raw)
  To: Brian & Chamaigne Scamman; +Cc: 'Mark Lord', jeff, linux-ide

Brian & Chamaigne Scamman wrote:
> Tejun Heo wrote:
>> Can you please test whether the attached patch fixes the detection problem?
> 
> Tejun-
> The patch worked great; see dmesg log below...
> I didn't test the ssleep(5) command since the patch worked; do you still need
> me to test it?

Nope.

> One remaining issue is the "failed to IDENTIFY (I/O error, err_mask=0x11)"
> message. If the PMP, bridges and drives are powered up before I insert the
> eSATA cable into the controller, I don't receive the message. The message
> appears only when I have the eSATA cable attached to the controller and
> then simultaneously power on the PMP and drives. In the attached dmesg log,
> the drive is eventually recognized; however there have been a couple of
> times when the drives aren't recognized. An eSATA cable remove/re-insert
> fixes the problem.

Hmm... can you please post the failing log?

-- 
tejun


* RE: [PATCHSET #upstream-fixes] git tree available
  2008-05-21  4:59             ` Tejun Heo
@ 2008-05-21 11:14               ` Brian & Chamaigne Scamman
  2008-05-21 19:42               ` Brian & Chamaigne Scamman
  1 sibling, 0 replies; 30+ messages in thread
From: Brian & Chamaigne Scamman @ 2008-05-21 11:14 UTC (permalink / raw)
  To: 'Tejun Heo'; +Cc: 'Mark Lord', jeff, linux-ide

Tejun Heo wrote:
>Brian & Chamaigne Scamman wrote:
>> One remaining issue is the "failed to IDENTIFY (I/O error, err_mask=0x11)"
>> message. If the PMP, bridges and drives are powered up before I insert the
>> eSATA cable into the controller, I don't receive the message. The message
>> appears only when I have the eSATA cable attached to the controller and
>> then simultaneously power on the PMP and drives. In the attached dmesg log,
>> the drive is eventually recognized; however there have been a couple of
>> times when the drives aren't recognized. An eSATA cable remove/re-insert
>> fixes the problem.
>
>Hmm... can you please post the failing log?

In my previous post, there are two cases where the drive wasn't initially
recognized, but did respond after a PMP link reset (ata3.00 and ata3.03).
I'll post a log today where one of the drives is never recognized (doesn't
happen that often).

-Brian
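
The "failed at first, recovered after a reset" pattern can be pulled out of a
dmesg capture mechanically. The sketch below is only an illustration:
summarize is a hypothetical helper, LOG is an abbreviated excerpt of the log
from the previous post, and the only assumption about the format is the
"ataN.MM: message" prefix seen in that log.

```python
import re
from collections import Counter

# Abbreviated excerpt of the dmesg log from the previous post.
LOG = """\
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.00: configured for UDMA/100
ata3.02: configured for UDMA/100
ata3.03: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.00: configured for UDMA/100
ata3.02: configured for UDMA/100
ata3.03: configured for UDMA/100
ata3: EH complete
"""

# "ata3.03: some message" -> ("ata3.03", "some message")
LINE = re.compile(r"^(ata\d+(?:\.\d+)?): (.*)$")

def summarize(log):
    """Return {link: (IDENTIFY failure count, True if last seen configured)}."""
    failures = Counter()
    last_ok = {}
    for raw in log.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        link, message = match.groups()
        if message.startswith("failed to IDENTIFY"):
            failures[link] += 1
            last_ok[link] = False
        elif message.startswith("configured for"):
            last_ok[link] = True
    # Only report links that failed at least once.
    return {link: (count, last_ok[link]) for link, count in failures.items()}

print(summarize(LOG))
```

A link that shows up with a nonzero failure count and False in the second
field is one that never came back, which is the case still to be captured.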



* RE: [PATCHSET #upstream-fixes] git tree available
  2008-05-21  4:59             ` Tejun Heo
  2008-05-21 11:14               ` Brian & Chamaigne Scamman
@ 2008-05-21 19:42               ` Brian & Chamaigne Scamman
  2008-05-22  0:40                 ` Tejun Heo
  1 sibling, 1 reply; 30+ messages in thread
From: Brian & Chamaigne Scamman @ 2008-05-21 19:42 UTC (permalink / raw)
  To: 'Tejun Heo'; +Cc: 'Mark Lord', jeff, linux-ide

[-- Attachment #1: Type: text/plain, Size: 776 bytes --]

Tejun Heo wrote:
>Brian & Chamaigne Scamman wrote:
>> One remaining issue is the "failed to IDENTIFY (I/O error, err_mask=0x11)"
>> message. If the PMP, bridges and drives are powered up before I
>> insert the eSATA cable into the controller, I don't receive the
>> message. The message appears only when I have the eSATA cable
>> attached to the controller and then simultaneously power on the PMP
>> and drives. In the attached dmesg log, the drive is eventually
>> recognized; however there have been a couple of times when the drives
>> aren't recognized. An eSATA cable remove/re-insert fixes the problem.
>
>Hmm... can you please post the failing log?

Here's a dmesg log where one of the drives is never recognized, but the
others eventually are...

-Brian

[-- Attachment #2: pmp_drive_header_err.txt --]
[-- Type: text/plain, Size: 6104 bytes --]

ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
ata3: irq_stat 0x00b40090, PHY RDY changed
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata3.00: hard resetting link
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.01: hard resetting link
ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.02: hard resetting link
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.03: hard resetting link
ata3.03: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.04: hard resetting link
ata3.04: SATA link down (SStatus 0 SControl 320)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.15: hard resetting link
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.00: hard resetting link
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.01: hard resetting link
ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.02: hard resetting link
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.03: hard resetting link
ata3.03: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.00: 127923200 sectors, multi 1: LBA 
ata3.00: applying bridge limits
ata3.00: configured for UDMA/100
ata3.01: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.01: 127923200 sectors, multi 1: LBA 
ata3.01: applying bridge limits
ata3.01: configured for UDMA/100
ata3.02: ATA-6: Super Talent Tech, Rev 2.11, max UDMA/133
ata3.02: 127923200 sectors, multi 1: LBA 
ata3.02: applying bridge limits
ata3.02: configured for UDMA/100
ata3.03: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.15: hard resetting link
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.00: hard resetting link
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.01: hard resetting link
ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.02: hard resetting link
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.03: hard resetting link
ata3.03: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.04: hard resetting link
ata3.04: SATA link down (SStatus 0 SControl 320)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: configured for UDMA/100
ata3.01: configured for UDMA/100
ata3.02: configured for UDMA/100
ata3.03: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.15: hard resetting link
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.00: hard resetting link
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.01: hard resetting link
ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.02: hard resetting link
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.03: hard resetting link
ata3.03: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: configured for UDMA/100
ata3.01: configured for UDMA/100
ata3.02: configured for UDMA/100
ata3.03: failed to IDENTIFY (I/O error, err_mask=0x11)
ata3.03: failed to recover link after 3 tries, disabling
ata3: failed to recover PMP, retrying in 5 secs
ata3.15: hard resetting link
ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.00: hard resetting link
ata3.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.01: hard resetting link
ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.02: hard resetting link
ata3.02: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.04: hard resetting link
ata3.04: SATA link down (SStatus 0 SControl 320)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.00: configured for UDMA/100
ata3.01: configured for UDMA/100
ata3.02: configured for UDMA/100
ata3: EH complete
scsi 2:0:0:0: Direct-Access     ATA      Super Talent Tec Rev  PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sda] 127923200 512-byte hardware sectors (65497 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 2:0:0:0: [sda] Attached SCSI disk
sd 2:0:0:0: Attached scsi generic sg0 type 0
scsi 2:1:0:0: Direct-Access     ATA      Super Talent Tec Rev  PQ: 0 ANSI: 5
sd 2:1:0:0: [sdb] 127923200 512-byte hardware sectors (65497 MB)
sd 2:1:0:0: [sdb] Write Protect is off
sd 2:1:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 2:1:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:1:0:0: [sdb] 127923200 512-byte hardware sectors (65497 MB)
sd 2:1:0:0: [sdb] Write Protect is off
sd 2:1:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 2:1:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1
sd 2:1:0:0: [sdb] Attached SCSI disk
sd 2:1:0:0: Attached scsi generic sg1 type 0
scsi 2:2:0:0: Direct-Access     ATA      Super Talent Tec Rev  PQ: 0 ANSI: 5
sd 2:2:0:0: [sdc] 127923200 512-byte hardware sectors (65497 MB)
sd 2:2:0:0: [sdc] Write Protect is off
sd 2:2:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:2:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:2:0:0: [sdc] 127923200 512-byte hardware sectors (65497 MB)
sd 2:2:0:0: [sdc] Write Protect is off
sd 2:2:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:2:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
sd 2:2:0:0: [sdc] Attached SCSI disk
sd 2:2:0:0: Attached scsi generic sg2 type 0
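
As a reading aid for the err_mask and Emask values in the log above, the
sketch below splits a mask into named bits. The bit names are the AC_ERR_*
constants as I recall them from include/linux/libata.h; treat the table as an
assumption and check it against your own tree. decode_err_mask is a
hypothetical helper, not a kernel function.

```python
# Assumed bit layout, recalled from enum ata_completion_errors in
# include/linux/libata.h; verify against the kernel version in use.
AC_ERR_BITS = [
    (1 << 0, "AC_ERR_DEV"),         # device reported error
    (1 << 1, "AC_ERR_HSM"),         # host state machine violation
    (1 << 2, "AC_ERR_TIMEOUT"),     # timeout
    (1 << 3, "AC_ERR_MEDIA"),       # media error
    (1 << 4, "AC_ERR_ATA_BUS"),     # ATA bus error
    (1 << 5, "AC_ERR_HOST_BUS"),    # host bus error
    (1 << 6, "AC_ERR_SYSTEM"),      # system error
    (1 << 7, "AC_ERR_INVALID"),     # invalid argument
    (1 << 8, "AC_ERR_OTHER"),       # unknown
    (1 << 9, "AC_ERR_NODEV_HINT"),  # polling device detection hint
    (1 << 10, "AC_ERR_NCQ"),        # marker for offending NCQ command
]

def decode_err_mask(mask):
    """Return the names of all bits set in an err_mask value."""
    return [name for bit, name in AC_ERR_BITS if mask & bit]

print(hex(0x11), decode_err_mask(0x11))  # the IDENTIFY failures
print(hex(0x10), decode_err_mask(0x10))  # the initial "exception Emask 0x10"
```

Under that assumed table, err_mask=0x11 on the IDENTIFY failures is a device
error combined with an ATA bus error, and the initial Emask 0x10 is an ATA bus
error alone.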


* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-21 19:42               ` Brian & Chamaigne Scamman
@ 2008-05-22  0:40                 ` Tejun Heo
  2008-05-23  0:49                   ` Brian & Chamaigne Scamman
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-22  0:40 UTC (permalink / raw)
  To: Brian & Chamaigne Scamman; +Cc: 'Mark Lord', jeff, linux-ide

Brian & Chamaigne Scamman wrote:
> Tejun Heo wrote:
>> Brian & Chamaigne Scamman wrote:
>>> One remaining issue is the "failed to IDENTIFY (I/O error, err_mask=0x11)"
>>> message. If the PMP, bridges and drives are powered up before I
>>> insert the eSATA cable into the controller, I don't receive the
>>> message. The message appears only when I have the eSATA cable
>>> attached to the controller and then simultaneously power on the PMP
>>> and drives. In the attached dmesg log, the drive is eventually
>>> recognized; however there have been a couple of times when the drives
>>> aren't recognized. An eSATA cable remove/re-insert fixes the problem.
>> Hmm... can you please post the failing log?
> 
> Here's a dmesg log where one of the drives is never recognized, but the
> others eventually are...
> 
> -Brian
> 

Can you please put ssleep(5) right after the first ata_do_reset() call 
in drivers/ata/libata-eh.c::ata_eh_reset() and see whether the problem 
goes away?

Thanks.

-- 
tejun


* RE: [PATCHSET #upstream-fixes] git tree available
  2008-05-22  0:40                 ` Tejun Heo
@ 2008-05-23  0:49                   ` Brian & Chamaigne Scamman
  2008-05-23  1:04                     ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Brian & Chamaigne Scamman @ 2008-05-23  0:49 UTC (permalink / raw)
  To: 'Tejun Heo'; +Cc: 'Mark Lord', jeff, linux-ide

Tejun Heo wrote:
>Can you please put ssleep(5) right after the first ata_do_reset() call 
>in drivers/ata/libata-eh.c::ata_eh_reset() and see whether the problem 
>goes away?

This actually caused more problems. The "failed to IDENTIFY (I/O error,
err_mask=0x11)" was still printed, plus there was a 5 second delay for each
port on the PMP during each of the 3 attempts to read the drive. In the end,
neither of the two SSD drives I tried ended up working. Changing ssleep(5)
to msleep(5) didn't remove the IDENTIFY errors, but still allowed the drives
to work.

Is there any way to control the length of the reset pulse? I've heard that
some of the SSDs require the reset pulse to be held longer than normal.

-Brian
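
For scale, a rough back-of-the-envelope sketch of the added delay. ssleep()
takes seconds and msleep() takes milliseconds, so the two variants differ by a
factor of 1000. The port and attempt counts come from this report ("6 ports"
in the logs, "3 attempts" above); the model of one sleep per port per attempt
is my simplification, and total_delay_ms is a hypothetical helper.

```python
PORTS = 6     # fan-out ports reported by the PMP in the logs
ATTEMPTS = 3  # recovery attempts mentioned in the report

def total_delay_ms(per_reset_ms, ports=PORTS, attempts=ATTEMPTS):
    """Total added sleep, assuming one sleep per port per attempt."""
    return per_reset_ms * ports * attempts

ssleep_total = total_delay_ms(5 * 1000)  # ssleep(5): 5 seconds per reset
msleep_total = total_delay_ms(5)         # msleep(5): 5 milliseconds per reset

print(ssleep_total / 1000, "s with ssleep(5)")
print(msleep_total, "ms with msleep(5)")
```

Under those assumptions ssleep(5) adds about a minute and a half of waiting
in total, while msleep(5) adds well under a second, so the msleep(5) result
says little about whether a longer settle time would help.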


* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-23  0:49                   ` Brian & Chamaigne Scamman
@ 2008-05-23  1:04                     ` Tejun Heo
  2008-05-29  3:06                       ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-23  1:04 UTC (permalink / raw)
  To: Brian & Chamaigne Scamman; +Cc: 'Mark Lord', jeff, linux-ide

Brian & Chamaigne Scamman wrote:
> Tejun Heo wrote:
>> Can you please put ssleep(5) right after the first ata_do_reset() call 
>> in drivers/ata/libata-eh.c::ata_eh_reset() and see whether the problem 
>> goes away?
> 
> This actually caused more problems. The "failed to IDENTIFY (I/O error,
> err_mask=0x11)" was still printed, plus there was a 5 second delay for each
> port on the PMP during each of the 3 attempts to read the drive. In the end,
> neither of the two SSD drives I tried ended up working. Changing ssleep(5)
> to msleep(5) didn't remove the IDENTIFY errors, but still allowed the drives
> to work.
> 
> Is there any way to control the length of the reset pulse? I've heard that
> some of the SSDs require the reset pulse to be held longer than normal.

For sata_sil24, it's determined by the controller.  Hmmm... Please wait 
a bit, I have another thing to try.

-- 
tejun


* Re: [PATCHSET #upstream-fixes] git tree available
  2008-05-23  1:04                     ` Tejun Heo
@ 2008-05-29  3:06                       ` Tejun Heo
  2008-05-29  3:11                         ` Brian & Chamaigne Scamman
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2008-05-29  3:06 UTC (permalink / raw)
  To: Brian & Chamaigne Scamman; +Cc: 'Mark Lord', jeff, linux-ide

Tejun Heo wrote:
> Brian & Chamaigne Scamman wrote:
>> Tejun Heo wrote:
>>> Can you please put ssleep(5) right after the first ata_do_reset()
>>> call in drivers/ata/libata-eh.c::ata_eh_reset() and see whether the
>>> problem goes away?
>>
>> This actually caused more problems. The "failed to IDENTIFY (I/O error,
>> err_mask=0x11)" was still printed, plus there was a 5 second delay for each
>> port on the PMP during each of the 3 attempts to read the drive. In the end,
>> neither of the two SSD drives I tried ended up working. Changing ssleep(5)
>> to msleep(5) didn't remove the IDENTIFY errors, but still allowed the drives
>> to work.
>>
>> Is there any way to control the length of the reset pulse? I've heard that
>> some of the SSDs require the reset pulse to be held longer than normal.
> 
> For sata_sil24, it's determined by the controller.  Hmmm... Please wait
> a bit, I have another thing to try.

Which didn't work out too well.  :-(

I'm sorry but I'm out of ideas.  I'm gonna ask SIMG about it.  Do you
mind being cc'd there?

Thanks.

-- 
tejun


* RE: [PATCHSET #upstream-fixes] git tree available
  2008-05-29  3:06                       ` Tejun Heo
@ 2008-05-29  3:11                         ` Brian & Chamaigne Scamman
  0 siblings, 0 replies; 30+ messages in thread
From: Brian & Chamaigne Scamman @ 2008-05-29  3:11 UTC (permalink / raw)
  To: 'Tejun Heo'; +Cc: 'Mark Lord', jeff, linux-ide

Not at all.



end of thread, other threads:[~2008-05-29  3:11 UTC | newest]

Thread overview: 30+ messages
2008-05-18 16:15 [PATCHSET #upstream-fixes] libata: fix a bunch of PMP related problems Tejun Heo
2008-05-18 16:15 ` [PATCH 01/10] libata: fix sata_link_hardreset() @online out parameter handling Tejun Heo
2008-05-19 21:53   ` Jeff Garzik
2008-05-18 16:15 ` [PATCH 02/10] libata: reorganize ata_eh_reset() no reset method path Tejun Heo
2008-05-18 16:15 ` [PATCH 03/10] libata: move reset freeze/thaw handling into ata_eh_reset() Tejun Heo
2008-05-18 16:15 ` [PATCH 04/10] libata: kill hotplug related race condition Tejun Heo
2008-05-18 16:15 ` [PATCH 05/10] libata: ignore recovered PHY errors Tejun Heo
2008-05-19 21:50   ` Jeff Garzik
2008-05-18 16:15 ` [PATCH 06/10] libata: increase PMP register access timeout to 3s Tejun Heo
2008-05-18 16:15 ` [PATCH 07/10] libata: make sure PMP notification is turned off during recovery Tejun Heo
2008-05-18 16:15 ` [PATCH 08/10] libata: don't schedule LPM action seperately during probing Tejun Heo
2008-05-18 16:15 ` [PATCH 09/10] sata_sil24: don't use NCQ if marvell 4140 PMP is attached Tejun Heo
2008-05-18 21:14   ` Mark Lord
2008-05-18 16:15 ` [PATCH 10/10] libata: ignore SIMG4726 config pseudo device Tejun Heo
2008-05-18 16:29 ` [PATCHSET #upstream-fixes] git tree available Tejun Heo
2008-05-20  1:35   ` Brian & Chamaigne Scamman
2008-05-20  2:58     ` Mark Lord
2008-05-20  4:28       ` Tejun Heo
2008-05-20  4:43         ` Tejun Heo
2008-05-21  1:32           ` Brian & Chamaigne Scamman
2008-05-21  4:59             ` Tejun Heo
2008-05-21 11:14               ` Brian & Chamaigne Scamman
2008-05-21 19:42               ` Brian & Chamaigne Scamman
2008-05-22  0:40                 ` Tejun Heo
2008-05-23  0:49                   ` Brian & Chamaigne Scamman
2008-05-23  1:04                     ` Tejun Heo
2008-05-29  3:06                       ` Tejun Heo
2008-05-29  3:11                         ` Brian & Chamaigne Scamman
2008-05-20 12:08         ` Brian & Chamaigne Scamman
2008-05-20 14:50           ` Tejun Heo
