sata_sil24 EH complete

linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* sata_sil24 EH complete
@ 2009-05-05 14:22 Tim Connors
  2009-05-06  0:32 ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Connors @ 2009-05-05 14:22 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-ide

With 2.6.29, when I hotplug a western digital mybook into a 'Silicon
Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not
found:


*** drive failed due to user induced sillyness
...
[18676.308750] sd 0:0:0:0: [sdb] Stopping disk
[18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED
[18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
*** drive plugged back in
[18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
[18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed
[18767.821775] ata1: hard resetting link
[18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps
[18779.655674] ata1: hard resetting link
[18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[18782.178565] ata1.00: configured for UDMA/100
[18782.178579] ata1: EH complete

And then nothing.  The device is still alive - if I plug it back in
through its usb interface, it comes up normally.  I know from experience
that if I was to reboot the machine, it would have come up quite nicely
from boot:

[    9.780090] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    9.916218] ata1.00: applying link speed limit horkage to 1.5 Gbps
[   16.981087] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[   17.291757] ata1.00: configured for UDMA/100
[   17.291874] scsi 0:0:0:0: Direct-Access     ATA      WD My Book       01.0 PQ: 0 ANSI: 5
[   17.291959] sd 0:0:0:0: [sdb] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[   17.291972] sd 0:0:0:0: [sdb] Write Protect is off
[   17.291974] sd 0:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   17.291994] sd 0:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   17.292051] sd 0:0:0:0: [sdb] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[   17.292150] sd 0:0:0:0: [sdb] Write Protect is off
[   17.292153] sd 0:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   17.292176] sd 0:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   17.292179]  sdb: sdb1
[   17.372927] sd 0:0:0:0: [sdb] Attached SCSI disk

This is another attempt plugging in another mybook disk:

[20662.149223] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
[20662.149233] ata1: irq_stat 0x00a00080, device exchanged
[20662.149252] ata1: hard resetting link
[20672.164117] ata1: softreset failed (timeout)
[20672.164139] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[20672.164143] ata1: link online but device misclassified, retrying
[20672.164146] ata1: hard resetting link
[20674.372055] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[20674.510116] ata1.00: applying link speed limit horkage to 1.5 Gbps
[20679.372028] ata1: hard resetting link
[20681.572066] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[20681.880986] ata1.00: configured for UDMA/100
[20681.880999] ata1: EH complete

Still no /dev/sd*

I haven't tried booting into a kernel prior to these updates:
http://lkml.org/lkml/2009/2/2/447


-- 
TimC
Yay! I have found the last bug bug bug bug bug bug bug
bug bug bug bug bug bug bug bug bug bug bug bug bug bug
bug bug bu%$@#$@#%$@#    Error: Missing Carrier Signal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sata_sil24 EH complete
  2009-05-05 14:22 sata_sil24 EH complete Tim Connors
@ 2009-05-06  0:32 ` Tejun Heo
  2009-05-06  0:38   ` Tim Connors
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2009-05-06  0:32 UTC (permalink / raw)
  To: Tim Connors; +Cc: Linux Kernel Mailing List, linux-ide

Tim Connors wrote:
> With 2.6.29, when I hotplug a western digital mybook into a 'Silicon
> Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not
> found:
> 
> 
> *** drive failed due to user induced sillyness
> ...
> [18676.308750] sd 0:0:0:0: [sdb] Stopping disk
> [18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED
> [18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
> *** drive plugged back in
> [18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
> [18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed
> [18767.821775] ata1: hard resetting link
> [18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps
> [18779.655674] ata1: hard resetting link
> [18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
> [18782.178565] ata1.00: configured for UDMA/100
> [18782.178579] ata1: EH complete
> 
> And then nothing.  The device is still alive - if I plug it back in
> through its usb interface, it comes up normally.  I know from experience
> that if I was to reboot the machine, it would have come up quite nicely
> from boot:

Hmm... "ata1.00: configured for UDMA/100" means libata part of
configuration went okay but it looks like SCSI probing didn't kick in
for reason.  Is this always reproducible?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sata_sil24 EH complete
  2009-05-06  0:32 ` Tejun Heo
@ 2009-05-06  0:38   ` Tim Connors
  2009-05-08 14:11     ` Tim Connors
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Connors @ 2009-05-06  0:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linux Kernel Mailing List, linux-ide

On Wed, 6 May 2009, Tejun Heo wrote:

> Tim Connors wrote:
> > With 2.6.29, when I hotplug a western digital mybook into a 'Silicon
> > Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not
> > found:
> >
> >
> > *** drive failed due to user induced sillyness
> > ...
> > [18676.308750] sd 0:0:0:0: [sdb] Stopping disk
> > [18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED
> > [18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
> > *** drive plugged back in
> > [18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
> > [18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed
> > [18767.821775] ata1: hard resetting link
> > [18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> > [18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps
> > [18779.655674] ata1: hard resetting link
> > [18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
> > [18782.178565] ata1.00: configured for UDMA/100
> > [18782.178579] ata1: EH complete
> >
> > And then nothing.  The device is still alive - if I plug it back in
> > through its usb interface, it comes up normally.  I know from experience
> > that if I was to reboot the machine, it would have come up quite nicely
> > from boot:
>
> Hmm... "ata1.00: configured for UDMA/100" means libata part of
> configuration went okay but it looks like SCSI probing didn't kick in
> for reason.  Is this always reproducible?

100% of time if it is plugged in after boot.

-- 
TimC
The generation of random numbers is too important to be left to chance.
  -- Robert R. Coveyou, Oak Ridge National Laboratory.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sata_sil24 EH complete
  2009-05-06  0:38   ` Tim Connors
@ 2009-05-08 14:11     ` Tim Connors
  2009-05-09  1:41       ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Connors @ 2009-05-08 14:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linux Kernel Mailing List, linux-ide

On Wed, 6 May 2009, Tim Connors wrote:

> On Wed, 6 May 2009, Tejun Heo wrote:
>
> > Tim Connors wrote:
> > > With 2.6.29, when I hotplug a western digital mybook into a 'Silicon
> > > Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not
> > > found:
> > >
> > >
> > > *** drive failed due to user induced sillyness
> > > ...
> > > [18676.308750] sd 0:0:0:0: [sdb] Stopping disk
> > > [18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED
> > > [18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
> > > *** drive plugged back in
> > > [18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
> > > [18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed
> > > [18767.821775] ata1: hard resetting link
> > > [18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> > > [18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps
> > > [18779.655674] ata1: hard resetting link
> > > [18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
> > > [18782.178565] ata1.00: configured for UDMA/100
> > > [18782.178579] ata1: EH complete
> > >
> > > And then nothing.  The device is still alive - if I plug it back in
> > > through its usb interface, it comes up normally.  I know from experience
> > > that if I was to reboot the machine, it would have come up quite nicely
> > > from boot:
> >
> > Hmm... "ata1.00: configured for UDMA/100" means libata part of
> > configuration went okay but it looks like SCSI probing didn't kick in
> > for reason.  Is this always reproducible?
>
> 100% of time if it is plugged in after boot.

Hah, I lied.  Not 100% of the time.

With *some* combination perhaps of
> for host in 0 1 2 3 ; do ; echo "- - -" > /sys/class/scsi_host/host$host/scan ; done
and
> scsiadd -s
when the drive is plugged in and when it isn't plugged in (or maybe even
just leaving it alone) and plugging and unplugging it repeadedly,
eventually (maybe 20% of the time?) it comes good:

[278081.974683] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe
frozen
[278081.974689] ata1: irq_stat 0x00a00080, device exchanged
[278081.974703] ata1: limiting SATA link speed to 1.5 Gbps
[278081.974708] ata1: hard resetting link
[278091.984170] ata1: softreset failed (timeout)
[278091.984200] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[278091.984209] ata1: link online but device misclassified, retrying
[278091.984215] ata1: hard resetting link
[278094.196137] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[278094.333982] ata1.00: ATA-6: WD My Book, 01.01A01, max UDMA/133
[278094.333990] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 1)
[278094.506583] ata1.00: configured for UDMA/100
[278094.506601] ata1: EH complete
[278094.506805] scsi 0:0:0:0: Direct-Access     ATA      WD My Book
01.0 PQ: 0 ANSI: 5
[278094.508717] sd 0:0:0:0: [sde] 1953525168 512-byte hardware sectors:
(1.00 TB/931 GiB)
[278094.508752] sd 0:0:0:0: [sde] Write Protect is off
[278094.508759] sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00
[278094.508815] sd 0:0:0:0: [sde] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[278094.508969] sd 0:0:0:0: [sde] 1953525168 512-byte hardware sectors:
(1.00 TB/931 GiB)
[278094.509003] sd 0:0:0:0: [sde] Write Protect is off
[278094.509009] sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00
[278094.509064] sd 0:0:0:0: [sde] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[278094.509076]  sde: sde1
[278094.581084] sd 0:0:0:0: [sde] Attached SCSI disk

Maybe it's a bit worrying, but subsequent to this, it spontaneously
occasionally says:
[279339.156950] ata1: hard resetting link
[279341.280088] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[279341.590803] ata1.00: configured for UDMA/100
[279341.590815] ata1: EH complete
[279341.627787] sd 0:0:0:0: [sde] 1953525168 512-byte hardware sectors:
(1.00 TB/931 GiB)
[279341.698493] sd 0:0:0:0: [sde] Write Protect is off
[279341.698500] sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00
[279341.698572] sd 0:0:0:0: [sde] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA

It's done that twice in the half hour or so (as raid is recovering) that
it's been up so far.

-- 
TimC
It's the _target_ that supposed to go "F00F", not the processor.
              -- Mike Andrews, on Pentiums in missiles

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sata_sil24 EH complete
  2009-05-08 14:11     ` Tim Connors
@ 2009-05-09  1:41       ` Tejun Heo
  2009-05-09 16:06         ` [PATCH #upstream-fixes] libata: fix attach error handling Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2009-05-09  1:41 UTC (permalink / raw)
  To: Tim Connors; +Cc: Linux Kernel Mailing List, linux-ide

Tim Connors wrote:
>>> Hmm... "ata1.00: configured for UDMA/100" means libata part of
>>> configuration went okay but it looks like SCSI probing didn't kick in
>>> for reason.  Is this always reproducible?
>> 100% of time if it is plugged in after boot.
> 
> Hah, I lied.  Not 100% of the time.
> 
> With *some* combination perhaps of
>> for host in 0 1 2 3 ; do ; echo "- - -" > /sys/class/scsi_host/host$host/scan ; done
> and
>> scsiadd -s
> when the drive is plugged in and when it isn't plugged in (or maybe even
> just leaving it alone) and plugging and unplugging it repeadedly,
> eventually (maybe 20% of the time?) it comes good:

I can reliably reproduce it here.  Heh... interesting.  I'll dig in.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH #upstream-fixes] libata: fix attach error handling
  2009-05-09  1:41       ` Tejun Heo
@ 2009-05-09 16:06         ` Tejun Heo
  2009-05-11 18:33           ` Jeff Garzik
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2009-05-09 16:06 UTC (permalink / raw)
  To: Tim Connors; +Cc: Linux Kernel Mailing List, linux-ide, Jeff Garzik

New device attach path in ata_eh_revalidate_and_attach() is divided
into two separate loops because ATA requires IDENTIFY to be issued to
slave first while the user expects to see device probe messages from
the master device.  new_mask is used to track which devices are the
new ones between the first loop and the second.

This usually works well but if an error occurs during configuration
stage, ata_dev_revalidate_and_attach() returns with error code and
forgets new_mask.  On the retry run, dev->class is set and new_mask
for the device is clear, so the device just gets revalidated and thus
ends up skipping post-configuration procedure including scheduling of
SCSI_HOTPLUG for the device.  When this occurs, ATA part of probing
works fine but SCSI probing usually doesn't happen and makes the
device unreachable.

The behavior has been around for a very long time but it has been
uncovered with the recent addition of 1_5_GBPS horkage which uses
-EAGAIN return value from ata_dev_configure() to restart the probing
sequence after forcing cable speed.

This can be fixed by making sure dev->class is permanently set only
after all configurations are successfully complete.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au>
---
 drivers/ata/libata-eh.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 0183131..eb8b940 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2783,6 +2783,12 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
 		} else if (dev->class == ATA_DEV_UNKNOWN &&
 			   ehc->tries[dev->devno] &&
 			   ata_class_enabled(ehc->classes[dev->devno])) {
+			/* Temporarily set dev->class, it will be
+			 * permanently set once all configurations are
+			 * complete.  This is necessary because new
+			 * device configuration is done in two
+			 * separate loops.
+			 */
 			dev->class = ehc->classes[dev->devno];
 
 			if (dev->class == ATA_DEV_PMP)
@@ -2790,6 +2796,11 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
 			else
 				rc = ata_dev_read_id(dev, &dev->class,
 						     readid_flags, dev->id);
+
+			/* read_id might have changed class, store and reset */
+			ehc->classes[dev->devno] = dev->class;
+			dev->class = ATA_DEV_UNKNOWN;
+
 			switch (rc) {
 			case 0:
 				/* clear error info accumulated during probe */
@@ -2799,13 +2810,11 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
 			case -ENOENT:
 				/* IDENTIFY was issued to non-existent
 				 * device.  No need to reset.  Just
-				 * thaw and kill the device.
+				 * thaw and ignore the device.
 				 */
 				ata_eh_thaw_port(ap);
-				dev->class = ATA_DEV_UNKNOWN;
 				break;
 			default:
-				dev->class = ATA_DEV_UNKNOWN;
 				goto err;
 			}
 		}
@@ -2826,11 +2835,15 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
 		    dev->class == ATA_DEV_PMP)
 			continue;
 
+		dev->class = ehc->classes[dev->devno];
+
 		ehc->i.flags |= ATA_EHI_PRINTINFO;
 		rc = ata_dev_configure(dev);
 		ehc->i.flags &= ~ATA_EHI_PRINTINFO;
-		if (rc)
+		if (rc) {
+			dev->class = ATA_DEV_UNKNOWN;
 			goto err;
+		}
 
 		spin_lock_irqsave(ap->lock, flags);
 		ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH #upstream-fixes] libata: fix attach error handling
  2009-05-09 16:06         ` [PATCH #upstream-fixes] libata: fix attach error handling Tejun Heo
@ 2009-05-11 18:33           ` Jeff Garzik
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Garzik @ 2009-05-11 18:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Tim Connors, Linux Kernel Mailing List, linux-ide

Tejun Heo wrote:
> New device attach path in ata_eh_revalidate_and_attach() is divided
> into two separate loops because ATA requires IDENTIFY to be issued to
> slave first while the user expects to see device probe messages from
> the master device.  new_mask is used to track which devices are the
> new ones between the first loop and the second.
> 
> This usually works well but if an error occurs during configuration
> stage, ata_dev_revalidate_and_attach() returns with error code and
> forgets new_mask.  On the retry run, dev->class is set and new_mask
> for the device is clear, so the device just gets revalidated and thus
> ends up skipping post-configuration procedure including scheduling of
> SCSI_HOTPLUG for the device.  When this occurs, ATA part of probing
> works fine but SCSI probing usually doesn't happen and makes the
> device unreachable.
> 
> The behavior has been around for a very long time but it has been
> uncovered with the recent addition of 1_5_GBPS horkage which uses
> -EAGAIN return value from ata_dev_configure() to restart the probing
> sequence after forcing cable speed.
> 
> This can be fixed by making sure dev->class is permanently set only
> after all configurations are successfully complete.  Fix it.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au>
> ---
>  drivers/ata/libata-eh.c |   21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)

applied



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-05-11 18:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-05 14:22 sata_sil24 EH complete Tim Connors
2009-05-06  0:32 ` Tejun Heo
2009-05-06  0:38   ` Tim Connors
2009-05-08 14:11     ` Tim Connors
2009-05-09  1:41       ` Tejun Heo
2009-05-09 16:06         ` [PATCH #upstream-fixes] libata: fix attach error handling Tejun Heo
2009-05-11 18:33           ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).