* sata_sil24 EH complete @ 2009-05-05 14:22 Tim Connors 2009-05-06 0:32 ` Tejun Heo 0 siblings, 1 reply; 7+ messages in thread From: Tim Connors @ 2009-05-05 14:22 UTC (permalink / raw) To: Linux Kernel Mailing List, linux-ide With 2.6.29, when I hotplug a western digital mybook into a 'Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not found: *** drive failed due to user induced sillyness ... [18676.308750] sd 0:0:0:0: [sdb] Stopping disk [18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED [18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK *** drive plugged back in [18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen [18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed [18767.821775] ata1: hard resetting link [18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps [18779.655674] ata1: hard resetting link [18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) [18782.178565] ata1.00: configured for UDMA/100 [18782.178579] ata1: EH complete And then nothing. The device is still alive - if I plug it back in through its usb interface, it comes up normally. I know from experience that if I was to reboot the machine, it would have come up quite nicely from boot: [ 9.780090] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [ 9.916218] ata1.00: applying link speed limit horkage to 1.5 Gbps [ 16.981087] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) [ 17.291757] ata1.00: configured for UDMA/100 [ 17.291874] scsi 0:0:0:0: Direct-Access ATA WD My Book 01.0 PQ: 0 ANSI: 5 [ 17.291959] sd 0:0:0:0: [sdb] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB) [ 17.291972] sd 0:0:0:0: [sdb] Write Protect is off [ 17.291974] sd 0:0:0:0: [sdb] Mode Sense: 00 3a 00 00 [ 17.291994] sd 0:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 17.292051] sd 0:0:0:0: [sdb] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB) [ 17.292150] sd 0:0:0:0: [sdb] Write Protect is off [ 17.292153] sd 0:0:0:0: [sdb] Mode Sense: 00 3a 00 00 [ 17.292176] sd 0:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 17.292179] sdb: sdb1 [ 17.372927] sd 0:0:0:0: [sdb] Attached SCSI disk This is another attempt plugging in another mybook disk: [20662.149223] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen [20662.149233] ata1: irq_stat 0x00a00080, device exchanged [20662.149252] ata1: hard resetting link [20672.164117] ata1: softreset failed (timeout) [20672.164139] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [20672.164143] ata1: link online but device misclassified, retrying [20672.164146] ata1: hard resetting link [20674.372055] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [20674.510116] ata1.00: applying link speed limit horkage to 1.5 Gbps [20679.372028] ata1: hard resetting link [20681.572066] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) [20681.880986] ata1.00: configured for UDMA/100 [20681.880999] ata1: EH complete Still no /dev/sd* I haven't tried booting into a kernel prior to these updates: http://lkml.org/lkml/2009/2/2/447 -- TimC Yay! I have found the last bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bug bu%$@#$@#%$@# Error: Missing Carrier Signal ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sata_sil24 EH complete 2009-05-05 14:22 sata_sil24 EH complete Tim Connors @ 2009-05-06 0:32 ` Tejun Heo 2009-05-06 0:38 ` Tim Connors 0 siblings, 1 reply; 7+ messages in thread From: Tejun Heo @ 2009-05-06 0:32 UTC (permalink / raw) To: Tim Connors; +Cc: Linux Kernel Mailing List, linux-ide Tim Connors wrote: > With 2.6.29, when I hotplug a western digital mybook into a 'Silicon > Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not > found: > > > *** drive failed due to user induced sillyness > ... > [18676.308750] sd 0:0:0:0: [sdb] Stopping disk > [18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED > [18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK > *** drive plugged back in > [18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen > [18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed > [18767.821775] ata1: hard resetting link > [18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0) > [18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps > [18779.655674] ata1: hard resetting link > [18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) > [18782.178565] ata1.00: configured for UDMA/100 > [18782.178579] ata1: EH complete > > And then nothing. The device is still alive - if I plug it back in > through its usb interface, it comes up normally. I know from experience > that if I was to reboot the machine, it would have come up quite nicely > from boot: Hmm... "ata1.00: configured for UDMA/100" means libata part of configuration went okay but it looks like SCSI probing didn't kick in for reason. Is this always reproducible? Thanks. -- tejun ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sata_sil24 EH complete 2009-05-06 0:32 ` Tejun Heo @ 2009-05-06 0:38 ` Tim Connors 2009-05-08 14:11 ` Tim Connors 0 siblings, 1 reply; 7+ messages in thread From: Tim Connors @ 2009-05-06 0:38 UTC (permalink / raw) To: Tejun Heo; +Cc: Linux Kernel Mailing List, linux-ide On Wed, 6 May 2009, Tejun Heo wrote: > Tim Connors wrote: > > With 2.6.29, when I hotplug a western digital mybook into a 'Silicon > > Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not > > found: > > > > > > *** drive failed due to user induced sillyness > > ... > > [18676.308750] sd 0:0:0:0: [sdb] Stopping disk > > [18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED > > [18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK > > *** drive plugged back in > > [18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen > > [18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed > > [18767.821775] ata1: hard resetting link > > [18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0) > > [18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps > > [18779.655674] ata1: hard resetting link > > [18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) > > [18782.178565] ata1.00: configured for UDMA/100 > > [18782.178579] ata1: EH complete > > > > And then nothing. The device is still alive - if I plug it back in > > through its usb interface, it comes up normally. I know from experience > > that if I was to reboot the machine, it would have come up quite nicely > > from boot: > > Hmm... "ata1.00: configured for UDMA/100" means libata part of > configuration went okay but it looks like SCSI probing didn't kick in > for reason. Is this always reproducible? 100% of time if it is plugged in after boot. -- TimC The generation of random numbers is too important to be left to chance. -- Robert R. Coveyou, Oak Ridge National Laboratory. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sata_sil24 EH complete 2009-05-06 0:38 ` Tim Connors @ 2009-05-08 14:11 ` Tim Connors 2009-05-09 1:41 ` Tejun Heo 0 siblings, 1 reply; 7+ messages in thread From: Tim Connors @ 2009-05-08 14:11 UTC (permalink / raw) To: Tejun Heo; +Cc: Linux Kernel Mailing List, linux-ide On Wed, 6 May 2009, Tim Connors wrote: > On Wed, 6 May 2009, Tejun Heo wrote: > > > Tim Connors wrote: > > > With 2.6.29, when I hotplug a western digital mybook into a 'Silicon > > > Image, Inc. SiI 3132 Serial ATA Raid II Controller', the drive is not > > > found: > > > > > > > > > *** drive failed due to user induced sillyness > > > ... > > > [18676.308750] sd 0:0:0:0: [sdb] Stopping disk > > > [18676.308770] sd 0:0:0:0: [sdb] START_STOP FAILED > > > [18676.308774] sd 0:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK > > > *** drive plugged back in > > > [18767.821747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen > > > [18767.821756] ata1: irq_stat 0x00b40090, PHY RDY changed > > > [18767.821775] ata1: hard resetting link > > > [18774.633185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0) > > > [18774.773591] ata1.00: applying link speed limit horkage to 1.5 Gbps > > > [18779.655674] ata1: hard resetting link > > > [18781.861074] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) > > > [18782.178565] ata1.00: configured for UDMA/100 > > > [18782.178579] ata1: EH complete > > > > > > And then nothing. The device is still alive - if I plug it back in > > > through its usb interface, it comes up normally. I know from experience > > > that if I was to reboot the machine, it would have come up quite nicely > > > from boot: > > > > Hmm... "ata1.00: configured for UDMA/100" means libata part of > > configuration went okay but it looks like SCSI probing didn't kick in > > for reason. Is this always reproducible? > > 100% of time if it is plugged in after boot. Hah, I lied. Not 100% of the time. With *some* combination perhaps of > for host in 0 1 2 3 ; do ; echo "- - -" > /sys/class/scsi_host/host$host/scan ; done and > scsiadd -s when the drive is plugged in and when it isn't plugged in (or maybe even just leaving it alone) and plugging and unplugging it repeadedly, eventually (maybe 20% of the time?) it comes good: [278081.974683] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen [278081.974689] ata1: irq_stat 0x00a00080, device exchanged [278081.974703] ata1: limiting SATA link speed to 1.5 Gbps [278081.974708] ata1: hard resetting link [278091.984170] ata1: softreset failed (timeout) [278091.984200] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) [278091.984209] ata1: link online but device misclassified, retrying [278091.984215] ata1: hard resetting link [278094.196137] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) [278094.333982] ata1.00: ATA-6: WD My Book, 01.01A01, max UDMA/133 [278094.333990] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 1) [278094.506583] ata1.00: configured for UDMA/100 [278094.506601] ata1: EH complete [278094.506805] scsi 0:0:0:0: Direct-Access ATA WD My Book 01.0 PQ: 0 ANSI: 5 [278094.508717] sd 0:0:0:0: [sde] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB) [278094.508752] sd 0:0:0:0: [sde] Write Protect is off [278094.508759] sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00 [278094.508815] sd 0:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [278094.508969] sd 0:0:0:0: [sde] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB) [278094.509003] sd 0:0:0:0: [sde] Write Protect is off [278094.509009] sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00 [278094.509064] sd 0:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [278094.509076] sde: sde1 [278094.581084] sd 0:0:0:0: [sde] Attached SCSI disk Maybe it's a bit worrying, but subsequent to this, it spontaneously occasionally says: [279339.156950] ata1: hard resetting link [279341.280088] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10) [279341.590803] ata1.00: configured for UDMA/100 [279341.590815] ata1: EH complete [279341.627787] sd 0:0:0:0: [sde] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB) [279341.698493] sd 0:0:0:0: [sde] Write Protect is off [279341.698500] sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00 [279341.698572] sd 0:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA It's done that twice in the half hour or so (as raid is recovering) that it's been up so far. -- TimC It's the _target_ that supposed to go "F00F", not the processor. -- Mike Andrews, on Pentiums in missiles ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sata_sil24 EH complete 2009-05-08 14:11 ` Tim Connors @ 2009-05-09 1:41 ` Tejun Heo 2009-05-09 16:06 ` [PATCH #upstream-fixes] libata: fix attach error handling Tejun Heo 0 siblings, 1 reply; 7+ messages in thread From: Tejun Heo @ 2009-05-09 1:41 UTC (permalink / raw) To: Tim Connors; +Cc: Linux Kernel Mailing List, linux-ide Tim Connors wrote: >>> Hmm... "ata1.00: configured for UDMA/100" means libata part of >>> configuration went okay but it looks like SCSI probing didn't kick in >>> for reason. Is this always reproducible? >> 100% of time if it is plugged in after boot. > > Hah, I lied. Not 100% of the time. > > With *some* combination perhaps of >> for host in 0 1 2 3 ; do ; echo "- - -" > /sys/class/scsi_host/host$host/scan ; done > and >> scsiadd -s > when the drive is plugged in and when it isn't plugged in (or maybe even > just leaving it alone) and plugging and unplugging it repeadedly, > eventually (maybe 20% of the time?) it comes good: I can reliably reproduce it here. Heh... interesting. I'll dig in. Thanks. -- tejun ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH #upstream-fixes] libata: fix attach error handling 2009-05-09 1:41 ` Tejun Heo @ 2009-05-09 16:06 ` Tejun Heo 2009-05-11 18:33 ` Jeff Garzik 0 siblings, 1 reply; 7+ messages in thread From: Tejun Heo @ 2009-05-09 16:06 UTC (permalink / raw) To: Tim Connors; +Cc: Linux Kernel Mailing List, linux-ide, Jeff Garzik New device attach path in ata_eh_revalidate_and_attach() is divided into two separate loops because ATA requires IDENTIFY to be issued to slave first while the user expects to see device probe messages from the master device. new_mask is used to track which devices are the new ones between the first loop and the second. This usually works well but if an error occurs during configuration stage, ata_dev_revalidate_and_attach() returns with error code and forgets new_mask. On the retry run, dev->class is set and new_mask for the device is clear, so the device just gets revalidated and thus ends up skipping post-configuration procedure including scheduling of SCSI_HOTPLUG for the device. When this occurs, ATA part of probing works fine but SCSI probing usually doesn't happen and makes the device unreachable. The behavior has been around for a very long time but it has been uncovered with the recent addition of 1_5_GBPS horkage which uses -EAGAIN return value from ata_dev_configure() to restart the probing sequence after forcing cable speed. This can be fixed by making sure dev->class is permanently set only after all configurations are successfully complete. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au> --- drivers/ata/libata-eh.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 0183131..eb8b940 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -2783,6 +2783,12 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link, } else if (dev->class == ATA_DEV_UNKNOWN && ehc->tries[dev->devno] && ata_class_enabled(ehc->classes[dev->devno])) { + /* Temporarily set dev->class, it will be + * permanently set once all configurations are + * complete. This is necessary because new + * device configuration is done in two + * separate loops. + */ dev->class = ehc->classes[dev->devno]; if (dev->class == ATA_DEV_PMP) @@ -2790,6 +2796,11 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link, else rc = ata_dev_read_id(dev, &dev->class, readid_flags, dev->id); + + /* read_id might have changed class, store and reset */ + ehc->classes[dev->devno] = dev->class; + dev->class = ATA_DEV_UNKNOWN; + switch (rc) { case 0: /* clear error info accumulated during probe */ @@ -2799,13 +2810,11 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link, case -ENOENT: /* IDENTIFY was issued to non-existent * device. No need to reset. Just - * thaw and kill the device. + * thaw and ignore the device. */ ata_eh_thaw_port(ap); - dev->class = ATA_DEV_UNKNOWN; break; default: - dev->class = ATA_DEV_UNKNOWN; goto err; } } @@ -2826,11 +2835,15 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link, dev->class == ATA_DEV_PMP) continue; + dev->class = ehc->classes[dev->devno]; + ehc->i.flags |= ATA_EHI_PRINTINFO; rc = ata_dev_configure(dev); ehc->i.flags &= ~ATA_EHI_PRINTINFO; - if (rc) + if (rc) { + dev->class = ATA_DEV_UNKNOWN; goto err; + } spin_lock_irqsave(ap->lock, flags); ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH #upstream-fixes] libata: fix attach error handling 2009-05-09 16:06 ` [PATCH #upstream-fixes] libata: fix attach error handling Tejun Heo @ 2009-05-11 18:33 ` Jeff Garzik 0 siblings, 0 replies; 7+ messages in thread From: Jeff Garzik @ 2009-05-11 18:33 UTC (permalink / raw) To: Tejun Heo; +Cc: Tim Connors, Linux Kernel Mailing List, linux-ide Tejun Heo wrote: > New device attach path in ata_eh_revalidate_and_attach() is divided > into two separate loops because ATA requires IDENTIFY to be issued to > slave first while the user expects to see device probe messages from > the master device. new_mask is used to track which devices are the > new ones between the first loop and the second. > > This usually works well but if an error occurs during configuration > stage, ata_dev_revalidate_and_attach() returns with error code and > forgets new_mask. On the retry run, dev->class is set and new_mask > for the device is clear, so the device just gets revalidated and thus > ends up skipping post-configuration procedure including scheduling of > SCSI_HOTPLUG for the device. When this occurs, ATA part of probing > works fine but SCSI probing usually doesn't happen and makes the > device unreachable. > > The behavior has been around for a very long time but it has been > uncovered with the recent addition of 1_5_GBPS horkage which uses > -EAGAIN return value from ata_dev_configure() to restart the probing > sequence after forcing cable speed. > > This can be fixed by making sure dev->class is permanently set only > after all configurations are successfully complete. Fix it. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au> > --- > drivers/ata/libata-eh.c | 21 +++++++++++++++++---- > 1 file changed, 17 insertions(+), 4 deletions(-) applied ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-05-11 18:34 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-05 14:22 sata_sil24 EH complete Tim Connors 2009-05-06 0:32 ` Tejun Heo 2009-05-06 0:38 ` Tim Connors 2009-05-08 14:11 ` Tim Connors 2009-05-09 1:41 ` Tejun Heo 2009-05-09 16:06 ` [PATCH #upstream-fixes] libata: fix attach error handling Tejun Heo 2009-05-11 18:33 ` Jeff Garzik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).