linux-ide.vger.kernel.org archive mirror
* link resets with SSD on AHCI
@ 2010-04-29 21:59 Olof Johansson
  2010-05-05 10:15 ` Tejun Heo
  0 siblings, 1 reply; 3+ messages in thread
From: Olof Johansson @ 2010-04-29 21:59 UTC (permalink / raw)
  To: linux-ide; +Cc: jgarzik

Hi,

I've been investigating a puzzling error on my netbook. The
chipset/controller is "82801GR/GH (ICH7 Family) SATA AHCI
Controller (rev 02)".

The problem: once per boot, the kernel logs an error like this:

[  282.701448] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  282.701465] ata1.00: failed command: WRITE DMA
[  282.701492] ata1.00: cmd ca/00:00:00:ae:cc/00:00:00:00:00/e0 tag 0 dma 131072 out
[  282.701498]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  282.701509] ata1.00: status: { DRDY }
[  282.701527] ata1: hard resetting link
[  283.006179] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  283.007491] ata1.00: configured for UDMA/100
[  283.007506] ata1.00: device reported invalid CHS sector 0
[  283.007529] ata1: EH complete

This will happen only once. I've found reasonably reliable ways to
trigger it within a few minutes by running dbench (which does not stress
the disks hard). Errors are of the exact same format as above, just LBA
numbers and transfer sizes/directions differing.
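
For readers decoding the "cmd" line in the log above, here is a minimal
sketch of how the LBA and transfer size fall out of the taskfile bytes. It
assumes the field order libata prints (command/feature:nsect:lbal:lbam:lbah,
then the hob_* bytes, then device) and a 28-bit command such as WRITE DMA
(0xca). The struct and function names are made up for illustration and are
not libata's own.

```c
#include <stdint.h>

/* Taskfile bytes in the order they appear on the "cmd" line.
 * The hob_* bytes are omitted because a 28-bit command ignores them. */
struct taskfile {
	uint8_t command, feature, nsect, lbal, lbam, lbah, device;
};

/* 28-bit LBA: the low nibble of the device register holds bits 27:24. */
static uint32_t tf_lba28(const struct taskfile *tf)
{
	return ((uint32_t)(tf->device & 0x0f) << 24) |
	       ((uint32_t)tf->lbah << 16) |
	       ((uint32_t)tf->lbam << 8) |
	       (uint32_t)tf->lbal;
}

/* For 28-bit commands a sector count of 0 means 256 sectors. */
static uint32_t tf_sectors28(const struct taskfile *tf)
{
	return tf->nsect ? tf->nsect : 256;
}
```

Fed the bytes from the log (ca/00:00:00:ae:cc/.../e0), this gives LBA
0xccae00 and 256 sectors, i.e. 256 * 512 = 131072 bytes, which matches the
"dma 131072 out" on the same line.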

Things I have tried that did not help:

* acpi=off
* pci=nomsi
* running single cpu / no ht (makes it take much longer to happen, but it still does)
* making sure no laptop-mode hdparm tunings are done
* various other combinations of the above

I have seen it with different SSD vendors and products, as well as
possibly on another chipset but I can't confirm that at the moment.

It happens exactly once per boot, and never again until the next boot.

Boot time messages are:

[    1.310632] ahci 0000:00:1f.2: version 3.0
[    1.310662] ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[    1.310750] ahci: SSS flag set, parallel bus scan disabled
[    1.310801] ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0x3 impl SATA mode
[    1.310810] ahci 0000:00:1f.2: flags: 64bit ncq stag pm led clo pio slum part 
[    1.310820] ahci 0000:00:1f.2: setting latency timer to 64
[...]
[    1.621051] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    1.630878] ata1.00: ATA-7: TOSHIBA THNSA16G1P4L, A090228a, max UDMA/100
[    1.630886] ata1.00: 31309824 sectors, multi 1: LBA 
[    1.631590] ata1.00: configured for UDMA/100
[    1.643227] scsi 0:0:0:0: Direct-Access     ATA      TOSHIBA THNSA16G A090 PQ: 0 ANSI: 5
[    1.643829] sd 0:0:0:0: [sda] 31309824 512-byte logical blocks: (16.0 GB/14.9 GiB)
[    1.644000] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    1.644095] sd 0:0:0:0: [sda] Write Protect is off
[    1.644105] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.644198] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
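
The "SStatus 113 SControl 300" values in the link-up messages are
hexadecimal SATA status/control registers, each split into three 4-bit
fields (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). A small sketch
of the decoding follows; the struct and function names are illustrative
only, not kernel identifiers.

```c
#include <stdint.h>

/* The three low nibbles shared by the SStatus and SControl layouts. */
struct scr_fields {
	unsigned det;	/* bits 3:0  device detection */
	unsigned spd;	/* bits 7:4  interface speed */
	unsigned ipm;	/* bits 11:8 interface power management */
};

static struct scr_fields scr_decode(uint32_t scr)
{
	struct scr_fields f;

	f.det = scr & 0xf;
	f.spd = (scr >> 4) & 0xf;
	f.ipm = (scr >> 8) & 0xf;
	return f;
}
```

For SStatus 0x113 this yields DET=3 (device present, PHY communication
established), SPD=1 (Gen1, 1.5 Gbps) and IPM=1 (interface active). For
SControl 0x300 it yields DET=0, SPD=0 (no speed restriction) and IPM=3,
meaning transitions to both Partial and Slumber are disabled.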

I did notice that ALPM is enabled at boot, and doesn't seem to be
re-enabled after the error reset. Based on this, I experimented with
disabling it (just returning -EINVAL in ahci_enable_alpm). That did make
the problem not happen after a significant test run (overnight vs 4.5
minutes above).

Jeff, any known issues with this chipset? I did a decent amount of
searching for similar issues, but besides the ones from running the
chipset in PIIX mode I'm not seeing anything out there.


-Olof



* Re: link resets with SSD on AHCI
  2010-04-29 21:59 link resets with SSD on AHCI Olof Johansson
@ 2010-05-05 10:15 ` Tejun Heo
  2010-05-12 18:49   ` Olof Johansson
  0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2010-05-05 10:15 UTC (permalink / raw)
  To: Olof Johansson; +Cc: linux-ide, jgarzik

[-- Attachment #1: Type: text/plain, Size: 863 bytes --]

Hello,

On 04/29/2010 11:59 PM, Olof Johansson wrote:
> I did notice that ALPM is enabled at boot, and doesn't seem to be
> re-enabled after the error reset. Based on this, I experimented with
> disabling it (just returning -EINVAL in ahci_enable_alpm). That did make
> the problem not happen after a significant test run (overnight vs 4.5
> minutes above).

It could be that libata's ALPM enable sequence isn't liked by the
controller.  libata first resets the link with all powersave
transitions disabled, then turns on ALPM, then allows powersave
transitions.  It's possible that the controller or device somehow gets
upset by this (i.e. the device is told to go to powersave mode only to
find out that the host side isn't allowing it).

Does the attached patch make any difference?  Can you please post the
kernel boot log with the patch applied?

Thanks.

-- 
tejun

[-- Attachment #2: dont-inhibit-ps-on-reset.patch --]
[-- Type: text/x-patch, Size: 948 bytes --]

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 49cffb6..696be5f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3810,7 +3810,7 @@ int sata_link_resume(struct ata_link *link, const unsigned long *params,
 	 * cleared.
 	 */
 	do {
-		scontrol = (scontrol & 0x0f0) | 0x300;
+		scontrol = (scontrol & 0x0f0)/* | 0x300*/;
 		if ((rc = sata_scr_write(link, SCR_CONTROL, scontrol)))
 			return rc;
 		/*
@@ -3823,9 +3823,9 @@ int sata_link_resume(struct ata_link *link, const unsigned long *params,
 		/* is SControl restored correctly? */
 		if ((rc = sata_scr_read(link, SCR_CONTROL, &scontrol)))
 			return rc;
-	} while ((scontrol & 0xf0f) != 0x300 && --tries);
+	} while ((scontrol & 0xf0f) != /*0x300*/0 && --tries);
 
-	if ((scontrol & 0xf0f) != 0x300) {
+	if ((scontrol & 0xf0f) != /*0x300*/0) {
 		ata_link_printk(link, KERN_ERR,
 				"failed to resume link (SControl %X)\n",
 				scontrol);
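
To make the effect of the test patch above easier to see, here is a
stand-alone sketch of the SControl arithmetic only, outside the kernel. The
helper names are hypothetical; the masks and constants are the ones in the
diff. The 0x0f0 mask keeps the SPD field, and 0x300 sets IPM=3 (Partial and
Slumber transitions disabled).

```c
#include <stdint.h>

/* Value written by the unpatched loop: keep SPD, clear DET, force IPM=3. */
static uint32_t resume_scontrol_original(uint32_t scontrol)
{
	return (scontrol & 0x0f0) | 0x300;
}

/* Value written with the patch: keep SPD, clear DET, leave IPM at 0
 * so that power-save transitions are not inhibited. */
static uint32_t resume_scontrol_patched(uint32_t scontrol)
{
	return scontrol & 0x0f0;
}

/* Readback check from the loop: ignore SPD, compare DET and IPM. */
static int resume_ok(uint32_t readback, uint32_t expected)
{
	return (readback & 0xf0f) == expected;
}
```

Starting from an SControl of 0x301 (DET=1, i.e. a reset in progress), the
original code writes 0x300 and expects to read 0x300 back, while the patched
code writes 0x000 and expects 0.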


* Re: link resets with SSD on AHCI
  2010-05-05 10:15 ` Tejun Heo
@ 2010-05-12 18:49   ` Olof Johansson
  0 siblings, 0 replies; 3+ messages in thread
From: Olof Johansson @ 2010-05-12 18:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jgarzik

On Wed, May 05, 2010 at 12:15:06PM +0200, Tejun Heo wrote:
> Hello,
> 
> On 04/29/2010 11:59 PM, Olof Johansson wrote:
> > I did notice that ALPM is enabled at boot, and doesn't seem to be
> > re-enabled after the error reset. Based on this, I experimented with
> > disabling it (just returning -EINVAL in ahci_enable_alpm). That did make
> > the problem not happen after a significant test run (overnight vs 4.5
> > minutes above).
> 
> It could be that libata's ALPM enable sequence isn't liked by the
> controller.  libata first resets the link with all powersave
> transitions disabled, then turns on ALPM, then allows powersave
> transitions.  It's possible that the controller or device somehow gets
> upset by this (i.e. the device is told to go to powersave mode only to
> find out that the host side isn't allowing it).
> 
> Does the attached patch make any difference?  Can you please post the
> kernel boot log with the patch applied?

The patch didn't make a difference for me, but I got sidetracked looking
at some other things and didn't get a chance to collect data just yet;
it's coming.

I also had second thoughts about what actually triggers it. I now
suspect that it does indeed take an invocation of the laptop-mode
tools applying power settings for battery operation (i.e. taking the
link off of max_performance) for the problem to show. Earlier I had
been under the impression that it happened even with the policy left
untouched, but that does not seem to be the case.

As for ALPM overall: I have also noticed that the link doesn't actually
seem to be put in low-power mode with these SSD devices, since I see
no power consumption difference between having it on or off.


-Olof

