Odd behaviour of device in response to idleimmediate with unload

* Odd behaviour of device in response to idleimmediate with unload
@ 2008-11-04 10:31 Elias Oltmanns
  2008-11-04 10:40 ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-04 10:31 UTC (permalink / raw)
  To: linux-ide, Tejun Heo, Alan Cox; +Cc: Evgeni Golov

Hi all,

apparently, we have the first case of a quirky implementation of
idleimmediate with unload feature in a device, or I'm barking up the
wrong tree, of course. Evgeni Golov has reported the following on
hdaps-devel:

Evgeni Golov <sargentd@die-welt.net> wrote:
> Hi,
>
> I have a ThinkPad Z61m with a 100GB S-ATA TOSHIBA MK1032GSX.
[...]
> Now I thought I should try the new interface and compiled 2.6.28-rc2,
[...]
> After that I could park the heads and got the following in dmesg:
>
> ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
> ata1: SError: { PHYRdyChg CommWake }
> ata1: hard resetting link
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
> ata1.00: configured for UDMA/100
> ata1.00: configured for UDMA/100
> ata1: EH complete
> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA

The only explanation I could come up with so far is that the head park
command, for some reason or other, causes the device to set
SERR_PHYRDY_CHG and SERR_COMM_WAKE in serror, thus triggering the
handling of hotplug events. Do you have any idea what's really going on
here and what can / should be done about it?
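
For reference, SErr 0x50000 in the log is exactly the two bits named on
the SError line. A minimal sketch of the decoding, assuming the bit
positions of libata's SERR_PHYRDY_CHG (bit 16) and SERR_COMM_WAKE
(bit 18); serr_decode() is a made-up helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit positions assumed to match libata's SERR_* definitions. */
#define SERR_PHYRDY_CHG (1u << 16)
#define SERR_COMM_WAKE  (1u << 18)

/*
 * Hypothetical helper: append the names of the two hotplug-related
 * bits that are set in serror to buf and return buf.
 */
static const char *serr_decode(uint32_t serror, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	if (serror & SERR_PHYRDY_CHG)
		strncat(buf, "PHYRdyChg ", len - strlen(buf) - 1);
	if (serror & SERR_COMM_WAKE)
		strncat(buf, "CommWake ", len - strlen(buf) - 1);
	return buf;
}
```

With serror = 0x50000 both names are appended; with 0x0 the buffer
stays empty.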

Regards,

Elias

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 10:31 Odd behaviour of device in response to idleimmediate with unload Elias Oltmanns
@ 2008-11-04 10:40 ` Tejun Heo
  2008-11-04 12:32   ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-04 10:40 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-ide, Alan Cox, Evgeni Golov

Elias Oltmanns wrote:
> apparently, we have the first case of a quirky implementation of
> idleimmediate with unload feature in a device, or I'm barking up the
> wrong tree, of course. Evgeni Golov has reported the following on
> hdaps-devel:

Aieeeee...

> Evgeni Golov <sargentd@die-welt.net> wrote:
>> Hi,
>>
>> I have a ThinkPad Z61m with a 100GB S-ATA TOSHIBA MK1032GSX.
> [...]
>> Now I thought I should try the new interface and compiled 2.6.28-rc2,
> [...]
>> After that I could park the heads and got the following in dmesg:
>>
>> ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
>> ata1: SError: { PHYRdyChg CommWake }
>> ata1: hard resetting link
>> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
>> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
>> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
>> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
>> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
>> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
>> ata1.00: configured for UDMA/100
>> ata1.00: configured for UDMA/100
>> ata1: EH complete
>> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
>> sd 0:0:0:0: [sda] Write Protect is off
>> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
>> support DPO or FUA
>> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
>> sd 0:0:0:0: [sda] Write Protect is off
>> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
>> support DPO or FUA
> 
> The only explanation I could come up with so far is that the head park
> command, for some reason or other, causes the device to set
> SERR_PHYRDY_CHG and SERR_COMM_WAKE in serror, thus triggering the
> handling of hotplug events. Do you have any idea what's really going on
> here and what can / should be done about it?

Is it a laptop?  Does 'hdparm -y' cause the same thing?  Can you post
"hdparm -I"?

Thanks.

-- 
tejun

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 10:40 ` Tejun Heo
@ 2008-11-04 12:32   ` Evgeni Golov
  2008-11-04 17:06     ` Mark Lord
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-04 12:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, linux-ide, Alan Cox

Hi,

On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote:

> Is it a laptop?

Yes it is, it's an IBM/Lenovo Thinkpad Z61m (9453A11)

> Does 'hdparm -y' cause the same thing?

No, neither -y nor -Y cause these messages.

> Can you post "hdparm -I"?

Here we go:


/dev/sda:

ATA device, with non-removable media
	Model Number:       TOSHIBA MK1032GSX                       
	Serial Number:      MYSERIAL:)
	Firmware Revision:  AS024E  
Standards:
	Supported: 6 5 4 
	Likely used: 6
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  195371568
	LBA48  user addressable sectors:  195371568
	device size with M = 1024*1024:       95396 MBytes
	device size with M = 1000*1000:      100030 MBytes (100 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 16
	Advanced power management level: 128
	DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	   *	Advanced Power Management feature set
	    	SET_MAX security extension
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	IDLE_IMMEDIATE with UNLOAD
	   *	SATA-I signaling speed (1.5Gb/s)
	   *	Host-initiated interface power management
	   *	Phy event counters
	   *	Device-initiated interface power management
	   *	Software settings preservation
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
	not	supported: enhanced erase
	74min for SECURITY ERASE UNIT. 
Checksum: correct

HTH
Evgeni

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 12:32   ` Evgeni Golov
@ 2008-11-04 17:06     ` Mark Lord
  2008-11-04 17:18       ` Mark Lord
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Lord @ 2008-11-04 17:06 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox

Evgeni Golov wrote:
> Hi,
> 
> On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote:
> 
>> Is it a laptop?
> 
> Yes it is, it's an IBM/Lenovo Thinkpad Z61m (9453A11)
> 
>> Does 'hdparm -y' cause the same thing?
> 
> No, neither -y nor -Y cause these messages.
..

Well, -y does a simple ATA "IDLE IMMEDIATE" command.

So, what command is being used to cause the problem??

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 17:06     ` Mark Lord
@ 2008-11-04 17:18       ` Mark Lord
  2008-11-04 17:47         ` Mark Lord
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Lord @ 2008-11-04 17:18 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox

Mark Lord wrote:
> Evgeni Golov wrote:
>> Hi,
>>
>> On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote:
>>
>>> Is it a laptop?
>>
>> Yes it is, it's an IBM/Lenovo Thinkpad Z61m (9453A11)
>>
>>> Does 'hdparm -y' cause the same thing?
>>
>> No, neither -y nor -Y cause these messages.
> ..
> 
> Well, -y does a simple ATA "IDLE IMMEDIATE" command.
> 
> So, what command is being used to cause the problem??
..

Duh.. sorry about that:  IDLE_IMMEDIATE_WITH_UNLOAD.

I'll add that to hdparm for v9.3, eventually.

Cheers

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 17:18       ` Mark Lord
@ 2008-11-04 17:47         ` Mark Lord
  2008-11-04 18:13           ` Mark Lord
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Lord @ 2008-11-04 17:47 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox

Mark Lord wrote:
> Mark Lord wrote:
>> Evgeni Golov wrote:
>>> Hi,
>>>
>>> On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote:
>>>
>>>> Is it a laptop?
>>>
>>> Yes it is, it's an IBM/Lenovo Thinkpad Z61m (9453A11)
>>>
>>>> Does 'hdparm -y' cause the same thing?
>>>
>>> No, neither -y nor -Y cause these messages.
>> ..
>>
>> Well, -y does a simple ATA "IDLE IMMEDIATE" command.
..

EEeek.. actually, no it doesn't.  It issues a STANDBY IMMEDIATE.

I'm releasing hdparm-9.3 in the next few minutes,
with new --idle and --idle-unload flags to issue
the commands in question here.

Cheers

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 17:47         ` Mark Lord
@ 2008-11-04 18:13           ` Mark Lord
  2008-11-04 18:54             ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Lord @ 2008-11-04 18:13 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox

Mark Lord wrote:
> ..
> I'm releasing hdparm-9.3 in the next few minutes,
> with new --idle and --idle-unload flags to issue
> the commands in question here.
..

Okay, hdparm-9.3 is now out in the wild (sourceforge),
and has  --idle-immediate and --idle-unload  flags now,
so it can be used to help debug/test this problem.

Cheers

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 18:13           ` Mark Lord
@ 2008-11-04 18:54             ` Evgeni Golov
  2008-11-04 19:39               ` Mark Lord
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-04 18:54 UTC (permalink / raw)
  To: Mark Lord; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox

On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote:

> Okay, hdparm-9.3 is now out in the wild (sourceforge),
> and has  --idle-immediate and --idle-unload  flags now,
> so it can be used to help debug/test this problem.

Okay, got it, built it.
Neither --idle-immediate nor --idle-immediate brings up the reset,
echo 1000 > /sys/block/sda/device/unload_heads does.

Regards
Evgeni, puzzled.

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 18:54             ` Evgeni Golov
@ 2008-11-04 19:39               ` Mark Lord
  2008-11-05  9:32                 ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Lord @ 2008-11-04 19:39 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox

Evgeni Golov wrote:
> On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote:
> 
>> Okay, hdparm-9.3 is now out in the wild (sourceforge),
>> and has  --idle-immediate and --idle-unload  flags now,
>> so it can be used to help debug/test this problem.
> 
> Okay, got it, built it.
> Neither --idle-immediate nor --idle-immediate brings up the reset,
> echo 1000 > /sys/block/sda/device/unload_heads does.
..

Mmmm.. okay, this is new stuff in 2.6.28,
and it appears to just issue a --idle-unload equivalent after a delay.

But it does it from within libata-eh, so I suppose there must
be some confusion in there somewhere.
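
For illustration, the taskfile behind such an unload request can be
sketched as follows. The opcode, feature and LBA signature are the
values the ATA command set specifies for IDLE IMMEDIATE with UNLOAD
FEATURE; struct taskfile and both helpers are hypothetical stand-ins,
not libata code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMD_IDLE_IMMEDIATE 0xE1	/* ATA IDLE IMMEDIATE opcode */
#define UNLOAD_FEATURE     0x44	/* selects the UNLOAD variant */
#define UNLOAD_OK_LBAL     0xC4	/* returned in LBA low on success */

/* Hypothetical, simplified stand-in for libata's struct ata_taskfile. */
struct taskfile {
	uint8_t command, feature, lbal, lbam, lbah;
};

/* Build IDLE IMMEDIATE with UNLOAD; LBA 0x554E4C spells "UNL". */
static struct taskfile build_unload_tf(void)
{
	struct taskfile tf;

	memset(&tf, 0, sizeof(tf));	/* no garbage in unused registers */
	tf.command = CMD_IDLE_IMMEDIATE;
	tf.feature = UNLOAD_FEATURE;
	tf.lbal = 0x4C;
	tf.lbam = 0x4E;
	tf.lbah = 0x55;
	return tf;
}

/* The unload counts as done only if no error was flagged and LBA low is 0xC4. */
static int unload_succeeded(unsigned int err_mask, const struct taskfile *result)
{
	assert(result != NULL);
	return err_mask == 0 && result->lbal == UNLOAD_OK_LBAL;
}
```

Whether issued by hdparm or from within EH, the drive should see the
same register contents, so any difference would have to come from how
the completion is handled on the host side.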

So it's up to Tejun now, I suppose.

Cheers

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-04 19:39               ` Mark Lord
@ 2008-11-05  9:32                 ` Tejun Heo
  2008-11-05 13:47                   ` Elias Oltmanns
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-05  9:32 UTC (permalink / raw)
  To: Mark Lord; +Cc: Evgeni Golov, Elias Oltmanns, linux-ide, Alan Cox

Mark Lord wrote:
> Evgeni Golov wrote:
>> On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote:
>>
>>> Okay, hdparm-9.3 is now out in the wild (sourceforge),
>>> and has  --idle-immediate and --idle-unload  flags now,
>>> so it can be used to help debug/test this problem.
>>
>> Okay, got it, built it.
>> Neither --idle-immediate nor --idle-immediate brings up the reset,
>> echo 1000 > /sys/block/sda/device/unload_heads does.
> ..
> 
> Mmmm.. okay, this is new stuff in 2.6.28,
> and it appears to just issue a --idle-unload equivalent after a delay.
> 
> But it does it from within libata-eh, so I suppose there must
> be some confusion in there somewhere.
> 
> So it's up to Tejun now, I suppose.

Hmmm... maybe garbage values in unused TF regs?

-- 
tejun

[-- Attachment #2: idleimm-dbg.patch --]
[-- Type: text/x-patch, Size: 556 bytes --]

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 8077bdf..f0f3d11 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2649,7 +2649,7 @@ static void ata_eh_park_issue_cmd(struct ata_device *dev, int park)
 		tf.command = ATA_CMD_CHK_POWER;
 	}
 
-	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
 	tf.protocol |= ATA_PROT_NODATA;
 	err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
 	if (park && (err_mask || tf.lbal != 0xc4)) {

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-05  9:32                 ` Tejun Heo
@ 2008-11-05 13:47                   ` Elias Oltmanns
  2008-11-05 14:08                     ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-05 13:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox

Tejun Heo <tj@kernel.org> wrote:
> Mark Lord wrote:
>> Evgeni Golov wrote:
>
>>> On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote:
>>>
>>>> Okay, hdparm-9.3 is now out in the wild (sourceforge),
>>>> and has  --idle-immediate and --idle-unload  flags now,
>>>> so it can be used to help debug/test this problem.
>>>
>>> Okay, got it, built it.
>>> Neither --idle-immediate nor --idle-immediate brings up the reset,
>>> echo 1000 > /sys/block/sda/device/unload_heads does.

You really have tested --idle-unload as well, I suppose.

[...]
> Hmmm... maybe garbage values in unused TF regs?
[...]
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 8077bdf..f0f3d11 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -2649,7 +2649,7 @@ static void ata_eh_park_issue_cmd(struct ata_device *dev, int park)
>  		tf.command = ATA_CMD_CHK_POWER;
>  	}
> 
> -	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
> +	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;

After my embarrassing failure to parse a macro correctly a few days
back, I've got to be careful ;-). But still, what good does that patch
do? It doesn't really change anything, does it?

As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach()
has anything to do with it. Unless you have a better suggestion, perhaps
the following debug patch would give some useful information.

Regards,

Elias
---
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 8077bdf..26dc4d9 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -3020,6 +3020,7 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 	struct ata_device *dev;
 	int nr_failed_devs;
 	int rc;
+	int dev_parked = 0;
 	unsigned long flags, deadline;
 
 	DPRINTK("ENTER\n");
@@ -3123,6 +3124,7 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 					continue;
 
 				ata_eh_park_issue_cmd(dev, 1);
+				dev_parked = 1;
 			}
 		}
 
@@ -3135,10 +3137,17 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 	} while (deadline);
 	ata_port_for_each_link(link, ap) {
 		ata_link_for_each_dev(dev, link) {
+			u32 serror;
+
 			if (!(link->eh_context.unloaded_mask &
 			      (1 << dev->devno)))
 				continue;
 
+			sata_scr_read(link, SCR_ERROR, &serror);
+			ata_dev_printk(dev, KERN_INFO,
+				       "SError: 0x%x, hotplug: %d\n",
+				       serror,
+				       ap->pflags & ATA_PFLAG_SCSI_HOTPLUG ? 1 : 0);
 			ata_eh_park_issue_cmd(dev, 0);
 			ata_eh_done(link, dev, ATA_EH_PARK);
 		}
@@ -3203,6 +3212,9 @@ dev_fail:
 		}
 	}
 
+	if (dev_parked)
+		ata_port_printk(ap, KERN_INFO, "hotplug: %d, nr_failed: %d\n",
+				ap->pflags & ATA_PFLAG_SCSI_HOTPLUG ? 1 : 0, nr_failed_devs);
 	if (nr_failed_devs)
 		goto retry;
 

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-05 13:47                   ` Elias Oltmanns
@ 2008-11-05 14:08                     ` Tejun Heo
  2008-11-05 18:55                       ` Elias Oltmanns
                                         ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Tejun Heo @ 2008-11-05 14:08 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox

Elias Oltmanns wrote:
>> -	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
> 
> After my embarrassing failure to parse a macro correctly a few days
> back, I've got to be careful ;-). But still, what good does that patch
> do? It doesn't really change anything, does it?

Hehheh, I suppose it's my turn for embarrassment.

> As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach()
> has anything to do with it. Unless you have a better suggestion, perhaps
> the following debug patch would give some useful information.

I don't have much idea at this point.  To the drive, it shouldn't look
any different.  Ah... it's ata_piix, right?  ata_piix doesn't have PHY
event IRQ, so it could be that the command issued by hdparm did trigger
PHY event but didn't get noticed by EH while the condition triggered by
IDLE IMMEDIATE did.  One way to find out would be adding SCR print outs
on command completion.

Thanks.

-- 
tejun

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-05 14:08                     ` Tejun Heo
@ 2008-11-05 18:55                       ` Elias Oltmanns
  2008-11-06 11:23                         ` Evgeni Golov
  2008-11-05 19:34                       ` Evgeni Golov
  2008-11-06 11:41                       ` Elias Oltmanns
  2 siblings, 1 reply; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-05 18:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox

Tejun Heo <tj@kernel.org> wrote:
> Elias Oltmanns wrote:
[...]
>> As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach()
>> has anything to do with it. Unless you have a better suggestion, perhaps
>> the following debug patch would give some useful information.
>
> I don't have much idea at this point.  To the drive, it shouldn't look
> any different.  Ah... it's ata_piix, right?  ata_piix doesn't have PHY
> event IRQ, so it could be that the command issued by hdparm did trigger
> PHY event but didn't get noticed by EH while the condition triggered by
> IDLE IMMEDIATE did.  One way to find out would be adding SCR print outs
> on command completion.

Right. Evgeni, could you please apply the following patch (2.6.28-rc3 or
thereabouts) and do the following in that order? Please report what has
been logged in dmesg:

# hdparm --idle-immediate /dev/sda
# hdparm --idle-unload /dev/sda
# hdparm --idle-immediate /dev/sda
# echo 1000 > /sys/block/sda/device/unload_heads
# hdparm --idle-immediate /dev/sda

The last one is only meant to verify that it behaves like the first.

Regards,

Elias
---
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 622350d..4948433 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4683,6 +4683,8 @@ void ata_qc_complete(struct ata_queued_cmd *qc)
 	if (ap->ops->error_handler) {
 		struct ata_device *dev = qc->dev;
 		struct ata_eh_info *ehi = &dev->link->eh_info;
+		u32 serror;
+		int rc;
 
 		WARN_ON(ap->pflags & ATA_PFLAG_FROZEN);
 
@@ -4721,6 +4723,16 @@ void ata_qc_complete(struct ata_queued_cmd *qc)
 		case ATA_CMD_SLEEP:
 			dev->flags |= ATA_DFLAG_SLEEPING;
 			break;
+
+		case ATA_CMD_IDLEIMMEDIATE:
+			rc = sata_scr_read(dev->link, SCR_ERROR, &serror);
+			if (!rc)
+				ata_dev_printk(dev, KERN_INFO,
+					       "SError: 0x%x\n", serror);
+			else
+				ata_dev_printk(dev, KERN_INFO,
+					       "Couldn't read SError: %d\n", rc);
+			break;
 		}
 
 		if (unlikely(dev->flags & ATA_DFLAG_DUBIOUS_XFER))

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-05 18:55                       ` Elias Oltmanns
@ 2008-11-06 11:23                         ` Evgeni Golov
  2008-11-06 12:12                           ` Elias Oltmanns
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-06 11:23 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox

On Wed, 05 Nov 2008 19:55:44 +0100 Elias Oltmanns wrote:

> # hdparm --idle-immediate /dev/sda
> # hdparm --idle-unload /dev/sda
> # hdparm --idle-immediate /dev/sda
> # echo 1000 > /sys/block/sda/device/unload_heads
> # hdparm --idle-immediate /dev/sda

Here come my test results:

shinkupaddo# ./hdparm --idle-immediate /dev/sda

/dev/sda:
 issuing idle_immediate command
shinkupaddo# dmesg -c
ata1.00: SError: 0x0


shinkupaddo# ./hdparm --idle-unload /dev/sda

/dev/sda:
 issuing idle_immediate_unload command
shinkupaddo# dmesg -c
ata1.00: SError: 0x0


shinkupaddo# ./hdparm --idle-immediate /dev/sda

/dev/sda:
 issuing idle_immediate command
shinkupaddo# dmesg -c 
ata1.00: SError: 0x0

shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads 
shinkupaddo# dmesg -c
ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
ata1: SError: { PHYRdyChg CommWake }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: SError: 0x0
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
ata1.00: configured for UDMA/100
ata1.00: configured for UDMA/100
ata1: EH complete
sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


shinkupaddo# ./hdparm --idle-immediate /dev/sda            

/dev/sda:
 issuing idle_immediate command
shinkupaddo# dmesg -c
ata1.00: SError: 0x0

Any additional
shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads

Gives only                                        
ata1.00: SError: 0x0


Regards
Evgeni

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-06 11:23                         ` Evgeni Golov
@ 2008-11-06 12:12                           ` Elias Oltmanns
  0 siblings, 0 replies; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-06 12:12 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox

Evgeni Golov <sargentd@die-welt.net> wrote:
> On Wed, 05 Nov 2008 19:55:44 +0100 Elias Oltmanns wrote:
>
>> # hdparm --idle-immediate /dev/sda
>> # hdparm --idle-unload /dev/sda
>> # hdparm --idle-immediate /dev/sda
>> # echo 1000 > /sys/block/sda/device/unload_heads
>> # hdparm --idle-immediate /dev/sda
>
> Here come my test results:
>
[...]
> shinkupaddo# ./hdparm --idle-unload /dev/sda
>
> /dev/sda:
>  issuing idle_immediate_unload command
> shinkupaddo# dmesg -c
> ata1.00: SError: 0x0
[...]
> shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads 
> shinkupaddo# dmesg -c
> ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
> ata1: SError: { PHYRdyChg CommWake }
> ata1: hard resetting link
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: SError: 0x0
> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
> ata1.00: configured for UDMA/100
> ata1.00: configured for UDMA/100
> ata1: EH complete
> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[...]
> Any additional
> shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads
>
> Gives only                                        
> ata1.00: SError: 0x0

Now there's an interesting thing. Even though these results don't
exactly confirm my theory I posted just now, they certainly do throw
some light on the matter. It looks to me as if the hard reset is not at
all related to head parking, after all. It's just that echo 1000 >
/sys/.../unload_heads triggers EH and during link autopsy some stale
phyrdy | commwake bits in SError are discovered and acted upon. As to
how those bits got set in the first place, I suspect that some other
action in a previous EH cycle triggered a phyrdy event that didn't get
cleared because event notification was disabled (see my earlier post). I
don't think this is expected behaviour and in particular I have no idea,
right now, what may actually have triggered this event. Tejun?

Regards,

Elias

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-05 14:08                     ` Tejun Heo
  2008-11-05 18:55                       ` Elias Oltmanns
@ 2008-11-05 19:34                       ` Evgeni Golov
  2008-11-06 11:41                       ` Elias Oltmanns
  2 siblings, 0 replies; 37+ messages in thread
From: Evgeni Golov @ 2008-11-05 19:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

On Wed, 05 Nov 2008 23:08:42 +0900 Tejun Heo wrote:

> Ah... it's ata_piix, right?

It's ahci (Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI
Controller (rev 02)).

Regards

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-05 14:08                     ` Tejun Heo
  2008-11-05 18:55                       ` Elias Oltmanns
  2008-11-05 19:34                       ` Evgeni Golov
@ 2008-11-06 11:41                       ` Elias Oltmanns
  2008-11-07  4:08                         ` Tejun Heo
  2 siblings, 1 reply; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-06 11:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox

Tejun Heo <tj@kernel.org> wrote:
> Elias Oltmanns wrote:
[...]
>> As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach()
>> has anything to do with it. Unless you have a better suggestion, perhaps
>> the following debug patch would give some useful information.
>
> I don't have much idea at this point.  To the drive, it shouldn't look
> any different.  Ah... it's ata_piix, right?  ata_piix doesn't have PHY
> event IRQ, so it could be that the command issued by hdparm did trigger
> PHY event but didn't get noticed by EH while the condition triggered by
> IDLE IMMEDIATE did.  One way to find out would be adding SCR print outs
> on command completion.

Actually, event notification is turned off during error recovery for
ahci as well. Additionally, we have the following in the interrupt
handler of ahci.c:

	/* If we are getting PhyRdy, this is
 	 * just a power state change, we should
 	 * clear out this, plus the PhyRdy/Comm
 	 * Wake bits from Serror
 	 */
	if ((hpriv->flags & AHCI_HFLAG_NO_HOTPLUG) &&
		(status & PORT_IRQ_PHYRDY)) {
		status &= ~PORT_IRQ_PHYRDY;
		ahci_scr_write(&ap->link, SCR_ERROR, ((1 << 16) | (1 << 18)));
	}

This suggests to me that hdparm --idle-unload does indeed trigger a phy
event, but the interrupt handler clears SError. Issuing the unload
command in EH, on the other hand, does not result in a phy event because
event notification is disabled. That way, phyrdy and commwake don't get
cleared in SError and will indicate a hotplug event next time SError is
checked. Does that make sense? If so, what's to be done about it?

Regards,

Elias

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-06 11:41                       ` Elias Oltmanns
@ 2008-11-07  4:08                         ` Tejun Heo
  2008-11-07  7:48                           ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-07  4:08 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox

Elias Oltmanns wrote:
> Tejun Heo <tj@kernel.org> wrote:
>> Elias Oltmanns wrote:
> [...]
>>> As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach()
>>> has anything to do with it. Unless you have a better suggestion, perhaps
>>> the following debug patch would give some useful information.
>> I don't have much idea at this point.  To the drive, it shouldn't look
>> any different.  Ah... it's ata_piix, right?  ata_piix doesn't have PHY
>> event IRQ, so it could be that the command issued by hdparm did trigger
>> PHY event but didn't get noticed by EH while the condition triggered by
>> IDLE IMMEDIATE did.  One way to find out would be adding SCR print outs
>> on command completion.
> 
> Actually, event notification is turned off during error recovery for
> ahci as well.

Yeap, but SError is checked and cleared after event notification is
turned back on, so events shouldn't leak.

> Additionally, we have the following in the interrupt
> handler of ahci.c:
> 
> 	/* If we are getting PhyRdy, this is
>  	 * just a power state change, we should
>  	 * clear out this, plus the PhyRdy/Comm
>  	 * Wake bits from Serror
>  	 */
> 	if ((hpriv->flags & AHCI_HFLAG_NO_HOTPLUG) &&
> 		(status & PORT_IRQ_PHYRDY)) {
> 		status &= ~PORT_IRQ_PHYRDY;
> 		ahci_scr_write(&ap->link, SCR_ERROR, ((1 << 16) | (1 << 18)));
> 	}
> 
> This suggests to me that hdparm --idle-unload does indeed trigger a phy
> event, but the interrupt handler clears SError. Issuing the unload
> command in EH, on the other hand, does not result in a phy event because
> event notification is disabled. That way, phyrdy and commwake don't get
> cleared in SError and will indicate a hotplug event next time SError is
> checked. Does that make sense? If so, what's to be done about it?

Hmmm... if ALPM is enabled, it could explain all.  Enabling ALPM does
inhibit event notifications but it doesn't prevent autopsy from
interpreting SError as if ALPM is not enabled.

Evgeni, is ALPM enabled?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-07  4:08                         ` Tejun Heo
@ 2008-11-07  7:48                           ` Evgeni Golov
  2008-11-10  9:00                             ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-07  7:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

On Fri, Nov 07, 2008 at 01:08:29PM +0900, Tejun Heo wrote:
> > This suggests to me that hdparm --idle-unload does indeed trigger a phy
> > event, but the interrupt handler clears SError. Issuing the unload
> > command in EH, on the other hand, does not result in a phy event because
> > event notification is disabled. That way, phyrdy and commwake don't get
> > cleared in SError and will indicate a hotplug event next time SError is
> > checked. Does that make sense? If so, what's to be done about it?
> 
> Hmmm... if ALPM is enabled, it could explain all.  Enabling ALPM does
> inhibit event notifications but it doesn't prevent autopsy from
> interpreting SError as if ALPM is not enabled.
> 
> Evgeni, is ALPM enabled?

You mean Aggressive Link Power Management? As patches from here:
http://www.kernel.org/pub/linux/kernel/people/kristen/patches/SATA/alpm/
?
Unless they got merged into 2.6.27 and autoenabled, no, I don't use ALPM :)

Regards
Evgeni

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-07  7:48                           ` Evgeni Golov
@ 2008-11-10  9:00                             ` Tejun Heo
  2008-11-10 10:26                               ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-10  9:00 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

Evgeni Golov wrote:
> On Fri, Nov 07, 2008 at 01:08:29PM +0900, Tejun Heo wrote:
>>> This suggests to me that hdparm --idle-unload does indeed trigger a phy
>>> event, but the interrupt handler clears SError. Issuing the unload
>>> command in EH, on the other hand, does not result in a phy event because
>>> event notification is disabled. That way, phyrdy and commwake don't get
>>> cleared in SError and will indicate a hotplug event next time SError is
>>> checked. Does that make sense? If so, what's to be done about it?
>> Hmmm... if ALPM is enabled, it could explain all.  Enabling ALPM does
>> inhibit event notifications but it doesn't prevent autopsy from
>> interpreting SError as if ALPM is not enabled.
>>
>> Evgeni, is ALPM enabled?
> 
> You mean Aggressive Link Power Management? As patches from here:
> http://www.kernel.org/pub/linux/kernel/people/kristen/patches/SATA/alpm/
> ?
> Unless they got merged into 2.6.27 and autoenabled, no, I don't use ALPM :)

Heh... I'm fresh out of ideas.  Somehow SError is being set on the
initial EH head unload.  In general, what EH is doing is harmless in
itself but EH reset can delay head unloading defeating the purpose of
unloading.

Is the phy event before or after head unloading?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-10  9:00                             ` Tejun Heo
@ 2008-11-10 10:26                               ` Evgeni Golov
  2008-11-10 11:35                                 ` Elias Oltmanns
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-10 10:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote:

> Is the phy event before or after head unloading?

How do I check this?

Regards

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-10 10:26                               ` Evgeni Golov
@ 2008-11-10 11:35                                 ` Elias Oltmanns
  2008-11-13 11:33                                   ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-10 11:35 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox

Evgeni Golov <sargentd@die-welt.net> wrote:
> On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote:
>
>> Is the phy event before or after head unloading?
>
> How do I check this?

# dmesg
# echo 3000 > /sys/block/sda/device/unload_heads &
# dmesg

The first call to dmesg is only to make sure that it is in the cache so
the second won't be blocked by the park request.

Regards,

Elias

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-10 11:35                                 ` Elias Oltmanns
@ 2008-11-13 11:33                                   ` Evgeni Golov
  2008-11-13 12:29                                     ` Elias Oltmanns
  2008-11-16  9:39                                     ` Tejun Heo
  0 siblings, 2 replies; 37+ messages in thread
From: Evgeni Golov @ 2008-11-13 11:33 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 769 bytes --]

On Mon, 10 Nov 2008 12:35:20 +0100 Elias Oltmanns wrote:

> Evgeni Golov <sargentd@die-welt.net> wrote:
> > On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote:
> >
> >> Is the phy event before or after head unloading?
> >
> > How do I check this?
> 
> # dmesg
> # echo 3000 > /sys/block/sda/device/unload_heads &
> # dmesg
> 
> The first call to dmesg is only to make sure that it is in the cache so
> the second won't be blocked by the park request.

Okay, I get the following as soon as I issue unload:
ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
ata1: SError: { PHYRdyChg CommWake }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: SError: 0x0

The rest comes a bit later.

HTH
Evgeni

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-13 11:33                                   ` Evgeni Golov
@ 2008-11-13 12:29                                     ` Elias Oltmanns
  2008-11-16  9:39                                     ` Tejun Heo
  1 sibling, 0 replies; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-13 12:29 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox

Evgeni Golov <sargentd@die-welt.net> wrote:
> On Mon, 10 Nov 2008 12:35:20 +0100 Elias Oltmanns wrote:
>
>> Evgeni Golov <sargentd@die-welt.net> wrote:
>> > On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote:
>> >
>> >> Is the phy event before or after head unloading?
>> >
>> > How do I check this?
>> 
>> # dmesg
>> # echo 3000 > /sys/block/sda/device/unload_heads &
>> # dmesg
>> 
>> The first call to dmesg is only to make sure that it is in the cache so
>> the second won't be blocked by the park request.
>
> Okay, I get the following as soon as I issue unload:
> ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
> ata1: SError: { PHYRdyChg CommWake }
> ata1: hard resetting link
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: SError: 0x0
>
> The rest comes a bit later.

Right, that settles it: the reset sequence is initiated even before the
unload command is issued for the first time. This means that head
parking is not part of the picture except for the fact that it provides
the means to initiate EH from userspace and makes the problem easily
reproducible. On the other hand, it remains a mystery to me what
actually sets those bits in SError in the first place without event
notification taking care of it. I'll have to think about this for a
while. Perhaps Tejun has another idea?

Regards,

Elias

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-13 11:33                                   ` Evgeni Golov
  2008-11-13 12:29                                     ` Elias Oltmanns
@ 2008-11-16  9:39                                     ` Tejun Heo
  2008-11-17  7:15                                       ` Evgeni Golov
  1 sibling, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-16  9:39 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

Hello,

Evgeni Golov wrote:
>> The first call to dmesg is only to make sure that it is in the cache so
>> the second won't be blocked by the park request.
> 
> Okay, I get the following as soon as I issue unload:
> ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
> ata1: SError: { PHYRdyChg CommWake }
> ata1: hard resetting link
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: SError: 0x0
> 
> The rest comes a bit later.

So, this comes before actually issuing the park.  It sounds like it has
nothing to do with park command itself.  If you do "echo - - - >
/sys/class/scsi_host/host0/scan" right after boot, what does the kernel say?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-16  9:39                                     ` Tejun Heo
@ 2008-11-17  7:15                                       ` Evgeni Golov
  2008-11-17  7:19                                         ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-17  7:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote:

> So, this comes before actually issuing the park.  It sounds like it has
> nothing to do with park command itself.  If you do "echo - - - >
> /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say?

# echo - - - > /sys/class/scsi_host/host0/scan
echo: write error: invalid argument


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-17  7:15                                       ` Evgeni Golov
@ 2008-11-17  7:19                                         ` Tejun Heo
  2008-11-17  7:48                                           ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-17  7:19 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

Evgeni Golov wrote:
> On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote:
> 
>> So, this comes before actually issuing the park.  It sounds like it has
>> nothing to do with park command itself.  If you do "echo - - - >
>> /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say?
> 
> # echo - - - > /sys/class/scsi_host/host0/scan
> echo: write error: invalid argument

Eh... strange.  It works perfectly fine here.  Can you play with quoting
and -n?

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-17  7:19                                         ` Tejun Heo
@ 2008-11-17  7:48                                           ` Evgeni Golov
  2008-11-18  1:22                                             ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-17  7:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

On Mon, 17 Nov 2008 16:19:59 +0900 Tejun Heo wrote:

> Evgeni Golov wrote:
> > On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote:
> > 
> >> So, this comes before actually issuing the park.  It sounds like it has
> >> nothing to do with park command itself.  If you do "echo - - - >
> >> /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say?
> > 
> > # echo - - - > /sys/class/scsi_host/host0/scan
> > echo: write error: invalid argument
> 
> Eh... strange.  It works perfectly fine here.  Can you play with quoting
> and -n?

Heh, did try -n but no quoting, damn - :)
Get the same reset too:

shinkupaddo# dmesg
ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
ata1: SError: { PHYRdyChg CommWake }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
ata1.00: configured for UDMA/100
ata1.00: configured for UDMA/100
ata1: EH complete
sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Btw, unload heads isn't even enabled at the moment...

Regards

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-17  7:48                                           ` Evgeni Golov
@ 2008-11-18  1:22                                             ` Tejun Heo
  2008-11-18  7:37                                               ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-18  1:22 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

Evgeni Golov wrote:
> On Mon, 17 Nov 2008 16:19:59 +0900 Tejun Heo wrote:
> 
>> Evgeni Golov wrote:
>>> On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote:
>>>
>>>> So, this comes before actually issuing the park.  It sounds like it has
>>>> nothing to do with park command itself.  If you do "echo - - - >
>>>> /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say?
>>> # echo - - - > /sys/class/scsi_host/host0/scan
>>> echo: write error: invalid argument
>> Eh... strange.  It works perfectly fine here.  Can you play with quoting
>> and -n?
> 
> Heh, did try -n but no quoting, damn - :)
> Get the same reset too:
> 
> shinkupaddo# dmesg
> ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf
> ata1: SError: { PHYRdyChg CommWake }

Heh... you're not supposed to see these events here, so the PHY event
and resetting don't have anything to do with head unloading per se.
It's just discovering previously set SError values.  The question is
when did they get set and why didn't ahci catch them.  libata EH enables
event reporting and then clears SError before finishing up so there
shouldn't be any window where events can get lost.

Can you please attach lspci -nn result and full boot log?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-18  1:22                                             ` Tejun Heo
@ 2008-11-18  7:37                                               ` Evgeni Golov
  2008-11-21  6:41                                                 ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-18  7:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 138 bytes --]

On Tue, 18 Nov 2008 10:22:25 +0900 Tejun Heo wrote:

> Can you please attach lspci -nn result and full boot log?

Done :)

Regards
Evgeni

[-- Attachment #2: lspci.gz --]
[-- Type: application/octet-stream, Size: 794 bytes --]

[-- Attachment #3: dmesg.gz --]
[-- Type: application/octet-stream, Size: 5146 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-18  7:37                                               ` Evgeni Golov
@ 2008-11-21  6:41                                                 ` Tejun Heo
  2008-11-21 19:40                                                   ` Evgeni Golov
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2008-11-21  6:41 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 281 bytes --]

Evgeni Golov wrote:
> On Tue, 18 Nov 2008 10:22:25 +0900 Tejun Heo wrote:
> 
>> Can you please attach lspci -nn result and full boot log?
> 
> Done :)

Something strange is going on there.  Can you please apply the attached
patch and post resulting kernel log?

Thanks.

-- 
tejun

[-- Attachment #2: ahci-debug.patch --]
[-- Type: text/x-patch, Size: 2604 bytes --]

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index a67b8e7..8500fd5 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -2006,6 +2006,8 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
 	}
 
 	if (irq_stat & (PORT_IRQ_CONNECT | PORT_IRQ_PHYRDY)) {
+		ata_port_printk(ap, KERN_INFO, "XXX PHY event 0x%x\n",
+				irq_stat);
 		ata_ehi_hotplugged(host_ehi);
 		ata_ehi_push_desc(host_ehi, "%s",
 			irq_stat & PORT_IRQ_CONNECT ?
@@ -2044,6 +2046,7 @@ static void ahci_port_intr(struct ata_port *ap)
  	 */
 	if ((hpriv->flags & AHCI_HFLAG_NO_HOTPLUG) &&
 		(status & PORT_IRQ_PHYRDY)) {
+		ata_port_printk(ap, KERN_INFO, "XXX PHY event clearing 0x%x\n", status);
 		status &= ~PORT_IRQ_PHYRDY;
 		ahci_scr_write(&ap->link, SCR_ERROR, ((1 << 16) | (1 << 18)));
 	}
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 32da9a9..24b52aa 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2380,13 +2380,15 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	/*
 	 * Perform reset
 	 */
-	if (ata_is_host_link(link))
+	if (ata_is_host_link(link)) {
+		ata_port_printk(ap, KERN_INFO, "XXX FREEZING\n");
 		ata_eh_freeze_port(ap);
+	}
 
 	deadline = ata_deadline(jiffies, ata_eh_reset_timeouts[try++]);
 
 	if (reset) {
-		if (verbose)
+		//if (verbose)
 			ata_link_printk(link, KERN_INFO, "%s resetting link\n",
 					reset == softreset ? "soft" : "hard");
 
@@ -2440,6 +2442,7 @@ int ata_eh_reset(struct ata_link *link, int classify,
 				goto fail;
 			}
 
+			ata_port_printk(ap, KERN_INFO, "XXX follow-up SRST\n");
 			ata_eh_about_to_do(link, NULL, ATA_EH_RESET);
 			rc = ata_do_reset(link, reset, classes, deadline, true);
 		}
@@ -2479,8 +2482,10 @@ int ata_eh_reset(struct ata_link *link, int classify,
 		slave->sata_spd = (sstatus >> 4) & 0xf;
 
 	/* thaw the port */
-	if (ata_is_host_link(link))
+	if (ata_is_host_link(link)) {
+		ata_port_printk(ap, KERN_INFO, "XXX THAW\n");
 		ata_eh_thaw_port(ap);
+	}
 
 	/* postreset() should clear hardware SError.  Although SError
 	 * is cleared during link resume, clearing SError here is
@@ -2490,9 +2495,17 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	 * link onlineness and classification result later.
 	 */
 	if (postreset) {
+		ata_port_printk(ap, KERN_INFO, "XXX postreset start\n");
 		postreset(link, classes);
 		if (slave)
 			postreset(slave, classes);
+		{
+			u32 serror = 0;
+			sata_scr_read(link, SCR_ERROR, &serror);
+			ata_port_printk(ap, KERN_INFO,
+					"XXX postreset end, SError=0x%x\n",
+					serror);
+		}
 	}
 
 	/* clear cached SError */

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-21  6:41                                                 ` Tejun Heo
@ 2008-11-21 19:40                                                   ` Evgeni Golov
  2008-11-22  8:22                                                     ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Evgeni Golov @ 2008-11-21 19:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox


[-- Attachment #1.1: Type: text/plain, Size: 250 bytes --]

On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote:

> Something strange is going on there.  Can you please apply the attached
> patch and post resulting kernel log?

Wow, that one is spammy :)
See relevant part of /var/log/messages attached.

[-- Attachment #1.2: log-2.6.28-rc6.gz --]
[-- Type: application/octet-stream, Size: 5185 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-21 19:40                                                   ` Evgeni Golov
@ 2008-11-22  8:22                                                     ` Tejun Heo
  2008-11-22  9:51                                                       ` Evgeni Golov
                                                                         ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Tejun Heo @ 2008-11-22  8:22 UTC (permalink / raw)
  To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

Evgeni Golov wrote:
> On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote:
> 
>> Something strange is going on there.  Can you please apply the attached
>> patch and post resulting kernel log?
> 
> Wow, that one is spammy :)
> See relevant part of /var/log/messages attached.

Eh... you have ALPM enabled.  What does "cat
/sys/block/sda/device/link_power_management_policy" tell you?

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-22  8:22                                                     ` Tejun Heo
@ 2008-11-22  9:51                                                       ` Evgeni Golov
  2008-11-22  9:58                                                       ` Evgeni Golov
  2008-11-23 18:09                                                       ` Elias Oltmanns
  2 siblings, 0 replies; 37+ messages in thread
From: Evgeni Golov @ 2008-11-22  9:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

On Sat, 22 Nov 2008 17:22:39 +0900 Tejun Heo wrote:

> Eh... you have ALPM enabled.

Mh, when did I enable that!?

>  What does "cat
> /sys/block/sda/device/link_power_management_policy" tell you?

Nothing, as the file does not exist, but
cat /sys/class/scsi_host/host0/link_power_management_policy
works and told me "min_power".

I rebooted without setting it, so it now says "max_performance" and I
still get the reset on scan :/

Regards

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-22  8:22                                                     ` Tejun Heo
  2008-11-22  9:51                                                       ` Evgeni Golov
@ 2008-11-22  9:58                                                       ` Evgeni Golov
  2008-11-23 18:09                                                       ` Elias Oltmanns
  2 siblings, 0 replies; 37+ messages in thread
From: Evgeni Golov @ 2008-11-22  9:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox

On Sat, 22 Nov 2008 17:22:39 +0900 Tejun Heo wrote:

> Eh... you have ALPM enabled.  What does "cat
> /sys/block/sda/device/link_power_management_policy" tell you?

Sorry, when I disable ALPM I get no errors on unload_heads, so it's
fine now :)
I still wonder why ALPM was enabled...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-22  8:22                                                     ` Tejun Heo
  2008-11-22  9:51                                                       ` Evgeni Golov
  2008-11-22  9:58                                                       ` Evgeni Golov
@ 2008-11-23 18:09                                                       ` Elias Oltmanns
  2008-11-24  4:20                                                         ` Tejun Heo
  2 siblings, 1 reply; 37+ messages in thread
From: Elias Oltmanns @ 2008-11-23 18:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Evgeni Golov, Mark Lord, linux-ide, Alan Cox

Tejun Heo <tj@kernel.org> wrote:
> Evgeni Golov wrote:
>> On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote:
>
>> 
>>> Something strange is going on there.  Can you please apply the attached
>>> patch and post resulting kernel log?
>> 
>> Wow, that one is spammy :)
>> See relevant part of /var/log/messages attached.
>
> Eh... you have ALPM enabled.

Quite. The thing that's puzzling me, though, is this: There are a lot of
phy events occurring during normal operation that get cleared by the
interrupt handler. However, during link autopsy, phyrdy and comwake are
set in SError even though event notification has not been disabled yet
at this point. This suggests to me that not all changes to those bits in
SError are brought to our attention through event notification, at least
as long as ALPM is enabled. Besides, I'm wondering whether that many phy
events are to be expected (i.e. observed on other controllers too) when
ALPM is enabled.

Regards,

Elias

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Odd behaviour of device in response to idleimmediate with unload
  2008-11-23 18:09                                                       ` Elias Oltmanns
@ 2008-11-24  4:20                                                         ` Tejun Heo
  0 siblings, 0 replies; 37+ messages in thread
From: Tejun Heo @ 2008-11-24  4:20 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: Evgeni Golov, Mark Lord, linux-ide, Alan Cox

Elias Oltmanns wrote:
> Tejun Heo <tj@kernel.org> wrote:
>> Evgeni Golov wrote:
>>> On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote:
>>>> Something strange is going on there.  Can you please apply the attached
>>>> patch and post resulting kernel log?
>>> Wow, that one is spammy :)
>>> See relevant part of /var/log/messages attached.
>> Eh... you have ALPM enabled.
> 
> Quite. The thing that's puzzling me, though, is this: There are a lot of
> phy events occurring during normal operation that get cleared by the
> interrupt handler. However, during link autopsy, phyrdy and comwake are
> set in SError even though event notification has not been disabled yet
> at this point. This suggests to me that not all changes to those bits in
> SError are brought to our attention through event notification, at least
> as long as ALPM is enabled. Besides, I'm wondering whether that many phy
> events are to be expected (i.e. observed on other controllers too) when
> ALPM is enabled.

Yeah, entering and leaving link power save mode generates a PHY event
each time and ahci explicitly ignores them if ALPM is enabled.  We
probably need to teach EH that PHY events can be ignored if ALPM is
enabled.
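
A rough, untested sketch of that idea in stand-alone C, not an actual
libata patch: eh_filter_serror() and eh_should_treat_as_hotplug() are
hypothetical names, and the alpm_enabled flag stands in for however EH
would learn the link power management policy. The bit positions are the
ones ahci already clears in its interrupt handler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for the two bits ahci clears on PhyRdy interrupts. */
#define SERR_PHYRDY_CHG  (UINT32_C(1) << 16)
#define SERR_COMM_WAKE   (UINT32_C(1) << 18)

/* Together they form the SErr 0x50000 value seen in the reports. */
static_assert((SERR_PHYRDY_CHG | SERR_COMM_WAKE) == 0x50000,
	      "PhyRdyChg | CommWake == 0x50000");

/* Hypothetical helper: drop the power-state related bits from a cached
 * SError value when ALPM is enabled, leave the value alone otherwise. */
static uint32_t eh_filter_serror(uint32_t serror, bool alpm_enabled)
{
	if (alpm_enabled)
		serror &= ~(SERR_PHYRDY_CHG | SERR_COMM_WAKE);
	return serror;
}

/* Hypothetical helper: a PhyRdy change only counts as a hotplug
 * indication if it survives the filter above. */
static bool eh_should_treat_as_hotplug(uint32_t serror, bool alpm_enabled)
{
	return (eh_filter_serror(serror, alpm_enabled) & SERR_PHYRDY_CHG) != 0;
}
```

With something along these lines, the 0x50000 seen here would not look
like a hotplug while ALPM is on, but would still be acted upon with ALPM
off. Other SError bits pass through unchanged in both cases.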

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

Thread overview: 37+ messages
2008-11-04 10:31 Odd behaviour of device in response to idleimmediate with unload Elias Oltmanns
2008-11-04 10:40 ` Tejun Heo
2008-11-04 12:32   ` Evgeni Golov
2008-11-04 17:06     ` Mark Lord
2008-11-04 17:18       ` Mark Lord
2008-11-04 17:47         ` Mark Lord
2008-11-04 18:13           ` Mark Lord
2008-11-04 18:54             ` Evgeni Golov
2008-11-04 19:39               ` Mark Lord
2008-11-05  9:32                 ` Tejun Heo
2008-11-05 13:47                   ` Elias Oltmanns
2008-11-05 14:08                     ` Tejun Heo
2008-11-05 18:55                       ` Elias Oltmanns
2008-11-06 11:23                         ` Evgeni Golov
2008-11-06 12:12                           ` Elias Oltmanns
2008-11-05 19:34                       ` Evgeni Golov
2008-11-06 11:41                       ` Elias Oltmanns
2008-11-07  4:08                         ` Tejun Heo
2008-11-07  7:48                           ` Evgeni Golov
2008-11-10  9:00                             ` Tejun Heo
2008-11-10 10:26                               ` Evgeni Golov
2008-11-10 11:35                                 ` Elias Oltmanns
2008-11-13 11:33                                   ` Evgeni Golov
2008-11-13 12:29                                     ` Elias Oltmanns
2008-11-16  9:39                                     ` Tejun Heo
2008-11-17  7:15                                       ` Evgeni Golov
2008-11-17  7:19                                         ` Tejun Heo
2008-11-17  7:48                                           ` Evgeni Golov
2008-11-18  1:22                                             ` Tejun Heo
2008-11-18  7:37                                               ` Evgeni Golov
2008-11-21  6:41                                                 ` Tejun Heo
2008-11-21 19:40                                                   ` Evgeni Golov
2008-11-22  8:22                                                     ` Tejun Heo
2008-11-22  9:51                                                       ` Evgeni Golov
2008-11-22  9:58                                                       ` Evgeni Golov
2008-11-23 18:09                                                       ` Elias Oltmanns
2008-11-24  4:20                                                         ` Tejun Heo
