* Odd behaviour of device in response to idleimmediate with unload @ 2008-11-04 10:31 Elias Oltmanns 2008-11-04 10:40 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Elias Oltmanns @ 2008-11-04 10:31 UTC (permalink / raw) To: linux-ide, Tejun Heo, Alan Cox; +Cc: Evgeni Golov Hi all, apparently, we have the first case of a quirky implementation of idleimmediate with unload feature in a device, or I'm barking up the wrong tree, of course. Evgeni Golov has reported the following on hdaps-devel: Evgeni Golov <sargentd@die-welt.net> wrote: > Hi, > > I have a ThinkPad Z61m with a 100GB S-ATA TOSHIBA MK1032GSX. [...] > Now I thought I should try the new interface and compiled 2.6.28-rc2, [...] > After that I could park the heads and got the following in dmesg: > > ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf > ata1: SError: { PHYRdyChg CommWake } > ata1: hard resetting link > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded > ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out > ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out > ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded > ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out > ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out > ata1.00: configured for UDMA/100 > ata1.00: configured for UDMA/100 > ata1: EH complete > sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't > support DPO or FUA > sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't > support DPO or FUA The only explanation I could come up with so far is that the head park command, for some reason or other, causes the device to set SERR_PHYRDY_CHG and SERR_COMM_WAKE in serror, thus triggering the handling of hotplug events. Do you have any idea what's really going on here and what can / should be done about it? Regards, Elias ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 10:31 Odd behaviour of device in response to idleimmediate with unload Elias Oltmanns @ 2008-11-04 10:40 ` Tejun Heo 2008-11-04 12:32 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-04 10:40 UTC (permalink / raw) To: Elias Oltmanns; +Cc: linux-ide, Alan Cox, Evgeni Golov Elias Oltmanns wrote: > apparently, we have the first case of a quirky implementation of > idleimmediate with unload feature in a device, or I'm barking up the > wrong tree, of course. Evgeni Golov has reported the following on > hdaps-devel: Aieeeee... > Evgeni Golov <sargentd@die-welt.net> wrote: >> Hi, >> >> I have a ThinkPad Z61m with a 100GB S-ATA TOSHIBA MK1032GSX. > [...] >> Now I thought I should try the new interface and compiled 2.6.28-rc2, > [...] >> After that I could park the heads and got the following in dmesg: >> >> ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf >> ata1: SError: { PHYRdyChg CommWake } >> ata1: hard resetting link >> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) >> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded >> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out >> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out >> ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded >> ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out >> ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out >> ata1.00: configured for UDMA/100 >> ata1.00: configured for UDMA/100 >> ata1: EH complete >> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) >> sd 0:0:0:0: [sda] Write Protect is off >> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 >> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't >> support DPO or FUA >> sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) >> sd 0:0:0:0: [sda] Write Protect is off >> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 >> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't >> support DPO or FUA > > The only explanation I could come up with so far is that the head park > command, for some reason or other, causes the device to set > SERR_PHYRDY_CHG and SERR_COMM_WAKE in serror, thus triggering the > handling of hotplug events. Do you have any idea what's really going on > here and what can / should be done about it? Is it a laptop? Does 'hdparm -y' cause the same thing? Can you post "hdparm -I"? Thanks. -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 10:40 ` Tejun Heo @ 2008-11-04 12:32 ` Evgeni Golov 2008-11-04 17:06 ` Mark Lord 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-04 12:32 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, linux-ide, Alan Cox Hi, On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote: > Is it a laptop? Yes it is, it's a IBM/Lenovo Thinkpad Z61m (9453A11) > Does 'hdparm -y' cause the same thing? No, neither -y nor -Y cause these messages. > Can you post "hdparm -I"? Here we go: /dev/sda: ATA device, with non-removable media Model Number: TOSHIBA MK1032GSX Serial Number: MYSERIAL:) Firmware Revision: AS024E Standards: Supported: 6 5 4 Likely used: 6 Configuration: Logical max current cylinders 16383 16383 heads 16 16 sectors/track 63 63 -- CHS current addressable sectors: 16514064 LBA user addressable sectors: 195371568 LBA48 user addressable sectors: 195371568 device size with M = 1024*1024: 95396 MBytes device size with M = 1000*1000: 100030 MBytes (100 GB) Capabilities: LBA, IORDY(can be disabled) Standby timer values: spec'd by Standard, no device specific minimum R/W multiple sector transfer: Max = 16 Current = 16 Advanced power management level: 128 DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * SMART feature set Security Mode feature set * Power Management feature set * Write cache * Look-ahead * Host Protected Area feature set * WRITE_BUFFER command * READ_BUFFER command * NOP cmd * DOWNLOAD_MICROCODE * Advanced Power Management feature set SET_MAX security extension * 48-bit Address feature set * Device Configuration Overlay feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * SMART error logging * SMART self-test * General Purpose Logging feature set * IDLE_IMMEDIATE with UNLOAD * SATA-I signaling speed (1.5Gb/s) * Host-initiated interface power management * Phy event counters * Device-initiated interface power management * Software settings preservation Security: Master password revision code = 65534 supported not enabled not locked frozen not expired: security count not supported: enhanced erase 74min for SECURITY ERASE UNIT. Checksum: correct HTH Evgeni ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 12:32 ` Evgeni Golov @ 2008-11-04 17:06 ` Mark Lord 2008-11-04 17:18 ` Mark Lord 0 siblings, 1 reply; 37+ messages in thread From: Mark Lord @ 2008-11-04 17:06 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox Evgeni Golov wrote: > Hi, > > On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote: > >> Is it a laptop? > > Yes it is, it's a IBM/Lenovo Thinkpad Z61m (9453A11) > >> Does 'hdparm -y' cause the same thing? > > No, neither -y nor -Y cause these messages. .. Well, -y does a simple ATA "IDLE IMMEDIATE" command. So, what command is being used to cause the problem?? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 17:06 ` Mark Lord @ 2008-11-04 17:18 ` Mark Lord 2008-11-04 17:47 ` Mark Lord 0 siblings, 1 reply; 37+ messages in thread From: Mark Lord @ 2008-11-04 17:18 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox Mark Lord wrote: > Evgeni Golov wrote: >> Hi, >> >> On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote: >> >>> Is it a laptop? >> >> Yes it is, it's a IBM/Lenovo Thinkpad Z61m (9453A11) >> >>> Does 'hdparm -y' cause the same thing? >> >> No, neither -y nor -Y cause these messages. > .. > > Well, -y does a simple ATA "IDLE IMMEDIATE" command. > > So, what command is being used to cause the problem?? .. Duh.. sorry about that: IDLE_IMMEDIATE_WITH_UNLOAD. I'll add that to hdparm for v9.3, eventually. Cheers ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 17:18 ` Mark Lord @ 2008-11-04 17:47 ` Mark Lord 2008-11-04 18:13 ` Mark Lord 0 siblings, 1 reply; 37+ messages in thread From: Mark Lord @ 2008-11-04 17:47 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox Mark Lord wrote: > Mark Lord wrote: >> Evgeni Golov wrote: >>> Hi, >>> >>> On Tue, 04 Nov 2008 19:40:43 +0900 Tejun Heo wrote: >>> >>>> Is it a laptop? >>> >>> Yes it is, it's a IBM/Lenovo Thinkpad Z61m (9453A11) >>> >>>> Does 'hdparm -y' cause the same thing? >>> >>> No, neither -y nor -Y cause these messages. >> .. >> >> Well, -y does a simple ATA "IDLE IMMEDIATE" command. .. EEeek.. actually, no it doesn't. It issues a STANDBY IMMEDIATE. I'm releasing hdparm-9.3 in the next few minutes, with new --idle and --idle-unload flags to issue the commands in question here. Cheers ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 17:47 ` Mark Lord @ 2008-11-04 18:13 ` Mark Lord 2008-11-04 18:54 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Mark Lord @ 2008-11-04 18:13 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox Mark Lord wrote: > .. > I'm releasing hdparm-9.3 in the next few minutes, > with new --idle and --idle-unload flags to issue > the commands in question here. .. Okay, hdparm-9.3 is now out in the wild (sourceforge), and has --idle-immediate and --idle-unload flags now, so it can be used to help debug/test this problem. Cheers ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 18:13 ` Mark Lord @ 2008-11-04 18:54 ` Evgeni Golov 2008-11-04 19:39 ` Mark Lord 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-04 18:54 UTC (permalink / raw) To: Mark Lord; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox [-- Attachment #1: Type: text/plain, Size: 404 bytes --] On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote: > Okay, hdparm-9.3 is now out in the wild (sourceforge), > and has --idle-immediate and --idle-unload flags now, > so it can be used to help debug/test this problem. Okay, got it, built it. Neither --idle-immediate nor --idle-immediate brings up the reset, echo 1000 > /sys/block/sda/device/unload_heads does. Regards Evgeni, puzzled. [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 18:54 ` Evgeni Golov @ 2008-11-04 19:39 ` Mark Lord 2008-11-05 9:32 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Mark Lord @ 2008-11-04 19:39 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Elias Oltmanns, linux-ide, Alan Cox Evgeni Golov wrote: > On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote: > >> Okay, hdparm-9.3 is now out in the wild (sourceforge), >> and has --idle-immediate and --idle-unload flags now, >> so it can be used to help debug/test this problem. > > Okay, got it, built it. > Neither --idle-immediate nor --idle-immediate brings up the reset, > echo 1000 > /sys/block/sda/device/unload_heads does. .. Mmmm.. okay, this is new stuff in 2.6.28, and it appears to just issue a --idle-unload equivalent after a delay. But it does it from within libata-eh, so I suppose there must be some confusion in there somewhere. So it's up to Tejun now, I suppose. Cheers ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-04 19:39 ` Mark Lord @ 2008-11-05 9:32 ` Tejun Heo 2008-11-05 13:47 ` Elias Oltmanns 0 siblings, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-05 9:32 UTC (permalink / raw) To: Mark Lord; +Cc: Evgeni Golov, Elias Oltmanns, linux-ide, Alan Cox [-- Attachment #1: Type: text/plain, Size: 760 bytes --] Mark Lord wrote: > Evgeni Golov wrote: >> On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote: >> >>> Okay, hdparm-9.3 is now out in the wild (sourceforge), >>> and has --idle-immediate and --idle-unload flags now, >>> so it can be used to help debug/test this problem. >> >> Okay, got it, built it. >> Neither --idle-immediate nor --idle-immediate brings up the reset, >> echo 1000 > /sys/block/sda/device/unload_heads does. > .. > > Mmmm.. okay, this is new stuff in 2.6.28, > and it appears to just issue a --idle-unload equivalent after a delay. > > But it does it from within libata-eh, so I suppose there must > be some confusion in there somewhere. > > So it's up to Tejun now, I suppose. Hmmm... maybe garbage values in unused TF regs? -- tejun [-- Attachment #2: idleimm-dbg.patch --] [-- Type: text/x-patch, Size: 556 bytes --] diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 8077bdf..f0f3d11 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -2649,7 +2649,7 @@ static void ata_eh_park_issue_cmd(struct ata_device *dev, int park) tf.command = ATA_CMD_CHK_POWER; } - tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; + tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; tf.protocol |= ATA_PROT_NODATA; err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0); if (park && (err_mask || tf.lbal != 0xc4)) { ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-05 9:32 ` Tejun Heo @ 2008-11-05 13:47 ` Elias Oltmanns 2008-11-05 14:08 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Elias Oltmanns @ 2008-11-05 13:47 UTC (permalink / raw) To: Tejun Heo; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox Tejun Heo <tj@kernel.org> wrote: > Mark Lord wrote: >> Evgeni Golov wrote: > >>> On Tue, 04 Nov 2008 13:13:16 -0500 Mark Lord wrote: >>> >>>> Okay, hdparm-9.3 is now out in the wild (sourceforge), >>>> and has --idle-immediate and --idle-unload flags now, >>>> so it can be used to help debug/test this problem. >>> >>> Okay, got it, built it. >>> Neither --idle-immediate nor --idle-immediate brings up the reset, >>> echo 1000 > /sys/block/sda/device/unload_heads does. You really have tested --idle-unload as well, I suppose. [...] > Hmmm... maybe garbage values in unused TF regs? [...] > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index 8077bdf..f0f3d11 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -2649,7 +2649,7 @@ static void ata_eh_park_issue_cmd(struct ata_device *dev, int park) > tf.command = ATA_CMD_CHK_POWER; > } > > - tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; > + tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; After my embarrassing failure to parse a macro correctly a few days back, I've got to be careful ;-). But still, what good does that patch do? It doesn't really change anything, does it? As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach() has anything to do with it. Unless you have a better suggestion, perhaps the following debug patch would give some useful information. Regards, Elias --- diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 8077bdf..26dc4d9 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -3020,6 +3020,7 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset, struct ata_device *dev; int nr_failed_devs; int rc; + int dev_parked = 0; unsigned long flags, deadline; DPRINTK("ENTER\n"); @@ -3123,6 +3124,7 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset, continue; ata_eh_park_issue_cmd(dev, 1); + dev_parked = 1; } } @@ -3135,10 +3137,17 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset, } while (deadline); ata_port_for_each_link(link, ap) { ata_link_for_each_dev(dev, link) { + u32 serror; + if (!(link->eh_context.unloaded_mask & (1 << dev->devno))) continue; + sata_scr_read(link, SCR_ERROR, &serror); + ata_dev_printk(dev, KERN_INFO, + "SError: 0x%x, hotplug: %d\n", + serror, + ap->pflags & ATA_PFLAG_SCSI_HOTPLUG ? 1 : 0); ata_eh_park_issue_cmd(dev, 0); ata_eh_done(link, dev, ATA_EH_PARK); } @@ -3203,6 +3212,9 @@ dev_fail: } } + if (dev_parked) + ata_port_printk(ap, KERN_INFO, "hotplug: %d, nr_failed: %d\n", + ap->pflags & ATA_PFLAG_SCSI_HOTPLUG ? 1 : 0, nr_failed_devs); if (nr_failed_devs) goto retry; ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-05 13:47 ` Elias Oltmanns @ 2008-11-05 14:08 ` Tejun Heo 2008-11-05 18:55 ` Elias Oltmanns ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Tejun Heo @ 2008-11-05 14:08 UTC (permalink / raw) To: Elias Oltmanns; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox Elias Oltmanns wrote: >> - tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; >> + tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; > > After my embarrassing failure to parse a macro correctly a few days > back, I've got to be careful ;-). But still, what good does that patch > do? It doesn't really change anything, does it? Hehheh, I suppose it's my turn for embarrassment. > As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach() > has anything to do with it. Unless you have a better suggestion, perhaps > the following debug patch would give some useful information. I don't have much idea at this point. To the drive, it shouldn't look any different. Ah... it's ata_piix, right? ata_piix doesn't have PHY event IRQ, so it could be that the command issued by hdparm did trigger PHY event but didn't get noticed by EH while the condition triggered by IDLE IMMEDIATE did. One way to find out would be adding SCR print outs on command completion. Thanks. -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-05 14:08 ` Tejun Heo @ 2008-11-05 18:55 ` Elias Oltmanns 2008-11-06 11:23 ` Evgeni Golov 2008-11-05 19:34 ` Evgeni Golov 2008-11-06 11:41 ` Elias Oltmanns 2 siblings, 1 reply; 37+ messages in thread From: Elias Oltmanns @ 2008-11-05 18:55 UTC (permalink / raw) To: Tejun Heo; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox Tejun Heo <tj@kernel.org> wrote: > Elias Oltmanns wrote: [...] >> As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach() >> has anything to do with it. Unless you have a better suggestion, perhaps >> the following debug patch would give some useful information. > > I don't have much idea at this point. To the drive, it shouldn't look > any different. Ah... it's ata_piix, right? ata_piix doesn't have PHY > event IRQ, so it could be that the command issued by hdparm did trigger > PHY event but didn't get noticed by EH while the condition triggered by > IDLE IMMEDIATE did. One way to find out would be adding SCR print outs > on command completion. Right. Evgeni, could you please apply the following patch (2.6.28-rc3 or thereabouts) and do the following in that order? Please report what has been logged in dmesg: # hdparm --idle-immediate /dev/sda # hdparm --idle-unload /dev/sda # hdparm --idle-immediate /dev/sda # echo 1000 > /sys/block/sda/device/unload_heads # hdparm --idle-immediate /dev/sda The last one is only meant to verify that it behaves like the first. Regards, Elias --- diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 622350d..4948433 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4683,6 +4683,8 @@ void ata_qc_complete(struct ata_queued_cmd *qc) if (ap->ops->error_handler) { struct ata_device *dev = qc->dev; struct ata_eh_info *ehi = &dev->link->eh_info; + u32 serror; + int rc; WARN_ON(ap->pflags & ATA_PFLAG_FROZEN); @@ -4721,6 +4723,16 @@ void ata_qc_complete(struct ata_queued_cmd *qc) case ATA_CMD_SLEEP: dev->flags |= ATA_DFLAG_SLEEPING; break; + + case ATA_CMD_IDLEIMMEDIATE: + rc = sata_scr_read(dev->link, SCR_ERROR, &serror); + if (!rc) + ata_dev_printk(dev, KERN_INFO, + "SError: 0x%x\n", serror); + else + ata_dev_printk(dev, KERN_INFO, + "Couldn't read SError: %d\n", rc); + break; } if (unlikely(dev->flags & ATA_DFLAG_DUBIOUS_XFER)) ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-05 18:55 ` Elias Oltmanns @ 2008-11-06 11:23 ` Evgeni Golov 2008-11-06 12:12 ` Elias Oltmanns 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-06 11:23 UTC (permalink / raw) To: Elias Oltmanns; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox On Wed, 05 Nov 2008 19:55:44 +0100 Elias Oltmanns wrote: > # hdparm --idle-immediate /dev/sda > # hdparm --idle-unload /dev/sda > # hdparm --idle-immediate /dev/sda > # echo 1000 > /sys/block/sda/device/unload_heads > # hdparm --idle-immediate /dev/sda Here come my test results: shinkupaddo# ./hdparm --idle-immediate /dev/sda /dev/sda: issuing idle_immediate command shinkupaddo# dmesg -c ata1.00: SError: 0x0 shinkupaddo# ./hdparm --idle-unload /dev/sda /dev/sda: issuing idle_immediate_unload command shinkupaddo# dmesg -c ata1.00: SError: 0x0 shinkupaddo# ./hdparm --idle-immediate /dev/sda /dev/sda: issuing idle_immediate command shinkupaddo# dmesg -c ata1.00: SError: 0x0 shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads shinkupaddo# dmesg -c ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf ata1: SError: { PHYRdyChg CommWake } ata1: hard resetting link ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: SError: 0x0 ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out ata1.00: configured for UDMA/100 ata1.00: configured for UDMA/100 ata1: EH complete sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA shinkupaddo# ./hdparm --idle-immediate /dev/sda /dev/sda: issuing idle_immediate command shinkupaddo# dmesg -c ata1.00: SError: 0x0 Any additional shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads Gives only ata1.00: SError: 0x0 Regards Evgeni ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-06 11:23 ` Evgeni Golov @ 2008-11-06 12:12 ` Elias Oltmanns 0 siblings, 0 replies; 37+ messages in thread From: Elias Oltmanns @ 2008-11-06 12:12 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox Evgeni Golov <sargentd@die-welt.net> wrote: > On Wed, 05 Nov 2008 19:55:44 +0100 Elias Oltmanns wrote: > >> # hdparm --idle-immediate /dev/sda >> # hdparm --idle-unload /dev/sda >> # hdparm --idle-immediate /dev/sda >> # echo 1000 > /sys/block/sda/device/unload_heads >> # hdparm --idle-immediate /dev/sda > > Here come my test results: > [...] > shinkupaddo# ./hdparm --idle-unload /dev/sda > > /dev/sda: > issuing idle_immediate_unload command > shinkupaddo# dmesg -c > ata1.00: SError: 0x0 [...] > shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads > shinkupaddo# dmesg -c > ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf > ata1: SError: { PHYRdyChg CommWake } > ata1: hard resetting link > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > ata1.00: SError: 0x0 > ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded > ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out > ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out > ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded > ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out > ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out > ata1.00: configured for UDMA/100 > ata1.00: configured for UDMA/100 > ata1: EH complete > sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [...] > Any additional > shinkupaddo# echo 1000 > /sys/block/sda/device/unload_heads > > Gives only > ata1.00: SError: 0x0 Now there's an interesting thing. Even though these results don't exactly confirm my theory I posted just now, they certainly do throw some light on the matter. It looks to me as if the hard reset is not at all related to head parking, after all. It's just that echo 1000 > /sys/.../unload_heads triggers EH and during link autopsy some stale phyrdy | commwake bits in SError are discovered and acted upon. As to how those bits got set in the first place, I suspect that some other action in a previous EH cycle triggered a phyrdy event that didn't get cleared because event notification was disabled (see my earlier post). I don't think this is expected behaviour and in particular I have no idea, right now, what actually may have triggered this event, Tejun? Regards, Elias ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-05 14:08 ` Tejun Heo 2008-11-05 18:55 ` Elias Oltmanns @ 2008-11-05 19:34 ` Evgeni Golov 2008-11-06 11:41 ` Elias Oltmanns 2 siblings, 0 replies; 37+ messages in thread From: Evgeni Golov @ 2008-11-05 19:34 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox On Wed, 05 Nov 2008 23:08:42 +0900 Tejun Heo wrote: > Ah... it's ata_piix, right? It's ahci (Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 02)). Regards ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-05 14:08 ` Tejun Heo 2008-11-05 18:55 ` Elias Oltmanns 2008-11-05 19:34 ` Evgeni Golov @ 2008-11-06 11:41 ` Elias Oltmanns 2008-11-07 4:08 ` Tejun Heo 2 siblings, 1 reply; 37+ messages in thread From: Elias Oltmanns @ 2008-11-06 11:41 UTC (permalink / raw) To: Tejun Heo; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox Tejun Heo <tj@kernel.org> wrote: > Elias Oltmanns wrote: [...] >> As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach() >> has anything to do with it. Unless you have a better suggestion, perhaps >> the following debug patch would give some useful information. > > I don't have much idea at this point. To the drive, it shouldn't look > any different. Ah... it's ata_piix, right? ata_piix doesn't have PHY > event IRQ, so it could be that the command issued by hdparm did trigger > PHY event but didn't get noticed by EH while the condition triggered by > IDLE IMMEDIATE did. One way to find out would be adding SCR print outs > on command completion. Actually, event notification is turned off during error recovery for ahci as well. Additionally, we have the following in the interrupt handler of ahci.c: /* If we are getting PhyRdy, this is * just a power state change, we should * clear out this, plus the PhyRdy/Comm * Wake bits from Serror */ if ((hpriv->flags & AHCI_HFLAG_NO_HOTPLUG) && (status & PORT_IRQ_PHYRDY)) { status &= ~PORT_IRQ_PHYRDY; ahci_scr_write(&ap->link, SCR_ERROR, ((1 << 16) | (1 << 18))); } This suggests to me that hdparm --idle-unload does indeed trigger a phy event, but the interrupt handler clears SError. Issuing the unload command in EH, on the other hand, does not result in a phy event because event notification is disabled. That way, phyrdy and commwake don't get cleared in SError in will indicate a hotplug event next time SError is checked. Does that make sense? If so, what's to be done about it? Regards, Elias ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-06 11:41 ` Elias Oltmanns @ 2008-11-07 4:08 ` Tejun Heo 2008-11-07 7:48 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-07 4:08 UTC (permalink / raw) To: Elias Oltmanns; +Cc: Mark Lord, Evgeni Golov, linux-ide, Alan Cox Elias Oltmanns wrote: > Tejun Heo <tj@kernel.org> wrote: >> Elias Oltmanns wrote: > [...] >>> As a wild guess, I'm wondering whether ata_eh_revalidate_and_attach() >>> has anything to do with it. Unless you have a better suggestion, perhaps >>> the following debug patch would give some useful information. >> I don't have much idea at this point. To the drive, it shouldn't look >> any different. Ah... it's ata_piix, right? ata_piix doesn't have PHY >> event IRQ, so it could be that the command issued by hdparm did trigger >> PHY event but didn't get noticed by EH while the condition triggered by >> IDLE IMMEDIATE did. One way to find out would be adding SCR print outs >> on command completion. > > Actually, event notification is turned off during error recovery for > ahci as well. Yeap, but SError is checked and cleared after event notification is turned back on, so events shouldn't leak. > Additionally, we have the following in the interrupt > handler of ahci.c: > > /* If we are getting PhyRdy, this is > * just a power state change, we should > * clear out this, plus the PhyRdy/Comm > * Wake bits from Serror > */ > if ((hpriv->flags & AHCI_HFLAG_NO_HOTPLUG) && > (status & PORT_IRQ_PHYRDY)) { > status &= ~PORT_IRQ_PHYRDY; > ahci_scr_write(&ap->link, SCR_ERROR, ((1 << 16) | (1 << 18))); > } > > This suggests to me that hdparm --idle-unload does indeed trigger a phy > event, but the interrupt handler clears SError. Issuing the unload > command in EH, on the other hand, does not result in a phy event because > event notification is disabled. That way, phyrdy and commwake don't get > cleared in SError in will indicate a hotplug event next time SError is > checked. Does that make sense? If so, what's to be done about it? Hmmm... if ALPM is enabled, it could explain all. Enabling ALPM does inhibit event notifications but it doesn't prevent autopsy from interpreting SError as if ALPM is not enabled. Evgeni, is ALPM enabled? Thanks. -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-07 4:08 ` Tejun Heo @ 2008-11-07 7:48 ` Evgeni Golov 2008-11-10 9:00 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-07 7:48 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox On Fri, Nov 07, 2008 at 01:08:29PM +0900, Tejun Heo wrote: > > This suggests to me that hdparm --idle-unload does indeed trigger a phy > > event, but the interrupt handler clears SError. Issuing the unload > > command in EH, on the other hand, does not result in a phy event because > > event notification is disabled. That way, phyrdy and commwake don't get > > cleared in SError in will indicate a hotplug event next time SError is > > checked. Does that make sense? If so, what's to be done about it? > > Hmmm... if ALPM is enabled, it could explain all. Enabling ALPM does > inhibit event notifications but it doesn't prevent autopsy from > interpreting SError as if ALPM is not enabled. > > Evgeni, is ALPM enabled? You mean Aggressive Link Power Management? As patches from here: http://www.kernel.org/pub/linux/kernel/people/kristen/patches/SATA/alpm/ ? Unless they got merged into 2.6.27 and autoenabled, no, I dont use ALPM :) Regards Evgeni ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-07 7:48 ` Evgeni Golov @ 2008-11-10 9:00 ` Tejun Heo 2008-11-10 10:26 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-10 9:00 UTC (permalink / raw) To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox Evgeni Golov wrote: > On Fri, Nov 07, 2008 at 01:08:29PM +0900, Tejun Heo wrote: >>> This suggests to me that hdparm --idle-unload does indeed trigger a phy >>> event, but the interrupt handler clears SError. Issuing the unload >>> command in EH, on the other hand, does not result in a phy event because >>> event notification is disabled. That way, phyrdy and commwake don't get >>> cleared in SError in will indicate a hotplug event next time SError is >>> checked. Does that make sense? If so, what's to be done about it? >> Hmmm... if ALPM is enabled, it could explain all. Enabling ALPM does >> inhibit event notifications but it doesn't prevent autopsy from >> interpreting SError as if ALPM is not enabled. >> >> Evgeni, is ALPM enabled? > > You mean Aggressive Link Power Management? As patches from here: > http://www.kernel.org/pub/linux/kernel/people/kristen/patches/SATA/alpm/ > ? > Unless they got merged into 2.6.27 and autoenabled, no, I dont use ALPM :) Heh... I'm fresh out of ideas. Somehow SError is being set on the initial EH head unload. In general, what EH is doing is harmless in itself but EH reset can delay head unloading defeating the purpose of unloading. Is the phy event before or after head unloading? Thanks. -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-10 9:00 ` Tejun Heo @ 2008-11-10 10:26 ` Evgeni Golov 2008-11-10 11:35 ` Elias Oltmanns 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-10 10:26 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote: > Is the phy event before or after head unloading? How do I check this? Regards ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-10 10:26 ` Evgeni Golov @ 2008-11-10 11:35 ` Elias Oltmanns 2008-11-13 11:33 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Elias Oltmanns @ 2008-11-10 11:35 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox Evgeni Golov <sargentd@die-welt.net> wrote: > On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote: > >> Is the phy event before or after head unloading? > > How do I check this? # dmesg # echo 3000 > /sys/block/sda/device/unload_heads & # dmesg The first call to dmesg is only to make sure that it is in the cache so the second won't be blocked by the park request. Regards, Elias ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-10 11:35 ` Elias Oltmanns @ 2008-11-13 11:33 ` Evgeni Golov 2008-11-13 12:29 ` Elias Oltmanns 2008-11-16 9:39 ` Tejun Heo 0 siblings, 2 replies; 37+ messages in thread From: Evgeni Golov @ 2008-11-13 11:33 UTC (permalink / raw) To: Elias Oltmanns; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox [-- Attachment #1: Type: text/plain, Size: 769 bytes --] On Mon, 10 Nov 2008 12:35:20 +0100 Elias Oltmanns wrote: > Evgeni Golov <sargentd@die-welt.net> wrote: > > On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote: > > > >> Is the phy event before or after head unloading? > > > > How do I check this? > > # dmesg > # echo 3000 > /sys/block/sda/device/unload_heads & > # dmesg > > The first call to dmesg is only to make sure that it is in the cache so > the second won't be blocked by the park request. Okay, I get the following as soon I issue unload: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf ata1: SError: { PHYRdyChg CommWake } ata1: hard resetting link ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: SError: 0x0 The rest comes a bit later. HTH Evgeni [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-13 11:33 ` Evgeni Golov @ 2008-11-13 12:29 ` Elias Oltmanns 2008-11-16 9:39 ` Tejun Heo 1 sibling, 0 replies; 37+ messages in thread From: Elias Oltmanns @ 2008-11-13 12:29 UTC (permalink / raw) To: Evgeni Golov; +Cc: Tejun Heo, Mark Lord, linux-ide, Alan Cox Evgeni Golov <sargentd@die-welt.net> wrote: > On Mon, 10 Nov 2008 12:35:20 +0100 Elias Oltmanns wrote: > >> Evgeni Golov <sargentd@die-welt.net> wrote: >> > On Mon, 10 Nov 2008 18:00:18 +0900 Tejun Heo wrote: >> > >> >> Is the phy event before or after head unloading? >> > >> > How do I check this? >> >> # dmesg >> # echo 3000 > /sys/block/sda/device/unload_heads & >> # dmesg >> >> The first call to dmesg is only to make sure that it is in the cache so >> the second won't be blocked by the park request. > > Okay, I get the following as soon I issue unload: > ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf > ata1: SError: { PHYRdyChg CommWake } > ata1: hard resetting link > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > ata1.00: SError: 0x0 > > The rest comes a bit later. Right, that settles it: the reset sequence is initiated even before the unload command is issued for the first time. This means that head parking is not part of the picture except for the fact that it provides the means to initiate EH from userspace and makes the problem easily reproducible. On the other hand, it remains to be a mystery to me what actually sets those bits in SError in the first place without event notification taking care of it. I'll have to think about this for a while. Perhaps Tejun has another idea? Regards, Elias ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-13 11:33 ` Evgeni Golov 2008-11-13 12:29 ` Elias Oltmanns @ 2008-11-16 9:39 ` Tejun Heo 2008-11-17 7:15 ` Evgeni Golov 1 sibling, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-16 9:39 UTC (permalink / raw) To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox Hello, Evgeni Golov wrote: >> The first call to dmesg is only to make sure that it is in the cache so >> the second won't be blocked by the park request. > > Okay, I get the following as soon I issue unload: > ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf > ata1: SError: { PHYRdyChg CommWake } > ata1: hard resetting link > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > ata1.00: SError: 0x0 > > The rest comes a bit later. So, this comes before actually issuing the park. It sounds like it has nothing to do with park command itself. If you do "echo - - - > /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say? Thanks. -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-16 9:39 ` Tejun Heo @ 2008-11-17 7:15 ` Evgeni Golov 2008-11-17 7:19 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-17 7:15 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote: > So, this comes before actually issuing the park. It sounds like it has > nothing to do with park command itself. If you do "echo - - - > > /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say? # echo - - - > /sys/class/scsi_host/host0/scan echo: write error: invalid argument ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-17 7:15 ` Evgeni Golov @ 2008-11-17 7:19 ` Tejun Heo 2008-11-17 7:48 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-17 7:19 UTC (permalink / raw) To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox Evgeni Golov wrote: > On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote: > >> So, this comes before actually issuing the park. It sounds like it has >> nothing to do with park command itself. If you do "echo - - - > >> /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say? > > # echo - - - > /sys/class/scsi_host/host0/scan > echo: write error: invalid argument Eh... strange. It works perfectly fine here. Can you play with quoting and -n? -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-17 7:19 ` Tejun Heo @ 2008-11-17 7:48 ` Evgeni Golov 2008-11-18 1:22 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-17 7:48 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox On Mon, 17 Nov 2008 16:19:59 +0900 Tejun Heo wrote: > Evgeni Golov wrote: > > On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote: > > > >> So, this comes before actually issuing the park. It sounds like it has > >> nothing to do with park command itself. If you do "echo - - - > > >> /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say? > > > > # echo - - - > /sys/class/scsi_host/host0/scan > > echo: write error: invalid argument > > Eh... strange. It works perfectly fine here. Can you play with quoting > and -n? Heh, did try -n but no quoting, damn - :) Get the same reset too: shinkupaddo# dmesg ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf ata1: SError: { PHYRdyChg CommWake } ata1: hard resetting link ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out ata1.00: configured for UDMA/100 ata1.00: configured for UDMA/100 ata1: EH complete sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Btw, unload heads isn't even enabled at the moment... Regards ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-17 7:48 ` Evgeni Golov @ 2008-11-18 1:22 ` Tejun Heo 2008-11-18 7:37 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-18 1:22 UTC (permalink / raw) To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox Evgeni Golov wrote: > On Mon, 17 Nov 2008 16:19:59 +0900 Tejun Heo wrote: > >> Evgeni Golov wrote: >>> On Sun, 16 Nov 2008 18:39:08 +0900 Tejun Heo wrote: >>> >>>> So, this comes before actually issuing the park. It sounds like it has >>>> nothing to do with park command itself. If you do "echo - - - > >>>> /sys/class/scsi_host/host0/scan" right after boot, what does the kernel say? >>> # echo - - - > /sys/class/scsi_host/host0/scan >>> echo: write error: invalid argument >> Eh... strange. It works perfectly fine here. Can you play with quoting >> and -n? > > Heh, did try -n but no quoting, damn - :) > Get the same reset too: > > shinkupaddo# dmesg > ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xf > ata1: SError: { PHYRdyChg CommWake } Heh... you're not supposed to see these events here, so the PHY event and resetting don't have anything to do with head unloading per se. It's just discovering previously set SError values. The question is when did they get set and why didn't ahci catch it. libata EH enables enable reporting and then clear SError before finishing up so there shouldn't be any window where events can get lost. Can you please attach lspci -nn result and full boot log? Thanks. -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-18 1:22 ` Tejun Heo @ 2008-11-18 7:37 ` Evgeni Golov 2008-11-21 6:41 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-18 7:37 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox [-- Attachment #1: Type: text/plain, Size: 138 bytes --] On Tue, 18 Nov 2008 10:22:25 +0900 Tejun Heo wrote: > Can you please attach lspci -nn result and full boot log? Done :) Regards Evgeni [-- Attachment #2: lspci.gz --] [-- Type: application/octet-stream, Size: 794 bytes --] [-- Attachment #3: dmesg.gz --] [-- Type: application/octet-stream, Size: 5146 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-18 7:37 ` Evgeni Golov @ 2008-11-21 6:41 ` Tejun Heo 2008-11-21 19:40 ` Evgeni Golov 0 siblings, 1 reply; 37+ messages in thread From: Tejun Heo @ 2008-11-21 6:41 UTC (permalink / raw) To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox [-- Attachment #1: Type: text/plain, Size: 281 bytes --] Evgeni Golov wrote: > On Tue, 18 Nov 2008 10:22:25 +0900 Tejun Heo wrote: > >> Can you please attach lspci -nn result and full boot log? > > Done :) Something strange is going on there. Can you please apply the attached patch and post resulting kernel log? Thanks. -- tejun [-- Attachment #2: ahci-debug.patch --] [-- Type: text/x-patch, Size: 2604 bytes --] diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index a67b8e7..8500fd5 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -2006,6 +2006,8 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat) } if (irq_stat & (PORT_IRQ_CONNECT | PORT_IRQ_PHYRDY)) { + ata_port_printk(ap, KERN_INFO, "XXX PHY event 0x%x\n", + irq_stat); ata_ehi_hotplugged(host_ehi); ata_ehi_push_desc(host_ehi, "%s", irq_stat & PORT_IRQ_CONNECT ? @@ -2044,6 +2046,7 @@ static void ahci_port_intr(struct ata_port *ap) */ if ((hpriv->flags & AHCI_HFLAG_NO_HOTPLUG) && (status & PORT_IRQ_PHYRDY)) { + ata_port_printk(ap, KERN_INFO, "XXX PHY event clearing 0x%x\n", status); status &= ~PORT_IRQ_PHYRDY; ahci_scr_write(&ap->link, SCR_ERROR, ((1 << 16) | (1 << 18))); } diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 32da9a9..24b52aa 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -2380,13 +2380,15 @@ int ata_eh_reset(struct ata_link *link, int classify, /* * Perform reset */ - if (ata_is_host_link(link)) + if (ata_is_host_link(link)) { + ata_port_printk(ap, KERN_INFO, "XXX FREEZING\n"); ata_eh_freeze_port(ap); + } deadline = ata_deadline(jiffies, ata_eh_reset_timeouts[try++]); if (reset) { - if (verbose) + //if (verbose) ata_link_printk(link, KERN_INFO, "%s resetting link\n", reset == softreset ? "soft" : "hard"); @@ -2440,6 +2442,7 @@ int ata_eh_reset(struct ata_link *link, int classify, goto fail; } + ata_port_printk(ap, KERN_INFO, "XXX follow-up SRST\n"); ata_eh_about_to_do(link, NULL, ATA_EH_RESET); rc = ata_do_reset(link, reset, classes, deadline, true); } @@ -2479,8 +2482,10 @@ int ata_eh_reset(struct ata_link *link, int classify, slave->sata_spd = (sstatus >> 4) & 0xf; /* thaw the port */ - if (ata_is_host_link(link)) + if (ata_is_host_link(link)) { + ata_port_printk(ap, KERN_INFO, "XXX THAW\n"); ata_eh_thaw_port(ap); + } /* postreset() should clear hardware SError. Although SError * is cleared during link resume, clearing SError here is @@ -2490,9 +2495,17 @@ int ata_eh_reset(struct ata_link *link, int classify, * link onlineness and classification result later. */ if (postreset) { + ata_port_printk(ap, KERN_INFO, "XXX postreset start\n"); postreset(link, classes); if (slave) postreset(slave, classes); + { + u32 serror = 0; + sata_scr_read(link, SCR_ERROR, &serror); + ata_port_printk(ap, KERN_INFO, + "XXX postreset end, SError=0x%x\n", + serror); + } } /* clear cached SError */ ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-21 6:41 ` Tejun Heo @ 2008-11-21 19:40 ` Evgeni Golov 2008-11-22 8:22 ` Tejun Heo 0 siblings, 1 reply; 37+ messages in thread From: Evgeni Golov @ 2008-11-21 19:40 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox [-- Attachment #1.1: Type: text/plain, Size: 250 bytes --] On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote: > Something strange is going on there. Can you please apply the attached > patch and post resulting kernel log? Wow, that one is spammy :) See relevant part of /var/log/messages attached. [-- Attachment #1.2: log-2.6.28-rc6.gz --] [-- Type: application/octet-stream, Size: 5185 bytes --] [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-21 19:40 ` Evgeni Golov @ 2008-11-22 8:22 ` Tejun Heo 2008-11-22 9:51 ` Evgeni Golov ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Tejun Heo @ 2008-11-22 8:22 UTC (permalink / raw) To: Evgeni Golov; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox Evgeni Golov wrote: > On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote: > >> Something strange is going on there. Can you please apply the attached >> patch and post resulting kernel log? > > Wow, that one is spammy :) > See relevant part of /var/log/messages attached. Eh... you have ALPM enabled. What does "cat /sys/block/sda/device/link_power_management_policy" tell you? -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-22 8:22 ` Tejun Heo @ 2008-11-22 9:51 ` Evgeni Golov 2008-11-22 9:58 ` Evgeni Golov 2008-11-23 18:09 ` Elias Oltmanns 2 siblings, 0 replies; 37+ messages in thread From: Evgeni Golov @ 2008-11-22 9:51 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox On Sat, 22 Nov 2008 17:22:39 +0900 Tejun Heo wrote: > Eh... you have ALPM enabled. Mh, when did I enable that!? > What does "cat > /sys/block/sda/device/link_power_management_policy" tell you? Nothing as the file does not exits, but cat /sys/class/scsi_host/host0/link_power_management_policy works and said me "min_power". I rebooted without setting it, so it now says "max_performance" and I still get the reset on scan :/ Regards ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-22 8:22 ` Tejun Heo 2008-11-22 9:51 ` Evgeni Golov @ 2008-11-22 9:58 ` Evgeni Golov 2008-11-23 18:09 ` Elias Oltmanns 2 siblings, 0 replies; 37+ messages in thread From: Evgeni Golov @ 2008-11-22 9:58 UTC (permalink / raw) To: Tejun Heo; +Cc: Elias Oltmanns, Mark Lord, linux-ide, Alan Cox On Sat, 22 Nov 2008 17:22:39 +0900 Tejun Heo wrote: > Eh... you have ALPM enabled. What does "cat > /sys/block/sda/device/link_power_management_policy" tell you? Sorry, when I disable ALPM I get no errors on unload_heads, so it's fine now :) I still wonder why ALPM was enabled... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-22 8:22 ` Tejun Heo 2008-11-22 9:51 ` Evgeni Golov 2008-11-22 9:58 ` Evgeni Golov @ 2008-11-23 18:09 ` Elias Oltmanns 2008-11-24 4:20 ` Tejun Heo 2 siblings, 1 reply; 37+ messages in thread From: Elias Oltmanns @ 2008-11-23 18:09 UTC (permalink / raw) To: Tejun Heo; +Cc: Evgeni Golov, Mark Lord, linux-ide, Alan Cox Tejun Heo <tj@kernel.org> wrote: > Evgeni Golov wrote: >> On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote: > >> >>> Something strange is going on there. Can you please apply the attached >>> patch and post resulting kernel log? >> >> Wow, that one is spammy :) >> See relevant part of /var/log/messages attached. > > Eh... you have ALPM enabled. Quite. The thing that's puzzling me, though, is this: There are a lot of phy events occurring during normal operation that get cleared by the interrupt handler. However, during link autopsy, phyrdy and comwake are set in SError even though event notification has not been disabled yet at this point. This suggests to me that not all changes to those bits in SError are brought to our attention through event notification, at least as long as ALPM is enabled. Besides, I'm wondering whether that many phy events are to be expected (i.e. observed on other controllers too) when ALPM is enabled. Regards, Elias ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Odd behaviour of device in response to idleimmediate with unload 2008-11-23 18:09 ` Elias Oltmanns @ 2008-11-24 4:20 ` Tejun Heo 0 siblings, 0 replies; 37+ messages in thread From: Tejun Heo @ 2008-11-24 4:20 UTC (permalink / raw) To: Elias Oltmanns; +Cc: Evgeni Golov, Mark Lord, linux-ide, Alan Cox Elias Oltmanns wrote: > Tejun Heo <tj@kernel.org> wrote: >> Evgeni Golov wrote: >>> On Fri, 21 Nov 2008 15:41:22 +0900 Tejun Heo wrote: >>>> Something strange is going on there. Can you please apply the attached >>>> patch and post resulting kernel log? >>> Wow, that one is spammy :) >>> See relevant part of /var/log/messages attached. >> Eh... you have ALPM enabled. > > Quite. The thing that's puzzling me, though, is this: There are a lot of > phy events occurring during normal operation that get cleared by the > interrupt handler. However, during link autopsy, phyrdy and comwake are > set in SError even though event notification has not been disabled yet > at this point. This suggests to me that not all changes to those bits in > SError are brought to our attention through event notification, at least > as long as ALPM is enabled. Besides, I'm wondering whether that many phy > events are to be expected (i.e. observed on other controllers too) when > ALPM is enabled. Yeah, entering and leaving link power save mode generate PHY event each time and ahci explicitly ignores them if ALPM is enabled. We probably need to teach EH that PHY events can be ignored if ALPM is enabled. Thanks. -- tejun ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2008-11-24 4:20 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-11-04 10:31 Odd behaviour of device in response to idleimmediate with unload Elias Oltmanns 2008-11-04 10:40 ` Tejun Heo 2008-11-04 12:32 ` Evgeni Golov 2008-11-04 17:06 ` Mark Lord 2008-11-04 17:18 ` Mark Lord 2008-11-04 17:47 ` Mark Lord 2008-11-04 18:13 ` Mark Lord 2008-11-04 18:54 ` Evgeni Golov 2008-11-04 19:39 ` Mark Lord 2008-11-05 9:32 ` Tejun Heo 2008-11-05 13:47 ` Elias Oltmanns 2008-11-05 14:08 ` Tejun Heo 2008-11-05 18:55 ` Elias Oltmanns 2008-11-06 11:23 ` Evgeni Golov 2008-11-06 12:12 ` Elias Oltmanns 2008-11-05 19:34 ` Evgeni Golov 2008-11-06 11:41 ` Elias Oltmanns 2008-11-07 4:08 ` Tejun Heo 2008-11-07 7:48 ` Evgeni Golov 2008-11-10 9:00 ` Tejun Heo 2008-11-10 10:26 ` Evgeni Golov 2008-11-10 11:35 ` Elias Oltmanns 2008-11-13 11:33 ` Evgeni Golov 2008-11-13 12:29 ` Elias Oltmanns 2008-11-16 9:39 ` Tejun Heo 2008-11-17 7:15 ` Evgeni Golov 2008-11-17 7:19 ` Tejun Heo 2008-11-17 7:48 ` Evgeni Golov 2008-11-18 1:22 ` Tejun Heo 2008-11-18 7:37 ` Evgeni Golov 2008-11-21 6:41 ` Tejun Heo 2008-11-21 19:40 ` Evgeni Golov 2008-11-22 8:22 ` Tejun Heo 2008-11-22 9:51 ` Evgeni Golov 2008-11-22 9:58 ` Evgeni Golov 2008-11-23 18:09 ` Elias Oltmanns 2008-11-24 4:20 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).