* Libata EH False Alarm (2.6.18-rc2)?
@ 2006-08-24 19:42 Fajun Chen
2006-08-24 20:00 ` Tejun Heo
0 siblings, 1 reply; 8+ messages in thread
From: Fajun Chen @ 2006-08-24 19:42 UTC (permalink / raw)
To: linux-ide; +Cc: jgarzik, htejun
Hi Guys,
I ran 2.6.18-rc2 with libata patch. I have seen many occurrences of EH
traces in dmesg log as show below:
...
[30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30540.548784] ata1: waiting for device to spin up (8 secs)
[30548.721383] ata1: soft resetting port
[30548.822362] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30548.825201] ata1.00: configured for UDMA/100
[30548.825248] ata1: EH complete
[repeats]...
While at the same time, my testing still proceeded without problem.
Are these genuine errors or false alarms?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Libata EH False Alarm (2.6.18-rc2)?
2006-08-24 19:42 Libata EH False Alarm (2.6.18-rc2)? Fajun Chen
@ 2006-08-24 20:00 ` Tejun Heo
2006-08-24 20:05 ` Fajun Chen
0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2006-08-24 20:00 UTC (permalink / raw)
To: Fajun Chen; +Cc: linux-ide, jgarzik
Fajun Chen wrote:
> Hi Guys,
>
> I ran 2.6.18-rc2 with libata patch. I have seen many occurrences of EH
> traces in dmesg log as show below:
> ...
> [30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
> 0x2 frozen
> [30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
> [30540.548784] ata1: waiting for device to spin up (8 secs)
> [30548.721383] ata1: soft resetting port
> [30548.822362] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [30548.825201] ata1.00: configured for UDMA/100
> [30548.825248] ata1: EH complete
> [repeats]...
>
> While at the same time, my testing still proceeded without problem.
> Are these genuine errors or false alarms?
Do you happen to be using sata_sil on ATI controller? If so, those
chipsets don't support hotplug && please use -rc4. If you're using
something else, please report what chipset you're using.
--
tejun
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Libata EH False Alarm (2.6.18-rc2)?
2006-08-24 20:00 ` Tejun Heo
@ 2006-08-24 20:05 ` Fajun Chen
2006-08-24 20:21 ` Tejun Heo
0 siblings, 1 reply; 8+ messages in thread
From: Fajun Chen @ 2006-08-24 20:05 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide, jgarzik
Sil3124. That's the only chipset we use.
Thanks,
Fajun
On 8/24/06, Tejun Heo <htejun@gmail.com> wrote:
> Fajun Chen wrote:
> > Hi Guys,
> >
> > I ran 2.6.18-rc2 with libata patch. I have seen many occurrences of EH
> > traces in dmesg log as show below:
> > ...
> > [30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
> > 0x2 frozen
> > [30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
> > [30540.548784] ata1: waiting for device to spin up (8 secs)
> > [30548.721383] ata1: soft resetting port
> > [30548.822362] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [30548.825201] ata1.00: configured for UDMA/100
> > [30548.825248] ata1: EH complete
> > [repeats]...
> >
> > While at the same time, my testing still proceeded without problem.
> > Are these genuine errors or false alarms?
>
> Do you happen to be using sata_sil on ATI controller? If so, those
> chipsets don't support hotplug && please use -rc4. If you're using
> something else, please report what chipset you're using.
>
> --
> tejun
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Libata EH False Alarm (2.6.18-rc2)?
2006-08-24 20:05 ` Fajun Chen
@ 2006-08-24 20:21 ` Tejun Heo
2006-08-24 20:24 ` Tejun Heo
2006-08-24 21:28 ` Fajun Chen
0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2006-08-24 20:21 UTC (permalink / raw)
To: Fajun Chen; +Cc: linux-ide, jgarzik
Fajun Chen wrote:
> Sil3124. That's the only chipset we use.
>> > [30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
>> > 0x2 frozen
>> > [30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
Yeap, this message from sata_sil24. You're not getting any phy status
changes bits in SError although the device is reporting phy rdy changed
event. However, your 3124 is reporting 8b/10b decoding error threshold
exceeded error interrupt. That could be related to the phyrdy status
changed event. This happens only under heavy IO, right? How often does
it occur in units of times per megabytes transferred?
8b/10b error is a recoverable FIS reception error. The interrupt bit
(bit 24 of irq_stat) is only turned on if threshold count is exceeded,
which is initialized to 0x8000 at the moment. This indicates that there
are quite some number of transmission failures.
--
tejun
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Libata EH False Alarm (2.6.18-rc2)?
2006-08-24 20:21 ` Tejun Heo
@ 2006-08-24 20:24 ` Tejun Heo
2006-08-25 14:41 ` Fajun Chen
2006-08-24 21:28 ` Fajun Chen
1 sibling, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2006-08-24 20:24 UTC (permalink / raw)
To: Fajun Chen; +Cc: linux-ide, jgarzik
[-- Attachment #1: Type: text/plain, Size: 1040 bytes --]
Tejun Heo wrote:
> Fajun Chen wrote:
>> Sil3124. That's the only chipset we use.
>
>>> > [30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
>>> > 0x2 frozen
>>> > [30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
>
> Yeap, this message from sata_sil24. You're not getting any phy status
> changes bits in SError although the device is reporting phy rdy changed
> event. However, your 3124 is reporting 8b/10b decoding error threshold
> exceeded error interrupt. That could be related to the phyrdy status
> changed event. This happens only under heavy IO, right? How often does
> it occur in units of times per megabytes transferred?
>
> 8b/10b error is a recoverable FIS reception error. The interrupt bit
> (bit 24 of irq_stat) is only turned on if threshold count is exceeded,
> which is initialized to 0x8000 at the moment. This indicates that there
> are quite some number of transmission failures.
>
Sorry, I forgot to attach patch. Can you please try the attached patch?
--
tejun
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 636 bytes --]
--- a/drivers/scsi/sata_sil24.c
+++ b/drivers/scsi/sata_sil24.c
@@ -1034,9 +1034,9 @@ static void sil24_init_controller(struct
writel(PORT_CS_IRQ_WOC, port + PORT_CTRL_CLR);
/* Zero error counters. */
- writel(0x8000, port + PORT_DECODE_ERR_THRESH);
- writel(0x8000, port + PORT_CRC_ERR_THRESH);
- writel(0x8000, port + PORT_HSHK_ERR_THRESH);
+ writel(0x0000, port + PORT_DECODE_ERR_THRESH);
+ writel(0x0000, port + PORT_CRC_ERR_THRESH);
+ writel(0x0000, port + PORT_HSHK_ERR_THRESH);
writel(0x0000, port + PORT_DECODE_ERR_CNT);
writel(0x0000, port + PORT_CRC_ERR_CNT);
writel(0x0000, port + PORT_HSHK_ERR_CNT);
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Libata EH False Alarm (2.6.18-rc2)?
2006-08-24 20:21 ` Tejun Heo
2006-08-24 20:24 ` Tejun Heo
@ 2006-08-24 21:28 ` Fajun Chen
1 sibling, 0 replies; 8+ messages in thread
From: Fajun Chen @ 2006-08-24 21:28 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide, jgarzik
On 8/24/06, Tejun Heo <htejun@gmail.com> wrote:
> Fajun Chen wrote:
> > Sil3124. That's the only chipset we use.
>
> >> > [30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
> >> > 0x2 frozen
> >> > [30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
>
> Yeap, this message from sata_sil24. You're not getting any phy status
> changes bits in SError although the device is reporting phy rdy changed
> event. However, your 3124 is reporting 8b/10b decoding error threshold
> exceeded error interrupt. That could be related to the phyrdy status
> changed event. This happens only under heavy IO, right? How often does
> it occur in units of times per megabytes transferred?
>
> 8b/10b error is a recoverable FIS reception error. The interrupt bit
> (bit 24 of irq_stat) is only turned on if threshold count is exceeded,
> which is initialized to 0x8000 at the moment. This indicates that there
> are quite some number of transmission failures.
>
Hi Tejun,
Yes, these happened under heavy IO. Based on printk timing, it seems
when the problem happened, it was pretty frequent. Please see the snap
shot of timed printk log below. However, when the system calms down,
nothing printed for many hours. I don't know the direct cause yet.
Thanks,
Fajun
[28143.903371] ata1: EH complete
[28216.484097] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[28216.484192] ata1: (irq_stat 0x01100010, PHY RDY changed)
[28217.030009] ata1: waiting for device to spin up (8 secs)
[28225.201609] ata1: soft resetting port
[28225.302587] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[28225.305382] ata1.00: configured for UDMA/100
[28225.305430] ata1: EH complete
[28293.634777] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[28293.634864] ata1: (irq_stat 0x01100010, PHY RDY changed)
[28294.180322] ata1: waiting for device to spin up (8 secs)
[28302.352922] ata1: soft resetting port
[28302.464898] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[28302.467587] ata1.00: configured for UDMA/100
[28302.467632] ata1: EH complete
[29311.442808] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29311.442898] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29311.988038] ata1: waiting for device to spin up (8 secs)
[29320.160639] ata1: soft resetting port
[29320.272614] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29320.275362] ata1.00: configured for UDMA/100
[29320.275408] ata1: EH complete
[29324.778511] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29324.778595] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29325.324117] ata1: waiting for device to spin up (8 secs)
[29333.496716] ata1: soft resetting port
[29333.608692] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29333.611426] ata1.00: configured for UDMA/100
[29333.611472] ata1: EH complete
[29401.940767] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29401.940850] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29402.486428] ata1: waiting for device to spin up (8 secs)
[29410.659027] ata1: soft resetting port
[29410.760006] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29410.762878] ata1.00: configured for UDMA/100
[29410.762926] ata1: EH complete
[29436.860583] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29436.860666] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29437.406160] ata1: waiting for device to spin up (8 secs)
[29445.578759] ata1: soft resetting port
[29445.690740] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29445.693501] ata1.00: configured for UDMA/100
[29445.693548] ata1: EH complete
[29471.655324] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29471.655406] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29472.200932] ata1: waiting for device to spin up (8 secs)
[29480.373529] ata1: soft resetting port
[29480.474507] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29480.477348] ata1.00: configured for UDMA/100
[29480.477395] ata1: EH complete
[29875.996417] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29875.996501] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29876.542033] ata1: waiting for device to spin up (8 secs)
[29884.714634] ata1: soft resetting port
[29884.815612] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29884.818411] ata1.00: configured for UDMA/100
[29884.818458] ata1: EH complete
[29927.771105] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29927.771187] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29928.316809] ata1: waiting for device to spin up (8 secs)
[29936.489409] ata1: soft resetting port
[29936.590387] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29936.593166] ata1.00: configured for UDMA/100
[29936.593215] ata1: EH complete
[29962.657966] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29962.658051] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29963.203550] ata1: waiting for device to spin up (8 secs)
[29971.376151] ata1: soft resetting port
[29971.488126] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[29971.491138] ata1.00: configured for UDMA/100
[29971.491188] ata1: EH complete
[29997.454772] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[29997.454860] ata1: (irq_stat 0x01100010, PHY RDY changed)
[29998.000319] ata1: waiting for device to spin up (8 secs)
[30006.172918] ata1: soft resetting port
[30006.284894] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30006.287658] ata1.00: configured for UDMA/100
[30006.287705] ata1: EH complete
[30032.953233] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30032.953318] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30033.498880] ata1: waiting for device to spin up (8 secs)
[30041.671478] ata1: soft resetting port
[30041.783459] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30041.786204] ata1.00: configured for UDMA/100
[30041.786253] ata1: EH complete
[30068.484787] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30068.484869] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30069.030432] ata1: waiting for device to spin up (8 secs)
[30077.203034] ata1: soft resetting port
[30077.304012] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30077.306761] ata1.00: configured for UDMA/100
[30077.306810] ata1: EH complete
[30309.781820] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30309.781906] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30310.327479] ata1: waiting for device to spin up (8 secs)
[30318.500082] ata1: soft resetting port
[30318.601059] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30318.603940] ata1.00: configured for UDMA/100
[30318.603988] ata1: EH complete
[30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30540.548784] ata1: waiting for device to spin up (8 secs)
[30548.721383] ata1: soft resetting port
[30548.822362] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30548.825201] ata1.00: configured for UDMA/100
[30548.825248] ata1: EH complete
[30575.399747] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30575.399833] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30575.945375] ata1: waiting for device to spin up (8 secs)
[30584.117974] ata1: soft resetting port
[30584.229951] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30584.232735] ata1.00: configured for UDMA/100
[30584.232782] ata1: EH complete
[30610.554411] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30610.554496] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30611.100038] ata1: waiting for device to spin up (8 secs)
[30619.272638] ata1: soft resetting port
[30619.373617] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30619.376389] ata1.00: configured for UDMA/100
[30619.376439] ata1: EH complete
[30626.649269] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[30626.649353] ata1: (irq_stat 0x01100010, PHY RDY changed)
[30627.194306] ata1: waiting for device to spin up (8 secs)
[30635.366907] ata1: soft resetting port
[30635.467886] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[30635.470648] ata1.00: configured for UDMA/100
[30635.470694] ata1: EH complete
[31100.947327] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[31100.947409] ata1: (irq_stat 0x01100010, PHY RDY changed)
[31101.492839] ata1: waiting for device to spin up (8 secs)
[31109.665441] ata1: soft resetting port
[31109.766420] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[31109.769259] ata1.00: configured for UDMA/100
[31109.769305] ata1: EH complete
[31344.375653] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[31344.375740] ata1: (irq_stat 0x01100010, PHY RDY changed)
[31344.921260] ata1: waiting for device to spin up (8 secs)
[31353.093860] ata1: soft resetting port
[31353.194839] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[31353.197665] ata1.00: configured for UDMA/100
[31353.197712] ata1: EH complete
[31610.265816] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[31610.265901] ata1: (irq_stat 0x01100010, PHY RDY changed)
[31610.811076] ata1: waiting for device to spin up (8 secs)
[31618.983675] ata1: soft resetting port
[31619.095652] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[31619.098363] ata1.00: configured for UDMA/100
[31619.098410] ata1: EH complete
[31623.600365] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[31623.600450] ata1: (irq_stat 0x01100010, PHY RDY changed)
[31624.146155] ata1: waiting for device to spin up (8 secs)
[31632.318756] ata1: soft resetting port
[31632.419733] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[31632.422564] ata1.00: configured for UDMA/100
[31632.422611] ata1: EH complete
[31700.777772] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[31700.777860] ata1: (irq_stat 0x01100010, PHY RDY changed)
[31701.323461] ata1: waiting for device to spin up (8 secs)
[31709.496064] ata1: soft resetting port
[31709.597041] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[31709.599857] ata1.00: configured for UDMA/100
[31709.599906] ata1: EH complete
[31944.217310] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[31944.217391] ata1: (irq_stat 0x01100010, PHY RDY changed)
[31944.762878] ata1: waiting for device to spin up (8 secs)
[31952.935477] ata1: soft resetting port
[31953.047454] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[31953.050157] ata1.00: configured for UDMA/100
[31953.050203] ata1: EH complete
[32034.091806] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[32034.091889] ata1: (irq_stat 0x01100010, PHY RDY changed)
[32034.637450] ata1: waiting for device to spin up (8 secs)
[32042.810051] ata1: soft resetting port
[32042.922027] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[32042.924751] ata1.00: configured for UDMA/100
[32042.924798] ata1: EH complete
[32115.516844] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[32115.516929] ata1: (irq_stat 0x01100010, PHY RDY changed)
[32116.062510] ata1: waiting for device to spin up (8 secs)
[32124.235108] ata1: soft resetting port
[32124.347084] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[32124.349879] ata1.00: configured for UDMA/100
[32124.349927] ata1: EH complete
[32192.660105] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[32192.660190] ata1: (irq_stat 0x01100010, PHY RDY changed)
[32193.205824] ata1: waiting for device to spin up (8 secs)
[32201.378423] ata1: soft resetting port
[32201.490400] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[32201.493189] ata1.00: configured for UDMA/100
[32201.493235] ata1: EH complete
[33213.144571] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33213.144660] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33213.689755] ata1: waiting for device to spin up (8 secs)
[33221.862352] ata1: soft resetting port
[33221.963332] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33221.966204] ata1.00: configured for UDMA/100
[33221.966251] ata1: EH complete
[33226.479118] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33226.479203] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33227.024832] ata1: waiting for device to spin up (8 secs)
[33235.197430] ata1: soft resetting port
[33235.298411] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33235.301294] ata1.00: configured for UDMA/100
[33235.301341] ata1: EH complete
[33303.608605] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33303.608693] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33304.154152] ata1: waiting for device to spin up (8 secs)
[33312.326753] ata1: soft resetting port
[33312.427731] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33312.430512] ata1.00: configured for UDMA/100
[33312.430557] ata1: EH complete
[33338.528219] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33338.528301] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33339.073885] ata1: waiting for device to spin up (8 secs)
[33347.246484] ata1: soft resetting port
[33347.347463] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33347.350215] ata1.00: configured for UDMA/100
[33347.350264] ata1: EH complete
[33373.332079] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33373.332167] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33373.877651] ata1: waiting for device to spin up (8 secs)
[33382.050251] ata1: soft resetting port
[33382.162226] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33382.165255] ata1.00: configured for UDMA/100
[33382.165301] ata1: EH complete
[33777.968017] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33777.968102] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33778.513668] ata1: waiting for device to spin up (8 secs)
[33786.686268] ata1: soft resetting port
[33786.798245] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33786.801038] ata1.00: configured for UDMA/100
[33786.801086] ata1: EH complete
[33829.722919] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33829.723000] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33830.268451] ata1: waiting for device to spin up (8 secs)
[33838.441052] ata1: soft resetting port
[33838.542029] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33838.544844] ata1.00: configured for UDMA/100
[33838.544893] ata1: EH complete
[33864.622635] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33864.622723] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33865.168188] ata1: waiting for device to spin up (8 secs)
[33873.340790] ata1: soft resetting port
[33873.441767] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33873.444581] ata1.00: configured for UDMA/100
[33873.444628] ata1: EH complete
[33899.404288] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33899.404375] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33899.949960] ata1: waiting for device to spin up (8 secs)
[33908.122560] ata1: soft resetting port
[33908.223540] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33908.226384] ata1.00: configured for UDMA/100
[33908.226431] ata1: EH complete
[33934.886888] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33934.886976] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33935.432527] ata1: waiting for device to spin up (8 secs)
[33943.605128] ata1: soft resetting port
[33943.717103] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33943.719871] ata1.00: configured for UDMA/100
[33943.719921] ata1: EH complete
[33970.392444] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[33970.392525] ata1: (irq_stat 0x01100010, PHY RDY changed)
[33970.938087] ata1: waiting for device to spin up (8 secs)
[33979.110685] ata1: soft resetting port
[33979.222663] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33979.225410] ata1.00: configured for UDMA/100
[33979.225460] ata1: EH complete
[34024.082151] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
0x2 frozen
[34024.082239] ata1: (irq_stat 0x01100010, PHY RDY changed)
[34024.627299] ata1: waiting for device to spin up (8 secs)
[34032.799898] ata1: soft resetting port
[34032.911876] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[34032.914608] ata1.00: configured for UDMA/100
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Libata EH False Alarm (2.6.18-rc2)?
2006-08-24 20:24 ` Tejun Heo
@ 2006-08-25 14:41 ` Fajun Chen
2006-08-25 14:54 ` Tejun Heo
0 siblings, 1 reply; 8+ messages in thread
From: Fajun Chen @ 2006-08-25 14:41 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide, jgarzik
Hi Tejun,
I tested your patch. No more EH alarms in dmesg during an overnight
test. One question I have is whether this fix will mask genuine
failure.
Thanks,
Fajun
On 8/24/06, Tejun Heo <htejun@gmail.com> wrote:
> Tejun Heo wrote:
> > Fajun Chen wrote:
> >> Sil3124. That's the only chipset we use.
> >
> >>> > [30540.003174] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action
> >>> > 0x2 frozen
> >>> > [30540.003259] ata1: (irq_stat 0x01100010, PHY RDY changed)
> >
> > Yeap, this message from sata_sil24. You're not getting any phy status
> > changes bits in SError although the device is reporting phy rdy changed
> > event. However, your 3124 is reporting 8b/10b decoding error threshold
> > exceeded error interrupt. That could be related to the phyrdy status
> > changed event. This happens only under heavy IO, right? How often does
> > it occur in units of times per megabytes transferred?
> >
> > 8b/10b error is a recoverable FIS reception error. The interrupt bit
> > (bit 24 of irq_stat) is only turned on if threshold count is exceeded,
> > which is initialized to 0x8000 at the moment. This indicates that there
> > are quite some number of transmission failures.
> >
>
> Sorry, I forgot to attach patch. Can you please try the attached patch?
>
> --
> tejun
>
>
> --- a/drivers/scsi/sata_sil24.c
> +++ b/drivers/scsi/sata_sil24.c
> @@ -1034,9 +1034,9 @@ static void sil24_init_controller(struct
> writel(PORT_CS_IRQ_WOC, port + PORT_CTRL_CLR);
>
> /* Zero error counters. */
> - writel(0x8000, port + PORT_DECODE_ERR_THRESH);
> - writel(0x8000, port + PORT_CRC_ERR_THRESH);
> - writel(0x8000, port + PORT_HSHK_ERR_THRESH);
> + writel(0x0000, port + PORT_DECODE_ERR_THRESH);
> + writel(0x0000, port + PORT_CRC_ERR_THRESH);
> + writel(0x0000, port + PORT_HSHK_ERR_THRESH);
> writel(0x0000, port + PORT_DECODE_ERR_CNT);
> writel(0x0000, port + PORT_CRC_ERR_CNT);
> writel(0x0000, port + PORT_HSHK_ERR_CNT);
>
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Libata EH False Alarm (2.6.18-rc2)?
2006-08-25 14:41 ` Fajun Chen
@ 2006-08-25 14:54 ` Tejun Heo
0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2006-08-25 14:54 UTC (permalink / raw)
To: Fajun Chen; +Cc: linux-ide, jgarzik
Fajun Chen wrote:
> Hi Tejun,
>
> I tested your patch. No more EH alarms in dmesg during an overnight
> test. One question I have is whether this fix will mask genuine
> failure.
No, you're experiencing a lot recoverable failures. The reason can be
anything - faulty drive, flakey cable, out-of-spec on-board trace,
etc... You probably want to solve the problem.
--
tejun
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-08-25 14:54 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-24 19:42 Libata EH False Alarm (2.6.18-rc2)? Fajun Chen
2006-08-24 20:00 ` Tejun Heo
2006-08-24 20:05 ` Fajun Chen
2006-08-24 20:21 ` Tejun Heo
2006-08-24 20:24 ` Tejun Heo
2006-08-25 14:41 ` Fajun Chen
2006-08-25 14:54 ` Tejun Heo
2006-08-24 21:28 ` Fajun Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).