linux-ide.vger.kernel.org archive mirror
* libata interface fatal error
@ 2007-05-24 13:25 Florian Effenberger
  2007-05-24 13:45 ` Tejun Heo
  0 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-05-24 13:25 UTC (permalink / raw)
  To: jgarzik, linux-ide

Hi there,

seems I've always subscribed to SATA problems. :-)

We installed Debian Etch with the pre-compiled kernel, but during heavy
SATA data transfers the drives seem to cause trouble. Even with the
latest kernel, 2.6.21.2, we receive:

===
ata3.00: exception Emask 0x10 SAct 0x1 SErr 0x400100 action 0x2 frozen
ata3.00: (irq_stat 0x08000000, interface fatal error)
ata3.00: cmd 61/80:00:00:91:91/00:00:1d:00:00/40 tag 0 cdb 0x0 data 
65536 out
          res 40/00:04:00:91:91/00:00:1d:00:00/40 Emask 0x10 (ATA bus error)
ata3: soft resetting port
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA
===

MD5 sums of copied files are right and we experience no other problems. 
Is this a driver bug? If so, can I be of any help in debugging it?

lspci gives:

===
00:00.0 Host bridge: Intel Corporation P965/G965 Memory Controller Hub 
(rev 02)
00:01.0 PCI bridge: Intel Corporation P965/G965 PCI Express Root Port 
(rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI 
#4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI 
#5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI 
#2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio 
Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express 
Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express 
Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express 
Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI 
#1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI 
#2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI 
#3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI 
#1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface 
Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HB (ICH8) SATA AHCI 
Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller 
(rev 02)
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 
016a (rev a1)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown 
device 4364 (rev 12)
04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 
AHCI Controller (rev 02)
04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 
AHCI Controller (rev 02)
05:01.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 
100] (rev 0c)
05:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 
IEEE-1394a-2000 Controller (PHY/Link)
===

Thanks
Florian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: libata interface fatal error
  2007-05-24 13:25 Florian Effenberger
@ 2007-05-24 13:45 ` Tejun Heo
  2007-05-24 14:08   ` Florian Effenberger
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-05-24 13:45 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: jgarzik, linux-ide

Hello,

Florian Effenberger wrote:
> We installed Debian Etch with the pre-compiled kernel, but during heavy
> SATA data transfers the drives seem to cause trouble. Even with the
> latest kernel, 2.6.21.2, we receive:
> 
> ===
> ata3.00: exception Emask 0x10 SAct 0x1 SErr 0x400100 action 0x2 frozen
> ata3.00: (irq_stat 0x08000000, interface fatal error)
> ata3.00: cmd 61/80:00:00:91:91/00:00:1d:00:00/40 tag 0 cdb 0x0 data
> 65536 out
>          res 40/00:04:00:91:91/00:00:1d:00:00/40 Emask 0x10 (ATA bus error)
> ata3: soft resetting port
> ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata3.00: configured for UDMA/133
> ata3: EH complete
> SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
> sdc: Write Protect is off
> sdc: Mode Sense: 00 3a 00 00
> SCSI device sdc: write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> ===

Looks like a genuine transmission/interface error to me.  How often does
this occur?  Please try connecting the drive to another port, possibly
using a different power lane.  Also, testing with another drive is a
good way to track down where the problem is.
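The register values in the first log fit that diagnosis. The sketch below decodes them, assuming the SError bit layout from the SATA specification and the AHCI port interrupt status layout in which bit 27 (IFS) is the interface fatal error flag; decode_serror() and ahci_irq_is_if_fatal() are hypothetical helpers, not libata functions. SErr 0x400100 has bit 8 (non-recovered transient data integrity error) and bit 22 (handshake error: the device answered a transmitted frame with R_ERR) set, and irq_stat 0x08000000 is exactly the IFS bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* SError bits per the SATA spec: ERR field in bits 15:0, DIAG field in 31:16. */
struct serr_bit {
	uint32_t mask;
	const char *desc;
};

static const struct serr_bit serr_bits[] = {
	{ 1u << 0,  "recovered data integrity error" },
	{ 1u << 1,  "recovered communication error" },
	{ 1u << 8,  "non-recovered transient data integrity error" },
	{ 1u << 9,  "non-recovered persistent communication/data integrity error" },
	{ 1u << 10, "protocol error" },
	{ 1u << 11, "internal error" },
	{ 1u << 16, "PhyRdy changed" },
	{ 1u << 17, "Phy internal error" },
	{ 1u << 18, "Comm Wake" },
	{ 1u << 19, "10b to 8b decode error" },
	{ 1u << 20, "disparity error" },
	{ 1u << 21, "CRC error" },
	{ 1u << 22, "handshake error (R_ERR received)" },
	{ 1u << 23, "link sequence error" },
	{ 1u << 24, "transport state transition error" },
	{ 1u << 25, "unknown FIS type" },
	{ 1u << 26, "device exchanged" },
};

/* AHCI PxIS.IFS: interface fatal error status (bit 27). */
#define AHCI_PXIS_IFS (1u << 27)

/* Print each known SError bit set in serr; return how many were printed. */
static int decode_serror(uint32_t serr)
{
	int count = 0;

	for (size_t i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
		if (serr & serr_bits[i].mask) {
			printf("SErr bit 0x%08lx: %s\n",
			       (unsigned long)serr_bits[i].mask, serr_bits[i].desc);
			count++;
		}
	}
	return count;
}

/* Nonzero if the AHCI port interrupt status reports an interface fatal error. */
static int ahci_irq_is_if_fatal(uint32_t irq_stat)
{
	return (irq_stat & AHCI_PXIS_IFS) != 0;
}
```

Calling decode_serror(0x400100) prints one line each for bits 0x00000100 and 0x00400000 and returns 2.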

> MD5 sums of copied files are right and we experience no other problems.
> Is this a driver bug? If so, can I be of any help in debugging it?

Yeah, libata EH is working properly so there shouldn't be any problem
other than the error messages and a bit slower transfer speed.

-- 
tejun

* Re: libata interface fatal error
  2007-05-24 13:45 ` Tejun Heo
@ 2007-05-24 14:08   ` Florian Effenberger
  2007-05-24 14:21     ` Tejun Heo
  0 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-05-24 14:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, linux-ide

Hi,

thanks for the fast reply!

> Looks like a genuine transmission/interface error to me.  How often does
> this occur?  Please try connecting the drive to another port, possibly
> using a different power lane.  Also, testing with another drive is a
> good way to track down where the problem is.

It occurs as soon as the drive is used heavily (a load of about 2.x
on the machine when running our test scripts), about 15 times in 2 or 3
hours. I will try changing the port, power supply and drive.

> Yeah, libata EH is working properly so there shouldn't be any problem
> other than the error messages and a bit slower transfer speed.

So, even if the errors are still there, there is nothing real to worry 
about for me?

Now there are new errors involving hard resets. Is this still OK?

===
ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata4.00: cmd 60/80:00:00:09:97/00:00:0a:00:00/40 tag 0 cdb 0x0 data 65536 in
          res 40/00:04:00:67:14/00:00:1c:00:00/40 Emask 0x4 (timeout)
ata4: soft resetting port
ata4: softreset failed (1st FIS failed)
ata4: softreset failed, retrying in 5 secs
ata4: hard resetting port
ata4: port is slow to respond, please be patient (Status 0x80)
ata4: port failed to respond (30 secs, Status 0x80)
ata4: COMRESET failed (device not ready)
ata4: hardreset failed, retrying in 5 secs
ata4: hard resetting port
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
SCSI device sdd: 625142448 512-byte hdwr sectors (320073 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA
===

Florian

* Re: libata interface fatal error
  2007-05-24 14:08   ` Florian Effenberger
@ 2007-05-24 14:21     ` Tejun Heo
  2007-05-24 14:47       ` Florian Effenberger
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-05-24 14:21 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: jgarzik, linux-ide

Florian Effenberger wrote:
>> Looks like a genuine transmission/interface error to me.  How often does
>> this occur?  Please try connecting the drive to another port, possibly
>> using a different power lane.  Also, testing with another drive is a
>> good way to track down where the problem is.
> 
> It occurs as soon as the drive is used heavily (a load of about 2.x
> on the machine when running our test scripts), about 15 times in 2 or 3
> hours. I will try changing the port, power supply and drive.
> 
>> Yeah, libata EH is working properly so there shouldn't be any problem
>> other than the error messages and a bit slower transfer speed.
> 
> So, even if the errors are still there, there is nothing real to worry
> about for me?

Data-integrity-wise there should be no problem, but your error rate is
pretty high and will eventually make libata turn off NCQ and/or lower
the PHY speed.
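Both failing commands in this thread are NCQ commands, which is why turning NCQ off is one of the fallbacks. A sketch for decoding the "cmd" line, assuming libata EH prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device; struct tf_dump and its helpers are hypothetical. Command 0x61 is WRITE FPDMA QUEUED and 0x60 is READ FPDMA QUEUED; for these the sector count travels in the feature registers and nsect bits 7:3 hold the NCQ tag.

```c
#include <stdint.h>
#include <stdio.h>

/* Taskfile fields in the order libata EH prints them after "cmd" and "res". */
struct tf_dump {
	unsigned int command, feature, nsect, lbal, lbam, lbah;
	unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
	unsigned int device;
};

/* Parse e.g. "61/80:00:00:91:91/00:00:1d:00:00/40"; returns 0 on success. */
static int parse_tf(const char *s, struct tf_dump *tf)
{
	int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		       &tf->command, &tf->feature, &tf->nsect,
		       &tf->lbal, &tf->lbam, &tf->lbah,
		       &tf->hob_feature, &tf->hob_nsect,
		       &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
		       &tf->device);
	return n == 12 ? 0 : -1;
}

/* READ FPDMA QUEUED (0x60) and WRITE FPDMA QUEUED (0x61) are the NCQ commands. */
static int tf_is_ncq(const struct tf_dump *tf)
{
	return tf->command == 0x60 || tf->command == 0x61;
}

/* NCQ: sector count is feature (low byte) plus hob_feature (high byte); 0 means 65536. */
static unsigned int tf_ncq_sectors(const struct tf_dump *tf)
{
	unsigned int n = (tf->hob_feature << 8) | tf->feature;
	return n ? n : 65536;
}

/* NCQ: the queue tag is carried in nsect bits 7:3. */
static unsigned int tf_ncq_tag(const struct tf_dump *tf)
{
	return tf->nsect >> 3;
}

/* 48-bit LBA assembled from the high-order (hob_*) and low-order address bytes. */
static uint64_t tf_lba48(const struct tf_dump *tf)
{
	return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
	       ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
	       ((uint64_t)tf->lbam << 8) | (uint64_t)tf->lbal;
}
```

For the write in the first log, parse_tf() on "61/80:00:00:91:91/00:00:1d:00:00/40" yields 128 sectors (128 * 512 = 65536 bytes, matching "data 65536 out"), tag 0 and LBA 0x1d919100.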

> Now there are new errors involving hard resets. Is this still OK?
> 
> ===
> ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
> ata4.00: cmd 60/80:00:00:09:97/00:00:0a:00:00/40 tag 0 cdb 0x0 data
> 65536 in
>          res 40/00:04:00:67:14/00:00:1c:00:00/40 Emask 0x4 (timeout)

Yeap, your data is safe.  With timeouts, data transfer speed can be much
lower tho.  It definitely seems something is wrong with your hardware setup.

-- 
tejun

* Re: libata interface fatal error
  2007-05-24 14:21     ` Tejun Heo
@ 2007-05-24 14:47       ` Florian Effenberger
  2007-05-24 14:53         ` Tejun Heo
  2007-05-24 14:55         ` Greg Freemyer
  0 siblings, 2 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-05-24 14:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, linux-ide

Hi,

> Data-integrity-wise there should be no problem, but your error rate is
> pretty high and will eventually make libata turn off NCQ and/or lower
> the PHY speed.

switching ports is not easy. Both on-board SATA controllers are being 
used, and the error seems to occur on all ports.

> Yeap, your data is safe.  With timeouts, data transfer speed can be much
> lower tho.  It definitely seems something is wrong with your hardware setup.

I will try to use another test disk. Right now we use different models 
of Western Digital "RAID edition".

Any other debug information I could gather?

Florian

* Re: libata interface fatal error
  2007-05-24 14:47       ` Florian Effenberger
@ 2007-05-24 14:53         ` Tejun Heo
  2007-05-24 15:28           ` Florian Effenberger
  2007-05-24 14:55         ` Greg Freemyer
  1 sibling, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-05-24 14:53 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: jgarzik, linux-ide

Florian Effenberger wrote:
> Hi,
> 
>> Data-integrity-wise there should be no problem, but your error rate is
>> pretty high and will eventually make libata turn off NCQ and/or lower
>> the PHY speed.
> 
> switching ports is not easy. Both on-board SATA controllers are being
> used, and the error seems to occur on all ports.

Hmmmm...

>> Yeap, your data is safe.  With timeouts, data transfer speed can be much
>> lower tho.  It definitely seems something is wrong with your hardware
>> setup.
> 
> I will try to use another test disk. Right now we use different models
> of Western Digital "RAID edition".
> 
> Any other debug information I could gather?

If you let the system run, libata will turn off NCQ and/or lower PHY
speed to 1.5Gbps.  Do errors disappear after that happens?
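A simplified model of that fallback, assuming the allowed link speeds are kept as a bitmask in which bit 0 stands for 1.5 Gbps and bit 1 for 3.0 Gbps, and that each speed-down step drops the fastest speed still allowed. libata's real code also takes the currently negotiated speed into account; fls32() and spd_limit_step_down() are hypothetical names.

```c
#include <stdint.h>

/* Bit n-1 set means SATA generation n is allowed. */
#define SPD_MASK_GEN1 0x1u /* 1.5 Gbps */
#define SPD_MASK_GEN2 0x2u /* 3.0 Gbps */

/* 1-based index of the highest set bit (0 if none), like the kernel's fls(). */
static int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * One speed-down step: clear the fastest allowed generation.  A mask of 0 or 1
 * cannot be lowered any further, so 0 is returned to signal "no step possible".
 */
static uint32_t spd_limit_step_down(uint32_t mask)
{
	if (mask <= 1)
		return 0;
	return mask & ~(1u << (fls32(mask) - 1));
}
```

Starting from 0x3 (both speeds allowed), one step leaves 0x1 (1.5 Gbps only) and a further step is refused.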

-- 
tejun

* Re: libata interface fatal error
  2007-05-24 14:47       ` Florian Effenberger
  2007-05-24 14:53         ` Tejun Heo
@ 2007-05-24 14:55         ` Greg Freemyer
  2007-05-24 14:59           ` Tejun Heo
  2007-05-24 15:00           ` Florian Effenberger
  1 sibling, 2 replies; 41+ messages in thread
From: Greg Freemyer @ 2007-05-24 14:55 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: Tejun Heo, jgarzik, linux-ide

On 5/24/07, Florian Effenberger <florian@effenberger.org> wrote:
> Hi,
>
> > Data-integrity-wise there should be no problem, but your error rate is
> > pretty high and will eventually make libata turn off NCQ and/or lower
> > the PHY speed.
>
> switching ports is not easy. Both on-board SATA controllers are being
> used, and the error seems to occur on all ports.
>
> > Yeap, your data is safe.  With timeouts, data transfer speed can be much
> > lower tho.  It definitely seems something is wrong with your hardware setup.
>
> I will try to use another test disk. Right now we use different models
> of Western Digital "RAID edition".
>

iiuc, raid editions are designed to fail fast thus allowing an
alternate drive to provide the data rather than having to wait thru
multiple internal retries.

Could this just be a case of the drive functioning as designed?

Greg
-- 
Greg Freemyer
The Norcross Group
Forensics for the 21st Century

* Re: libata interface fatal error
  2007-05-24 14:55         ` Greg Freemyer
@ 2007-05-24 14:59           ` Tejun Heo
  2007-05-24 15:00           ` Florian Effenberger
  1 sibling, 0 replies; 41+ messages in thread
From: Tejun Heo @ 2007-05-24 14:59 UTC (permalink / raw)
  To: Greg Freemyer; +Cc: Florian Effenberger, jgarzik, linux-ide

Greg Freemyer wrote:
>> I will try to use another test disk. Right now we use different models
>> of Western Digital "RAID edition".
>>
> 
> iiuc, raid editions are designed to fail fast thus allowing an
> alternate drive to provide the data rather than having to wait thru
> multiple internal retries.
> 
> Could this just be a case of the drive functioning as designed?

If that's the case, the drive should be aborting commands with the UNC bit
set, reporting an unrecoverable media error (AC_ERR_DEV | AC_ERR_MEDIA in
libata terms), but the errors are fatal interface errors and timeouts,
both of which are indicative of transmission problems on the ATA link.
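The Emask values in the logs map onto libata's completion error flags. A sketch, assuming the AC_ERR_* values from include/linux/libata.h of this period (please check them against your own tree): 0x10 is AC_ERR_ATA_BUS and 0x4 is AC_ERR_TIMEOUT, whereas a drive-reported media error would show up as AC_ERR_DEV | AC_ERR_MEDIA, i.e. 0x9. The two classifier functions are hypothetical and only encode the reasoning above.

```c
/* Subset of libata's completion error flags (enum ata_completion_errors). */
enum {
	AC_ERR_DEV      = 1 << 0, /* device reported error */
	AC_ERR_HSM      = 1 << 1, /* host state machine violation */
	AC_ERR_TIMEOUT  = 1 << 2, /* timeout */
	AC_ERR_MEDIA    = 1 << 3, /* media error */
	AC_ERR_ATA_BUS  = 1 << 4, /* ATA bus error */
	AC_ERR_HOST_BUS = 1 << 5, /* host bus error */
	AC_ERR_SYSTEM   = 1 << 6, /* system error */
	AC_ERR_INVALID  = 1 << 7, /* invalid argument */
	AC_ERR_OTHER    = 1 << 8, /* unknown */
};

/* Drive-side failure: the device itself aborted the command with a media error. */
static int emask_is_media_error(unsigned int emask)
{
	const unsigned int media = AC_ERR_DEV | AC_ERR_MEDIA;

	return (emask & media) == media;
}

/* Link-side symptoms as discussed in this thread: ATA bus errors and timeouts. */
static int emask_points_at_link(unsigned int emask)
{
	return !emask_is_media_error(emask) &&
	       (emask & (AC_ERR_ATA_BUS | AC_ERR_TIMEOUT)) != 0;
}
```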

-- 
tejun

* Re: libata interface fatal error
  2007-05-24 14:55         ` Greg Freemyer
  2007-05-24 14:59           ` Tejun Heo
@ 2007-05-24 15:00           ` Florian Effenberger
  1 sibling, 0 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-05-24 15:00 UTC (permalink / raw)
  To: Greg Freemyer; +Cc: Tejun Heo, jgarzik, linux-ide

Hi,

> iiuc, raid editions are designed to fail fast thus allowing an
> alternate drive to provide the data rather than having to wait thru
> multiple internal retries.
> 
> Could this just be a case of the drive functioning as designed? 

to be honest, I don't know. :-)

Any jumper settings to change that, or any driver settings? Are you 
aware of something like that?

Florian

* Re: libata interface fatal error
  2007-05-24 14:53         ` Tejun Heo
@ 2007-05-24 15:28           ` Florian Effenberger
  0 siblings, 0 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-05-24 15:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, linux-ide

We just disabled the RAID (Linux software RAID, no hardware RAID) and
tested with one disk only, with the same results.

What string should I grep the logs for to see when NCQ is turned off or
the link speed is lowered?

> If you let the system run, libata will turn off NCQ and/or lower PHY
> speed to 1.5Gbps.  Do errors disappear after that happens?


* Re: libata interface fatal error
@ 2007-05-26  9:43 Florian Effenberger
  2007-05-29  9:16 ` Tejun Heo
  0 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-05-26  9:43 UTC (permalink / raw)
  To: linux-ide; +Cc: htejun, jeff

Hi,

it seems that the speed is never lowered, I always see "SATA link up 3.0 
Gbps (SStatus 123 SControl 300)".
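Both values in that message can be decoded by hand. A sketch, assuming the SATA specification layout shared by SStatus and SControl (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the helper names are hypothetical. SStatus 0x123 means a device is present with Phy communication established (DET 3), negotiated at 3.0 Gbps (SPD 2) and in the active power state (IPM 1). SControl 0x300 has SPD 0, i.e. no speed restriction, and IPM 3 (partial and slumber disabled). If a 1.5 Gbps limit were in effect, the SControl SPD field would be expected to read 1, e.g. 0x310.

```c
#include <stdint.h>

/* Fields common to SStatus and SControl. */
static unsigned int scr_det(uint32_t v) { return v & 0xf; }        /* bits 3:0  */
static unsigned int scr_spd(uint32_t v) { return (v >> 4) & 0xf; } /* bits 7:4  */
static unsigned int scr_ipm(uint32_t v) { return (v >> 8) & 0xf; } /* bits 11:8 */

/* SPD value to link rate; the "no restriction" meaning of 0 applies to SControl. */
static const char *spd_str(unsigned int spd)
{
	switch (spd) {
	case 0:  return "no restriction";
	case 1:  return "1.5 Gbps";
	case 2:  return "3.0 Gbps";
	default: return "other";
	}
}

/* Return scontrol with its SPD field replaced, e.g. spd = 1 to cap at 1.5 Gbps. */
static uint32_t scontrol_set_spd(uint32_t scontrol, unsigned int spd)
{
	return (scontrol & ~0xf0u) | ((spd & 0xfu) << 4);
}
```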

Can I manually lower the speed via a kernel parameter?

Thanks
Florian

* Re: libata interface fatal error
  2007-05-26  9:43 Florian Effenberger
@ 2007-05-29  9:16 ` Tejun Heo
  2007-05-29 14:16   ` Florian Effenberger
  2007-06-06 21:23   ` Florian Effenberger
  0 siblings, 2 replies; 41+ messages in thread
From: Tejun Heo @ 2007-05-29  9:16 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Florian Effenberger wrote:
> Hi,
> 
> it seems that the speed is never lowered, I always see "SATA link up 3.0
> Gbps (SStatus 123 SControl 300)".
> 
> Can I manually lower the speed via a kernel parameter?

Currently, there is no mechanism to do that, but hard drives usually have
a dip switch to force 1.5Gbps.  Please try that.  If your hard drive
doesn't have one, please let me know and I'll prepare a simple patch.

Thanks.

-- 
tejun


* Re: libata interface fatal error
  2007-05-29  9:16 ` Tejun Heo
@ 2007-05-29 14:16   ` Florian Effenberger
  2007-06-06 21:23   ` Florian Effenberger
  1 sibling, 0 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-05-29 14:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi Tejun,

> Currently, there is no mechanism to do that, but hard drives usually have
> a dip switch to force 1.5Gbps.  Please try that.  If your hard drive
> doesn't have one, please let me know and I'll prepare a simple patch.

thanks a lot, I will try that out and tell you the results. :-)

Florian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: libata interface fatal error
  2007-05-29  9:16 ` Tejun Heo
  2007-05-29 14:16   ` Florian Effenberger
@ 2007-06-06 21:23   ` Florian Effenberger
  2007-06-07  9:50     ` Tejun Heo
  1 sibling, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-06 21:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi,

> Currently, there is no mechanism to do that, but hard drives usually have
> a dip switch to force 1.5Gbps.  Please try that.  If your hard drive
> doesn't have one, please let me know and I'll prepare a simple patch.

unfortunately, the disks have a jumper board, but the jumpers are 
missing... could you write a patch for me? Would be much appreciated!

Thanks a lot!
Florian

* Re: libata interface fatal error
  2007-06-06 21:23   ` Florian Effenberger
@ 2007-06-07  9:50     ` Tejun Heo
  2007-06-07 14:08       ` Florian Effenberger
                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Tejun Heo @ 2007-06-07  9:50 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

Florian Effenberger wrote:
> Hi,
> 
>> Currently, there is no mechanism to do that, but hard drives usually have
>> a dip switch to force 1.5Gbps.  Please try that.  If your hard drive
>> doesn't have one, please let me know and I'll prepare a simple patch.
> 
> unfortunately, the disks have a jumper board, but the jumpers are
> missing... could you write a patch for me? Would be much appreciated!

Okay, there was a bug in the link speed limit logic.  That's probably why
the speed-down to 1.5Gbps didn't kick in.  The attached patch contains the
fix and a hack to force 1.5Gbps.  Please give it a shot.
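The hack in the attached patch keys off the HBA's capability register. A sketch of that condition, assuming the AHCI layout in which CAP bits 23:20 (Interface Speed Support) hold 1 for a Gen1-only (1.5 Gbps) controller and 2 for a Gen2 (3.0 Gbps) one: when the controller can do more than Gen1, the port's hardware speed limit mask is forced to 0x1, i.e. 1.5 Gbps only. The function names are hypothetical; in the patch the same test is done inline on hpriv->cap.

```c
#include <stdint.h>

/* AHCI CAP.ISS, Interface Speed Support, bits 23:20. */
static unsigned int ahci_cap_iss(uint32_t cap)
{
	return (cap >> 20) & 0xf;
}

/*
 * Same decision as the patch: unless the HBA is already Gen1-only,
 * restrict the port to 1.5 Gbps by leaving only bit 0 in the limit mask.
 */
static uint32_t forced_hw_spd_limit(uint32_t cap, uint32_t hw_limit)
{
	if (ahci_cap_iss(cap) != 1)
		return 0x1; /* bit 0 = 1.5 Gbps */
	return hw_limit;
}
```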

Thanks.

-- 
tejun

[-- Attachment #2: ahci-force-1_5.patch --]
[-- Type: text/x-patch, Size: 2156 bytes --]

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 7baeaff..f9550f1 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -219,6 +219,7 @@ static int ahci_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
 static unsigned int ahci_qc_issue(struct ata_queued_cmd *qc);
 static void ahci_irq_clear(struct ata_port *ap);
 static int ahci_port_start(struct ata_port *ap);
+static int ahci_vt8251_port_start(struct ata_port *ap);
 static void ahci_port_stop(struct ata_port *ap);
 static void ahci_tf_read(struct ata_port *ap, struct ata_taskfile *tf);
 static void ahci_qc_prep(struct ata_queued_cmd *qc);
@@ -284,7 +285,7 @@ static const struct ata_port_operations ahci_ops = {
 	.port_resume		= ahci_port_resume,
 #endif
 
-	.port_start		= ahci_port_start,
+	.port_start		= ahci_vt8251_port_start,
 	.port_stop		= ahci_port_stop,
 };
 
@@ -318,7 +319,7 @@ static const struct ata_port_operations ahci_vt8251_ops = {
 	.port_resume		= ahci_port_resume,
 #endif
 
-	.port_start		= ahci_port_start,
+	.port_start		= ahci_vt8251_port_start,
 	.port_stop		= ahci_port_stop,
 };
 
@@ -1558,6 +1559,19 @@ static int ahci_port_start(struct ata_port *ap)
 	return 0;
 }
 
+static int ahci_vt8251_port_start(struct ata_port *ap)
+{
+	struct ahci_host_priv *hpriv = ap->host->private_data;
+
+	if (((hpriv->cap >> 20) & 0xf) != 1) {
+		printk("limiting SATA link speed to 1.5Gbps\n");
+		ap->hw_sata_spd_limit = 1;
+		ap->eh_info.action |= ATA_EH_HARDRESET;
+	}
+
+	return ahci_port_start(ap);
+}
+
 static void ahci_port_stop(struct ata_port *ap)
 {
 	const char *emsg = NULL;
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4733f00..57940ba 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6313,7 +6313,8 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
 		/* init sata_spd_limit to the current value */
 		if (sata_scr_read(ap, SCR_CONTROL, &scontrol) == 0) {
 			int spd = (scontrol >> 4) & 0xf;
-			ap->hw_sata_spd_limit &= (1 << spd) - 1;
+			if (spd)
+				ap->hw_sata_spd_limit &= (1 << spd) - 1;
 		}
 		ap->sata_spd_limit = ap->hw_sata_spd_limit;
 

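The libata-core hunk is the actual bug fix. SControl's SPD field is 0 when no speed restriction is configured (as in the "SControl 300" seen earlier in this thread), and the old code then computed (1 << 0) - 1 = 0 and wiped the speed limit mask. A small model of the before/after behaviour, assuming the mask starts out with all bits set and that a speed-down step is refused once the mask is 1 or less; the function names are hypothetical.

```c
#include <stdint.h>

/* Old initialisation: narrow the limit mask to what SControl.SPD allows. */
static uint32_t init_spd_limit_old(uint32_t hw_limit, uint32_t scontrol)
{
	unsigned int spd = (scontrol >> 4) & 0xf;

	hw_limit &= (1u << spd) - 1; /* spd == 0 yields a mask of 0 */
	return hw_limit;
}

/* Fixed initialisation: SPD == 0 means "no restriction", so leave the mask alone. */
static uint32_t init_spd_limit_new(uint32_t hw_limit, uint32_t scontrol)
{
	unsigned int spd = (scontrol >> 4) & 0xf;

	if (spd)
		hw_limit &= (1u << spd) - 1;
	return hw_limit;
}

/* A speed-down step needs at least two allowed speeds to choose from. */
static int can_step_down(uint32_t limit)
{
	return limit > 1;
}
```

With SControl 0x300 the old code ends up with a limit of 0, so no speed-down step is ever possible, which is consistent with the link always coming up at 3.0 Gbps in the earlier report.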
* Re: libata interface fatal error
  2007-06-07  9:50     ` Tejun Heo
@ 2007-06-07 14:08       ` Florian Effenberger
  2007-06-13 10:37       ` Florian Effenberger
  2007-06-16 10:23       ` Florian Effenberger
  2 siblings, 0 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-06-07 14:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi Tejun,

> Okay, there was a bug in the link speed limit logic.  That's probably why
> the speed-down to 1.5Gbps didn't kick in.  The attached patch contains the
> fix and a hack to force 1.5Gbps.  Please give it a shot.

thanks a lot for your help, much appreciated! Will test it and let you 
know if it works.

Florian

* Re: libata interface fatal error
  2007-06-07  9:50     ` Tejun Heo
  2007-06-07 14:08       ` Florian Effenberger
@ 2007-06-13 10:37       ` Florian Effenberger
  2007-06-14  9:43         ` Tejun Heo
  2007-06-16 10:23       ` Florian Effenberger
  2 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-13 10:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi Tejun,

> Okay, there was a bug in the link speed limit logic.  That's probably why
> the speed-down to 1.5Gbps didn't kick in.  The attached patch contains the
> fix and a hack to force 1.5Gbps.  Please give it a shot.

Thanks a lot for your patch; it seems to work, at least better than
without it. :-)

When rsyncing about 12 GB, no trouble occurred. When doing heavy stress
tests, I receive errors again, but okay, maybe that's due to a hardware bug.

Will your patch go into the vanilla kernel?

Florian

* Re: libata interface fatal error
  2007-06-13 10:37       ` Florian Effenberger
@ 2007-06-14  9:43         ` Tejun Heo
  2007-06-14 11:12           ` Florian Effenberger
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-06-14  9:43 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Florian Effenberger wrote:
> Hi Tejun,
> 
>> Okay, there was a bug in the link speed limit logic.  That's probably why
>> the speed-down to 1.5Gbps didn't kick in.  The attached patch contains the
>> fix and a hack to force 1.5Gbps.  Please give it a shot.
> 
> Thanks a lot for your patch; it seems to work, at least better than
> without it. :-)
> 
> When rsyncing about 12 GB, no trouble occurred. When doing heavy stress
> tests, I receive errors again, but okay, maybe that's due to a hardware
> bug.
> 
> Will your patch go into the vanilla kernel?

I'm currently not sure what the root cause is:

1. if the controller is at fault, we need to force 1.5Gbps on the
controller.

2. if the drive model is broken, we need to blacklist the drives.

3. if your specific configuration is broken (faulty hw, PSU, bad karma),
the upstream speed limit fix patch should be enough.

Can you post the result of 'hdparm -I /dev/sdX'?

-- 
tejun

* Re: libata interface fatal error
  2007-06-14  9:43         ` Tejun Heo
@ 2007-06-14 11:12           ` Florian Effenberger
  2007-06-14 12:25             ` Tejun Heo
  0 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-14 11:12 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

[-- Attachment #1: Type: text/plain, Size: 222 bytes --]

Hi,

> Can you post the result of 'hdparm -I /dev/sdX'?

thanks a lot for your kind support, that is much appreciated!

Attached is some machine output; I hope that helps. Let me know if you need
more information.

Florian


[-- Attachment #2: debug.txt --]
[-- Type: text/plain, Size: 18541 bytes --]

00:00.0 Host bridge: Intel Corporation P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation P965/G965 PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HB (ICH8) SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 016a (rev a1)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)
04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
05:01.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 0c)
05:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)



/dev/md0:
        Version : 00.90.03
  Creation Time : Tue May  1 15:56:11 2007
     Raid Level : raid5
     Array Size : 937713408 (894.27 GiB 960.22 GB)
    Device Size : 312571136 (298.09 GiB 320.07 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jun 14 12:40:44 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 6e3c156d:c91eb028:40daae21:698c531b
         Events : 0.62

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       3       8       64        3      active sync   /dev/sde


/dev/sda:

ATA device, with non-removable media
        Model Number:       WDC WD1600YS-01SHB1
        Serial Number:      WD-WCAP01819659
        Firmware Revision:  20.06C06
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  321670847
        device size with M = 1024*1024:      157065 MBytes
        device size with M = 1000*1000:      164695 MBytes (164 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 0
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Long Sector Access (AC1)
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12]
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
        52min for SECURITY ERASE UNIT.
Checksum: correct


/dev/sdb:

ATA device, with non-removable media
        Model Number:       WDC WD3200YS-01PGB0
        Serial Number:      WD-WCAPD3405080
        Firmware Revision:  21.00M21
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  625142448
        device size with M = 1024*1024:      305245 MBytes
        device size with M = 1000*1000:      320072 MBytes (320 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 1
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 0
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Long Sector Access (AC1)
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12]
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct


/dev/sdc:

ATA device, with non-removable media
        Model Number:       WDC WD3200YS-01PGB0
        Serial Number:      WD-WCAPD4087913
        Firmware Revision:  21.00M21
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  625142448
        device size with M = 1024*1024:      305245 MBytes
        device size with M = 1000*1000:      320072 MBytes (320 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 1
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 0
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Long Sector Access (AC1)
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12]
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct


/dev/sdd:

ATA device, with non-removable media
        Model Number:       WDC WD3200YS-01PGB0
        Serial Number:      WD-WCAPD4124047
        Firmware Revision:  21.00M21
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  625142448
        device size with M = 1024*1024:      305245 MBytes
        device size with M = 1000*1000:      320072 MBytes (320 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 1
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 0
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Long Sector Access (AC1)
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12]
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct


/dev/sde:

ATA device, with non-removable media
        Model Number:       WDC WD3200YS-01PGB0
        Serial Number:      WD-WCAPD3406202
        Firmware Revision:  21.00M21
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  625142448
        device size with M = 1024*1024:      305245 MBytes
        device size with M = 1000*1000:      320072 MBytes (320 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 1
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 0
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Long Sector Access (AC1)
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12]
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct
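The capacity lines in the hdparm -I output above can be checked by hand from the sector counts. This is a minimal Python sketch that assumes 512-byte logical sectors, as reported for these drives. The function and constant names are illustrative and are not part of hdparm.

```python
# Worked check of the capacity figures hdparm -I printed for the WD3200YS disks.
SECTOR_SIZE = 512  # bytes per logical sector (assumed, as reported for these drives)

def capacity_figures(lba48_sectors: int) -> dict:
    """Derive the capacity figures shown by hdparm (and sd) from a sector count."""
    total_bytes = lba48_sectors * SECTOR_SIZE
    return {
        "bytes": total_bytes,
        # hdparm's "M = 1024*1024" and "M = 1000*1000" lines appear to truncate
        "mib": total_bytes // (1024 * 1024),
        "mb": total_bytes // (1000 * 1000),
        # rounding to the nearest MB appears to match the "(NNN MB)" that sd prints
        "mb_rounded": round(total_bytes / 1_000_000),
    }

# "CHS current addressable sectors" = cylinders x heads x sectors/track
CHS_SECTORS = 16383 * 16 * 63

# "LBA user addressable sectors" saturates at the 28-bit LBA limit on large drives
LBA28_MAX = 2**28 - 1

if __name__ == "__main__":
    print(capacity_figures(625142448))  # WD3200YS, LBA48 user addressable sectors
    print(CHS_SECTORS, LBA28_MAX)
```

Rounding to the nearest MB appears to reproduce the "(500108 MB)" and "(200050 MB)" figures that the sd driver prints in Tomi's logs further down, while hdparm's own MBytes figures look truncated. That rounding behaviour is inferred from the numbers, not taken from the sd or hdparm source.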

* Re: libata interface fatal error
  2007-06-14 11:12           ` Florian Effenberger
@ 2007-06-14 12:25             ` Tejun Heo
  2007-06-14 15:12               ` Florian Effenberger
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-06-14 12:25 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Florian Effenberger wrote:
> Hi,
> 
>> Can you post the result of 'hdparm -I /dev/sdX'?
> 
> thanks a lot for your kind support, that is much appreciated!
> 
> Attached is some machine output, hope that helps. Let me know if you need
> more information.

Okay, ich8.  I don't think the chipset is at fault here and you have a
lot of disks.  My primary suspect is power supply problem but things
like this are hard to prove.  With the merged speed down fix, libata
will do the right thing after a few errors, so ignoring the problem
wouldn't be too bad an idea.  If you're curious, you can try to connect
drives to different SATA ports and power lanes and see whether errors
follow the disk, port or power lane.

-- 
tejun
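Tejun's suggestion of moving drives between ports and power lanes comes down to counting where the errors land before and after each swap. This is a minimal Python sketch that counts libata "exception" reports per ataN port in a saved kernel log. The regular expression assumes the message format seen in the logs quoted in this thread, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches libata EH reports such as "ata3.00: exception Emask 0x10 SAct 0x1 ..."
# (format assumed from the logs in this thread; leading dmesg timestamps are fine).
EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask")

def count_exceptions(log_text: str) -> Counter:
    """Count libata exception reports per ATA port (ata1, ata2, ...)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Since ataN names the port rather than the disk, note each drive's serial number (as in the hdparm -I output above) before swapping cables. After a heavy-write test, run `count_exceptions(open("kern.log").read())` (the file name is only an example), move a drive or power lead, and repeat. If the errors move with the drive, suspect the drive. If they stay with the port or power lead, suspect that instead.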

* Re: libata interface fatal error
  2007-06-14 12:25             ` Tejun Heo
@ 2007-06-14 15:12               ` Florian Effenberger
  2007-06-18  3:10                 ` Tejun Heo
  0 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-14 15:12 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi Tejun,

> Okay, ich8.  I don't think the chipset is at fault here and you have a
> lot of disks.  My primary suspect is power supply problem but things
> like this are hard to prove.  With the merged speed down fix, libata
> will do the right thing after a few errors, so ignoring the problem
> wouldn't be too bad an idea.  If you're curious, you can try to connect
> drives to different SATA ports and power lanes and see whether errors
> follow the disk, port or power lane.

Exactly, there should be four disks in the machine.

What power supply would you recommend for this type of disks? I think we 
got a 450W Enermax, IIRC.

What do you mean by "merged speed down fix"? Is your fix for the speed 
down logic implemented in the current kernel, so I don't have to patch 
anymore (except when I want to force 1.5Gbps right from the beginning)?

All SATA ports are used, so reconnecting is not an option. But I can try 
to switch the power supply (lanes).

Thanks for all your kind help, that is much appreciated!
Florian

* Re: libata interface fatal error
  2007-06-07  9:50     ` Tejun Heo
  2007-06-07 14:08       ` Florian Effenberger
  2007-06-13 10:37       ` Florian Effenberger
@ 2007-06-16 10:23       ` Florian Effenberger
  2007-06-18  3:13         ` Tejun Heo
  2 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-16 10:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi there,

we tested out two 600W Fortron PSUs, also tried a BIOS update. Didn't 
work out.

We also tried the jumper on the disks labelled SSP (Spread Spectrum
Clocking); that didn't work out either.

What seemed to help at least a little bit is using the 12V connector on
the board that is normally dedicated to graphics cards.

The best test to reproduce the problem, according to a colleague also 
working on the machine, is a cat /dev/zero > zero.bin

Do you still think it is a PSU or hardware problem? Do you need more 
details/logs?

Thanks!
Florian

* Re: libata interface fatal error
  2007-06-14 15:12               ` Florian Effenberger
@ 2007-06-18  3:10                 ` Tejun Heo
  2007-06-18  6:08                   ` Tomi Orava
  2007-06-18 10:38                   ` Florian Effenberger
  0 siblings, 2 replies; 41+ messages in thread
From: Tejun Heo @ 2007-06-18  3:10 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Hello,

Florian Effenberger wrote:
> What power supply would you recommend for this type of disks? I think we
> got a 450W Enermax, IIRC.

Most power supplies should be able to handle 4 disks without any problem
unless they're broken.

> What do you mean by "merged speed down fix"? Is your fix for the speed
> down logic implemented in the current kernel, so I don't have to patch
> anymore (except when I want to force 1.5Gbps right from the beginning)?

Yeap, kernel will automatically downgrade to 1.5Gbps after several failures.

-- 
tejun
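Whether the automatic downgrade has kicked in can be read from the "SATA link up ... (SStatus NNN SControl NNN)" lines, which libata prints in hex without a 0x prefix. This is a minimal Python sketch that assumes the standard DET/SPD/IPM field layout shared by the SATA SStatus and SControl registers (AHCI PxSSTS/PxSCTL). The helper names are hypothetical.

```python
# Field layout of the SATA SStatus/SControl registers (as in AHCI PxSSTS/PxSCTL):
#   bits 3:0  DET - SStatus: device detection state; SControl: detection/init request
#   bits 7:4  SPD - SStatus: negotiated speed;       SControl: highest allowed speed
#   bits 11:8 IPM - SStatus: power state;            SControl: disallowed power states
GEN_SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps"}

def fields(reg: int) -> tuple[int, int, int]:
    """Split a register value into its (DET, SPD, IPM) fields."""
    return reg & 0xF, (reg >> 4) & 0xF, (reg >> 8) & 0xF

def describe(sstatus: int, scontrol: int) -> str:
    det, spd, _ipm = fields(sstatus)
    spd_limit = fields(scontrol)[1]
    link = GEN_SPEED.get(spd, "not established")
    # SControl SPD = 0 means "no speed restriction"
    limit = "none" if spd_limit == 0 else GEN_SPEED.get(spd_limit, f"SPD {spd_limit}")
    return f"DET={det} link={link} limit={limit}"

if __name__ == "__main__":
    # libata prints both registers in hex without a 0x prefix
    for sstatus, scontrol in [("123", "300"), ("113", "310"), ("113", "300")]:
        print(sstatus, scontrol, "->", describe(int(sstatus, 16), int(scontrol, 16)))
```

Read this way, "SStatus 113 SControl 310" in Tomi's logs below is a 1.5 Gbps link with a 1.5 Gbps cap written into SControl, which is what the speed-down produces. "SStatus 113 SControl 300" on his ata1 is a 1.5 Gbps link with no cap, which is presumably just what the older 1.5 Gbps drive negotiates on its own.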

* Re: libata interface fatal error
  2007-06-16 10:23       ` Florian Effenberger
@ 2007-06-18  3:13         ` Tejun Heo
  2007-06-18 10:44           ` Florian Effenberger
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-06-18  3:13 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Hello,

Florian Effenberger wrote:
> we tested out two 600W Fortron PSUs, also tried a BIOS update. Didn't
> work out.

I see.

> We also tried the jumper on the disks labelled SSP (Spread Spectrum
> Clocking); that didn't work out either.
> 
> What seemed to help at least a little bit is using the 12V connector on
> the board that is normally dedicated to graphics cards.

Hmmmmm....

> The best test to reproduce the problem, according to a colleague also
> working on the machine, is a cat /dev/zero > zero.bin
> 
> Do you still think it is a PSU or hardware problem? Do you need more
> details/logs?

The controller being ich8, I'm pretty sure it isn't a driver problem.
Do the errors occur on all four drives?  Also, if things work after
speed is downgraded to 1.5Gbps, it doesn't really matter.  There's no
noticeable performance difference for a single disk anyway.

-- 
tejun

* Re: libata interface fatal error
  2007-06-18  3:10                 ` Tejun Heo
@ 2007-06-18  6:08                   ` Tomi Orava
  2007-06-18  6:28                     ` Tejun Heo
  2007-06-18 10:38                   ` Florian Effenberger
  1 sibling, 1 reply; 41+ messages in thread
From: Tomi Orava @ 2007-06-18  6:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Florian Effenberger, linux-ide, jeff


Hi Tejun,

For a long time I've been trying to find a solution for libata error
messages quite similar to those shown in this thread. Perhaps you might
have some ideas about what the actual originator might be:

With the latest 2.6.22-rc4-git4 kernel I still get the following error
messages with high I/O load:

sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata3.00: (port_status 0x20080000)
ata3.00: cmd c8/00:08:af:91:49/00:00:00:00:00/e5 tag 0 cdb 0x0 data 4096 in
         res 50/00:00:b6:91:49/00:00:11:00:00/e5 Emask 0x2 (HSM violation)
ata3: soft resetting port
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata3.00: configured for UDMA/133
ata3: EH complete

... and later in the chain ...

sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata3.00: (port_status 0x20080000)
ata3.00: cmd c8/00:08:67:74:65/00:00:00:00:00/ec tag 0 cdb 0x0 data 4096 in
         res 50/00:00:6e:74:65/00:00:1b:00:00/ec Emask 0x2 (HSM violation)
ata3: soft resetting port
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata3.00: configured for UDMA/100
ata3: EH complete

--- This goes on until UDMA/33 has been reached
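The "cmd" and "res" lines in these HSM-violation reports are raw taskfile dumps. This is a minimal Python sketch for decoding them. It assumes the libata field order CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB fields/DEVICE, with STATUS/ERROR in the first two positions on "res" lines. That ordering is an assumption, and the helper names are hypothetical.

```python
# Field order ASSUMED from the libata EH report format of this era:
#   CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE
# On "res" lines the first two fields hold STATUS/ERROR instead of CMD/FEAT.
COMMAND_NAMES = {
    0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT",
    0xC8: "READ DMA", 0xCA: "WRITE DMA",
    0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED",
}

def parse_taskfile(tf: str) -> dict:
    """Parse a taskfile string such as 'c8/00:08:af:91:49/00:00:00:00:00/e5'."""
    cmd, low, hob, dev = tf.split("/")
    feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feat, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in hob.split(":"))
    return {
        "command": int(cmd, 16), "feature": feat, "nsect": nsect,
        "lba_low24": (lbah << 16) | (lbam << 8) | lbal,
        "lba_high24": (hob_lbah << 16) | (hob_lbam << 8) | hob_lbal,
        "hob_feature": hob_feat, "hob_nsect": hob_nsect,
        "device": int(dev, 16),
    }

def lba28(tf: dict) -> int:
    """28-bit LBA (e.g. READ DMA): device register bits 3:0 give LBA bits 27:24."""
    return ((tf["device"] & 0x0F) << 24) | tf["lba_low24"]

if __name__ == "__main__":
    cmd = parse_taskfile("c8/00:08:af:91:49/00:00:00:00:00/e5")
    res = parse_taskfile("50/00:00:b6:91:49/00:00:11:00:00/e5")
    print(COMMAND_NAMES[cmd["command"]], "LBA", hex(lba28(cmd)), "bytes", cmd["nsect"] * 512)
    print("res status", hex(res["command"]), "LBA", hex(lba28(res)))
```

For the first report this gives a READ DMA of 8 sectors (4096 bytes, matching "data 4096 in") starting at LBA 0x54991af. The "res" LBA is 7 sectors further on, and the second report shows the same 7-sector offset. The returned status 0x50 has DRDY set and the ERR bit clear, so the device's own status register shows no error bit. The "HSM violation" classification comes from libata.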

The problematic hardware combination is:

00:00.0 Host bridge: VIA Technologies, Inc. KT880 Host Bridge (rev 80)
00:00.1 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:09.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0e.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
01:00.0 VGA compatible controller: nVidia Corporation NV36.2 [GeForce FX 5700] (rev a1)

and the problems occur only with the Seagate 7200.10 SATA disks, never with
the older 7200.7 SATA disks, all connected to the Promise SATA 300 TX4 controller.

This problem has been around for as long as I've had the Promise
SATA 300 TX4 controller, and it now causes an additional problem: since
kernel version 2.6.21-rc3-git10, which fixed the libata error
handling/interface speed downgrade, these new Seagate disks get
downgraded from UDMA/133 to UDMA/33 overnight. Can the speed downgrade
be disabled somehow as a quick and dirty fix in this case? For some
reason the libata error messages mentioned above don't do any noticeable
harm, but it would be very nice to be able to prevent the interface
speed downgrade for now.

>> What do you mean by "merged speed down fix"? Is your fix for the speed
>> down logic implemented in the current kernel, so I don't have to patch
>> anymore (except when I want to force 1.5Gbps right from the beginning)?
>
> Yeap, kernel will automatically downgrade to 1.5Gbps after several
> failures.

Yes, this feature seems to work quite nicely as the included logs show.

Regards,
Tomi Orava

PS. These problems are not specific to this single machine: a friend at
    work has the same Promise SATA 300 TX4 card with exactly the same
    Seagate 7200.10 SATA disks in an Intel-based P4 machine, with similar
    problems under I/O load.

---------------------------------------------------------
scsi0 : sata_promise
scsi1 : sata_promise
scsi2 : sata_promise
scsi3 : sata_promise
ata1: SATA max UDMA/133 cmd 0xf880a380 ctl 0xf880a3b8 bmdma 0x00000000 irq 0
ata2: SATA max UDMA/133 cmd 0xf880a280 ctl 0xf880a2b8 bmdma 0x00000000 irq 0
ata3: SATA max UDMA/133 cmd 0xf880a200 ctl 0xf880a238 bmdma 0x00000000 irq 0
ata4: SATA max UDMA/133 cmd 0xf880a300 ctl 0xf880a338 bmdma 0x00000000 irq 0
Switched to high resolution mode on CPU 0
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
ata1.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
ata1.00: 390721968 sectors, multi 0: LBA48
ata1.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
ata2.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
ata2.00: 390721968 sectors, multi 0: LBA48
ata2.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
ata2.00: configured for UDMA/133
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata3.00: ATA-7: ST3500630AS, 3.AAK, max UDMA/133
ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata3.00: configured for UDMA/133
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata4.00: ATA-7: ST3500630AS, 3.AAK, max UDMA/133
ata4.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata4.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      ST3200822AS      3.01 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 1:0:0:0: Direct-Access     ATA      ST3200822AS      3.01 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 390721968 512-byte hardware sectors (200050 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 390721968 512-byte hardware sectors (200050 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
sd 1:0:0:0: Attached scsi generic sg1 type 0
scsi 2:0:0:0: Direct-Access     ATA      ST3500630AS      3.AA PQ: 0 ANSI: 5
sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1 sdc2
sd 2:0:0:0: [sdc] Attached SCSI disk
sd 2:0:0:0: Attached scsi generic sg2 type 0
scsi 3:0:0:0: Direct-Access     ATA      ST3500630AS      3.AA PQ: 0 ANSI: 5
sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdd: sdd1 sdd2
sd 3:0:0:0: [sdd] Attached SCSI disk
sd 3:0:0:0: Attached scsi generic sg3 type 0

-- 



* Re: libata interface fatal error
  2007-06-18  6:08                   ` Tomi Orava
@ 2007-06-18  6:28                     ` Tejun Heo
  0 siblings, 0 replies; 41+ messages in thread
From: Tejun Heo @ 2007-06-18  6:28 UTC (permalink / raw)
  To: Tomi Orava; +Cc: Florian Effenberger, linux-ide, jeff, Mikael Pettersson

Hello,

Yeah, it seems Promise has some problem with 3G links.  Cc'ing Mikael
Pettersson and quoting whole body for him.  Mikael, does this look familiar?

Tomi Orava wrote:
> Hi Tejun,
> 
> For a long time I've been trying to find a solution for libata error
> messages quite similar to those shown in this thread. Perhaps you might
> have some ideas about what the actual originator might be:
> 
> With the latest 2.6.22-rc4-git4 kernel I still get the following error
> messages with high I/O load:
> 
> sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
> sd 2:0:0:0: [sdc] Write Protect is off
> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> ata3.00: (port_status 0x20080000)
> ata3.00: cmd c8/00:08:af:91:49/00:00:00:00:00/e5 tag 0 cdb 0x0 data 4096 in
>          res 50/00:00:b6:91:49/00:00:11:00:00/e5 Emask 0x2 (HSM violation)
> ata3: soft resetting port
> ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata3.00: configured for UDMA/133
> ata3: EH complete
> 
> ... and later in the chain ...
> 
> sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
> sd 2:0:0:0: [sdc] Write Protect is off
> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> ata3.00: (port_status 0x20080000)
> ata3.00: cmd c8/00:08:67:74:65/00:00:00:00:00/ec tag 0 cdb 0x0 data 4096 in
>          res 50/00:00:6e:74:65/00:00:1b:00:00/ec Emask 0x2 (HSM violation)
> ata3: soft resetting port
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata3.00: configured for UDMA/100
> ata3: EH complete
> 
> --- This goes on until UDMA/33 has been reached
> 
> The problematic hardware combination is:
> 
> 00:00.0 Host bridge: VIA Technologies, Inc. KT880 Host Bridge (rev 80)
> 00:00.1 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
> 00:00.2 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
> 00:00.3 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
> 00:00.4 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
> 00:00.7 Host bridge: VIA Technologies, Inc. KT880 Host Bridge
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
> 00:09.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
> 00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
> 00:0e.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02)
> 00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
> 00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
> 00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
> 00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
> 00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
> 00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
> 00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
> 00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
> 00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
> 01:00.0 VGA compatible controller: nVidia Corporation NV36.2 [GeForce FX 5700] (rev a1)
> 
> and the problems occur only with the Seagate 7200.10 SATA disks, never with
> the older 7200.7 SATA disks, all connected to the Promise SATA 300 TX4 controller.
> 
> This problem has been around for as long as I've had the Promise
> SATA 300 TX4 controller, and it now causes an additional problem: since
> kernel version 2.6.21-rc3-git10, which fixed the libata error
> handling/interface speed downgrade, these new Seagate disks get
> downgraded from UDMA/133 to UDMA/33 overnight. Can the speed downgrade
> be disabled somehow as a quick and dirty fix in this case? For some
> reason the libata error messages mentioned above don't do any noticeable
> harm, but it would be very nice to be able to prevent the interface
> speed downgrade for now.
> 
>>> What do you mean by "merged speed down fix"? Is your fix for the speed
>>> down logic implemented in the current kernel, so I don't have to patch
>>> anymore (except when I want to force 1.5Gbps right from the beginning)?
>> Yeap, kernel will automatically downgrade to 1.5Gbps after several
>> failures.
> 
> Yes, this feature seems to work quite nicely as the included logs show.
> 
> Regards,
> Tomi Orava
> 
> PS. These problems are not specific to this single machine: a friend at
> work has the same Promise Sata300TX4 card with exactly the same Seagate
> 7200.10 SATA disks on an Intel-based P4 machine, with similar problems
> under I/O load.
> 
> ---------------------------------------------------------
> scsi0 : sata_promise
> scsi1 : sata_promise
> scsi2 : sata_promise
> scsi3 : sata_promise
> ata1: SATA max UDMA/133 cmd 0xf880a380 ctl 0xf880a3b8 bmdma 0x00000000 irq 0
> ata2: SATA max UDMA/133 cmd 0xf880a280 ctl 0xf880a2b8 bmdma 0x00000000 irq 0
> ata3: SATA max UDMA/133 cmd 0xf880a200 ctl 0xf880a238 bmdma 0x00000000 irq 0
> ata4: SATA max UDMA/133 cmd 0xf880a300 ctl 0xf880a338 bmdma 0x00000000 irq 0
> Switched to high resolution mode on CPU 0
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
> ata1.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
> ata1.00: 390721968 sectors, multi 0: LBA48
> ata1.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
> ata1.00: configured for UDMA/133
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
> ata2.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
> ata2.00: 390721968 sectors, multi 0: LBA48
> ata2.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
> ata2.00: configured for UDMA/133
> ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata3.00: ATA-7: ST3500630AS, 3.AAK, max UDMA/133
> ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata3.00: configured for UDMA/133
> ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata4.00: ATA-7: ST3500630AS, 3.AAK, max UDMA/133
> ata4.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
> ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> ata4.00: configured for UDMA/133
> scsi 0:0:0:0: Direct-Access     ATA      ST3200822AS      3.01 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>  sda: sda1 sda2
> sd 0:0:0:0: [sda] Attached SCSI disk
> sd 0:0:0:0: Attached scsi generic sg0 type 0
> scsi 1:0:0:0: Direct-Access     ATA      ST3200822AS      3.01 PQ: 0 ANSI: 5
> sd 1:0:0:0: [sdb] 390721968 512-byte hardware sectors (200050 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 1:0:0:0: [sdb] 390721968 512-byte hardware sectors (200050 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>  sdb: sdb1 sdb2
> sd 1:0:0:0: [sdb] Attached SCSI disk
> sd 1:0:0:0: Attached scsi generic sg1 type 0
> scsi 2:0:0:0: Direct-Access     ATA      ST3500630AS      3.AA PQ: 0 ANSI: 5
> sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
> sd 2:0:0:0: [sdc] Write Protect is off
> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
> sd 2:0:0:0: [sdc] Write Protect is off
> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>  sdc: sdc1 sdc2
> sd 2:0:0:0: [sdc] Attached SCSI disk
> sd 2:0:0:0: Attached scsi generic sg2 type 0
> scsi 3:0:0:0: Direct-Access     ATA      ST3500630AS      3.AA PQ: 0 ANSI: 5
> sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
> sd 3:0:0:0: [sdd] Write Protect is off
> sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
> sd 3:0:0:0: [sdd] Write Protect is off
> sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>  sdd: sdd1 sdd2
> sd 3:0:0:0: [sdd] Attached SCSI disk
> sd 3:0:0:0: Attached scsi generic sg3 type 0
> 
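The size in parentheses on the sd lines above appears to be the sector count times the 512-byte sector size, expressed in decimal megabytes. A small Python sketch of that arithmetic; capacity() is an illustrative helper, not anything from the kernel:

```python
# Sector counts copied from the sd log lines above.
SECTOR_SIZE = 512  # "512-byte hardware sectors"


def capacity(sectors: int, sector_size: int = SECTOR_SIZE) -> tuple[int, int]:
    """Return (decimal MB rounded to nearest, whole binary MiB)."""
    total_bytes = sectors * sector_size
    mb = round(total_bytes / 1_000_000)
    mib = total_bytes // (1024 * 1024)
    return mb, mib


for model, sectors in [("ST3200822AS", 390721968), ("ST3500630AS", 976773168)]:
    mb, mib = capacity(sectors)
    print(f"{model}: {sectors} sectors = {mb} MB = {mib} MiB")
```

For both drive models the decimal figure matches the value the kernel prints (200050 MB and 500108 MB).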


-- 
tejun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: libata interface fatal error
@ 2007-06-18  7:05 Mikael Pettersson
  2007-06-18  7:13 ` Tejun Heo
                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Mikael Pettersson @ 2007-06-18  7:05 UTC (permalink / raw)
  To: Tomi.Orava, htejun; +Cc: florian, jeff, linux-ide, mikpe

On Mon, 18 Jun 2007 15:28:44 +0900, Tejun Heo wrote:
> Yeah, it seems promise has some problem with 3G link.  Cc'ing Mikael
> Pettersson and quoting whole body for him.  Mikael, does this look familiar?
> 
> Tomi Orava wrote:
> > Hi Tejun,
> > 
> > I've been trying for a long time to find a solution for libata error
> > messages quite similar to those shown in this thread. Perhaps you might
> > have some ideas about what the actual cause might be:
> > 
> > With the latest 2.6.22-rc4-git4 kernel I still get the following error
> > messages under high I/O load:
> > 
> > sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
> > sd 2:0:0:0: [sdc] Write Protect is off
> > sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> > sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> > ata3.00: (port_status 0x20080000)
> > ata3.00: cmd c8/00:08:af:91:49/00:00:00:00:00/e5 tag 0 cdb 0x0 data 4096 in
> >          res 50/00:00:b6:91:49/00:00:11:00:00/e5 Emask 0x2 (HSM violation)
> > ata3: soft resetting port
> > ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> > ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> > ata3.00: configured for UDMA/133
> > ata3: EH complete
> > 
> > ... and later in the chain ...
> > 
> > sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
> > sd 2:0:0:0: [sdc] Write Protect is off
> > sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> > sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> > ata3.00: (port_status 0x20080000)
> > ata3.00: cmd c8/00:08:67:74:65/00:00:00:00:00/ec tag 0 cdb 0x0 data 4096 in
> >          res 50/00:00:6e:74:65/00:00:1b:00:00/ec Emask 0x2 (HSM violation)
> > ata3: soft resetting port
> > ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> > ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
> > ata3.00: configured for UDMA/100
> > ata3: EH complete
> > 
> > --- This goes on until UDMA/33 has been reached
...
> > and the problems relate only to Seagate 7200.10 SATA disks, never to the
> > older 7200.7 SATA disks, all connected to the Promise Sata 300TX4 controller.
...
> > PS. These problems are not specific to this single machine: a friend at
> > work has the same Promise Sata300TX4 card with exactly the same Seagate
> > 7200.10 SATA disks on an Intel-based P4 machine, with similar problems
> > under I/O load.

Yes, this is familiar. Several people have reported problems with
Seagate's 7200.10 disks in 3Gbps operation on sata_promise.
Unfortunately the error reports don't really give a clue as to what
the root cause is.
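The Emask values in these reports can at least be split into libata's error classes. A minimal Python sketch, assuming the AC_ERR_* bit values from include/linux/libata.h of the 2.6.2x kernels (check them against the source you actually run); decode_emask() is a hypothetical helper:

```python
# AC_ERR_* bits as assumed from include/linux/libata.h (2.6.2x).
AC_ERR_BITS = {
    1 << 0: "AC_ERR_DEV",       # device reported an error
    1 << 1: "AC_ERR_HSM",       # host state machine violation
    1 << 2: "AC_ERR_TIMEOUT",
    1 << 3: "AC_ERR_MEDIA",
    1 << 4: "AC_ERR_ATA_BUS",   # error on the ATA/SATA bus
    1 << 5: "AC_ERR_HOST_BUS",
    1 << 6: "AC_ERR_SYSTEM",
    1 << 7: "AC_ERR_INVALID",
    1 << 8: "AC_ERR_OTHER",
}


def decode_emask(emask: int) -> list[str]:
    """Return the names of all known AC_ERR_* bits set in emask."""
    return [name for bit, name in AC_ERR_BITS.items() if emask & bit]


print(decode_emask(0x2))   # Emask from the sata_promise reports
print(decode_emask(0x10))  # Emask from the first report in this thread
```

Both 0x2 (HSM violation) and 0x10 (ATA bus error) are single-bit masks, so each report names exactly one error class; that narrows down where the failure was detected but not what caused it.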

I used to be able to forcibly trigger similar errors with their
7200.9 disks, but I can't seem to do that any more.

/Mikael


* Re: libata interface fatal error
  2007-06-18  7:05 libata interface fatal error Mikael Pettersson
@ 2007-06-18  7:13 ` Tejun Heo
  2007-06-18 10:47   ` Florian Effenberger
  2007-06-18 17:14 ` Ansgar Knappheide
  2007-06-18 18:54 ` Tomi Orava
  2 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-06-18  7:13 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Tomi.Orava, florian, jeff, linux-ide

Mikael Pettersson wrote:
> Yes, this is familiar. Several people have reported problems with
> Seagate's 7200.10 disks in 3Gbps operation on sata_promise.
> Unfortunately the error reports don't really give a clue as to what
> the root cause is.
> 
> I used to be able to forcibly trigger similar errors with their
> 7200.9 disks, but I can't seem to do that any more.

Maybe we need to limit link speed to 1.5Gbps for these drives on
sata_promise?
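In the logs quoted in this thread the link comes up with SStatus 123 / SControl 300 at 3.0 Gbps and, after a downgrade, with SStatus 113 / SControl 310 at 1.5 Gbps (libata prints both registers in hex without a 0x prefix). A minimal Python sketch decoding those values, assuming the usual SATA register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the helper names are hypothetical:

```python
# Assumed SATA register layout: DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.
CURRENT_SPEED = {0: "not established", 1: "1.5 Gbps", 2: "3.0 Gbps"}  # SStatus.SPD
SPEED_LIMIT = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps"}              # SControl.SPD


def sata_fields(reg: int) -> dict[str, int]:
    """Split an SStatus/SControl value into its DET, SPD and IPM fields."""
    return {"DET": reg & 0xF, "SPD": (reg >> 4) & 0xF, "IPM": (reg >> 8) & 0xF}


def describe(sstatus: int, scontrol: int) -> str:
    """Summarize the negotiated speed and the configured speed limit."""
    cur = CURRENT_SPEED.get(sata_fields(sstatus)["SPD"], "unknown")
    lim = SPEED_LIMIT.get(sata_fields(scontrol)["SPD"], "unknown")
    return f"current {cur}, limit {lim}"


print(describe(0x123, 0x300))  # first reset: 3.0 Gbps, no limit set
print(describe(0x113, 0x310))  # after the automatic downgrade
```

Read this way, SControl 310 is the 1.5 Gbps limit showing up in the register after several failures.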

-- 
tejun


* Re: libata interface fatal error
  2007-06-18  3:10                 ` Tejun Heo
  2007-06-18  6:08                   ` Tomi Orava
@ 2007-06-18 10:38                   ` Florian Effenberger
  2007-06-18 10:44                     ` Tejun Heo
  1 sibling, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-18 10:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi,

> Yeap, kernel will automatically downgrade to 1.5Gbps after several failures.

is there also a boot-time option to force 1.5Gbps right from booting up?

Florian


* Re: libata interface fatal error
  2007-06-18 10:38                   ` Florian Effenberger
@ 2007-06-18 10:44                     ` Tejun Heo
  0 siblings, 0 replies; 41+ messages in thread
From: Tejun Heo @ 2007-06-18 10:44 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Florian Effenberger wrote:
> Hi,
> 
>> Yeap, kernel will automatically downgrade to 1.5Gbps after several
>> failures.
> 
> is there also a boot-time option to force 1.5Gbps right from booting up?

Nope.

-- 
tejun


* Re: libata interface fatal error
  2007-06-18  3:13         ` Tejun Heo
@ 2007-06-18 10:44           ` Florian Effenberger
  2007-06-18 10:56             ` Tejun Heo
  0 siblings, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-18 10:44 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi,

> The controller being ich8, I'm pretty sure it isn't a driver problem.

I think so, too. The Intel chipsets have proven to be very good in the past.

> Do the errors occur on all four drives?  Also, if things work after
> speed is downgraded to 1.5Gbps, it doesn't really matter.  There's no
> noticeable performance difference for single disk anyway.

Yes, they do occur on all drives, as far as I know. With 1.5Gbps, the
error occurs much less often, and not under normal circumstances, only
during a really hard stress test.

Would it make sense to downgrade to 1.5 Gbps via a boot option?

Florian


* Re: libata interface fatal error
  2007-06-18  7:13 ` Tejun Heo
@ 2007-06-18 10:47   ` Florian Effenberger
  0 siblings, 0 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-06-18 10:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Mikael Pettersson, Tomi.Orava, jeff, linux-ide

Hi,

> Maybe we need to limit link speed to 1.5Gbps for these drives on
> sata_promise?

in our case, it's a

   Vendor: ATA       Model: WDC WD1600YS-01S  Rev: 20.0
   Type:   Direct-Access                      ANSI SCSI revision: 05
   Vendor: ATA       Model: WDC WD3200YS-01P  Rev: 21.0
   Type:   Direct-Access                      ANSI SCSI revision: 05
   Vendor: ATA       Model: WDC WD3200YS-01P  Rev: 21.0
   Type:   Direct-Access                      ANSI SCSI revision: 05
   Vendor: ATA       Model: WDC WD3200YS-01P  Rev: 21.0
   Type:   Direct-Access                      ANSI SCSI revision: 05
   Vendor: ATA       Model: WDC WD3200YS-01P  Rev: 21.0
   Type:   Direct-Access                      ANSI SCSI revision: 05
   Vendor: ATA       Model: WDC WD3200YS-01P  Rev: 21.0
   Type:   Direct-Access                      ANSI SCSI revision: 05

on a

00:00.0 Host bridge: Intel Corporation P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation P965/G965 PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HB (ICH8) SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 016a (rev a1)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)
04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
05:01.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 0c)
05:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

Maybe blacklisting makes sense here, too?

Florian


* Re: libata interface fatal error
  2007-06-18 10:44           ` Florian Effenberger
@ 2007-06-18 10:56             ` Tejun Heo
  2007-06-18 11:28               ` Florian Effenberger
  2007-06-24 11:32               ` Florian Effenberger
  0 siblings, 2 replies; 41+ messages in thread
From: Tejun Heo @ 2007-06-18 10:56 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Florian Effenberger wrote:
> Hi,
> 
>> The controller being ich8, I'm pretty sure it isn't a driver problem.
> 
> I think so, too. The Intel chipsets have shown to be very good in the past.
> 
>> Do the errors occur on all four drives?  Also, if things work after
>> speed is downgraded to 1.5Gbps, it doesn't really matter.  There's no
>> noticeable performance difference for single disk anyway.
> 
> Yes, they do occur on all drives, as far as I know. With 1.5Gbps, the
> error occurs much less often, and not under normal circumstances, only
> during a really hard stress test.

Hmmm... Can you use a separate PSU to power two of the four drives and
see what happens?  Just power up a PSU as directed in the following
webpage and connect two of the hard drives to the PSU.

  http://modtown.co.uk/mt/article2.php?id=psumod

> Would it make sense to downgrade to 1.5 Gbps via a boot option?

I don't know.  So far all the problem cases have been isolated to a
specific controller / drive combination (sata_promise and newer Seagate
drives) or to hardware configuration problems (most of them PSU issues),
so I don't think we need such an option yet.  If you have problematic
hardware which pukes at 3.0Gbps, libata should do the right thing after
complaining a bit, which IMHO isn't too bad.
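The "complain a bit, then step down" behavior can be pictured with a toy model. This is NOT libata's code: the class, its mask layout and the three-error threshold are assumptions made up purely for illustration. Each error counts toward a threshold; once it is reached, the fastest allowed link speed is dropped, but 1.5 Gbps is never removed:

```python
ERRORS_BEFORE_DOWNGRADE = 3  # hypothetical threshold, not libata's real policy
SPEED_NAMES = {1: "1.5 Gbps", 2: "3.0 Gbps"}  # keyed by bit_length() of the mask


class LinkSpeedModel:
    """Toy model: bit 0 of `allowed` permits 1.5 Gbps, bit 1 permits 3.0 Gbps."""

    def __init__(self) -> None:
        self.allowed = 0b11
        self.errors = 0

    def record_error(self) -> None:
        self.errors += 1
        # Once the threshold is hit, drop the fastest allowed speed,
        # but never remove the last remaining (1.5 Gbps) one.
        if self.errors >= ERRORS_BEFORE_DOWNGRADE and self.allowed > 0b1:
            self.allowed &= ~(1 << (self.allowed.bit_length() - 1))
            self.errors = 0

    def max_speed(self) -> str:
        return SPEED_NAMES[self.allowed.bit_length()]


link = LinkSpeedModel()
for _ in range(3):
    link.record_error()
print(link.max_speed())  # 1.5 Gbps
```

Keeping the allowed speeds as a bitmask makes "drop the fastest" a single bit-clear, and the floor check keeps the link usable no matter how many errors follow.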

-- 
tejun


* Re: libata interface fatal error
  2007-06-18 10:56             ` Tejun Heo
@ 2007-06-18 11:28               ` Florian Effenberger
  2007-06-18 11:30                 ` Tejun Heo
  2007-06-24 11:32               ` Florian Effenberger
  1 sibling, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-18 11:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi Tejun,

> Hmmm... Can you use a separate PSU to power two of the four drives and
> see what happens?  Just power up a PSU as directed in the following
> webpage and connect two of the hard drives to the PSU.
> 
>   http://modtown.co.uk/mt/article2.php?id=psumod

thanks for that link, we will try that and keep you updated on what happens!

> I don't know.  So far all the problem cases have been isolated to a
> specific controller / drive combination (sata_promise and newer Seagate
> drives) or to hardware configuration problems (most of them PSU issues),
> so I don't think we need such an option yet.  If you have problematic
> hardware which pukes at 3.0Gbps, libata should do the right thing after
> complaining a bit, which IMHO isn't too bad.

So data loss or data corruption can't occur, even while we wait for the
speed to be limited?

Florian


* Re: libata interface fatal error
  2007-06-18 11:28               ` Florian Effenberger
@ 2007-06-18 11:30                 ` Tejun Heo
  2007-06-18 11:32                   ` Florian Effenberger
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-06-18 11:30 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Florian Effenberger wrote:
>> I don't know.  So far all the problem cases have been isolated to a
>> specific controller / drive combination (sata_promise and newer Seagate
>> drives) or to hardware configuration problems (most of them PSU issues),
>> so I don't think we need such an option yet.  If you have problematic
>> hardware which pukes at 3.0Gbps, libata should do the right thing after
>> complaining a bit, which IMHO isn't too bad.
> 
> So data loss or data corruption can't occur, even while we wait for the
> speed to be limited?

Nope, there's nothing to worry about.

-- 
tejun


* Re: libata interface fatal error
  2007-06-18 11:30                 ` Tejun Heo
@ 2007-06-18 11:32                   ` Florian Effenberger
  0 siblings, 0 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-06-18 11:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi,

> Nope, there's nothing to worry about.

okay, thanks a lot so far, it is good to know that developers are there 
to help. ;-)

I will let you know how it turns out with the second PSU.

Florian


* Re: libata interface fatal error
  2007-06-18  7:05 libata interface fatal error Mikael Pettersson
  2007-06-18  7:13 ` Tejun Heo
@ 2007-06-18 17:14 ` Ansgar Knappheide
  2007-06-18 18:54 ` Tomi Orava
  2 siblings, 0 replies; 41+ messages in thread
From: Ansgar Knappheide @ 2007-06-18 17:14 UTC (permalink / raw)
  To: linux-ide

Mikael Pettersson wrote:
> On Mon, 18 Jun 2007 15:28:44 +0900, Tejun Heo wrote:
>   
>> Yeah, it seems promise has some problem with 3G link.  Cc'ing Mikael
>> Pettersson and quoting whole body for him.  Mikael, does this look familiar?
>>
>> Tomi Orava wrote:
>>     
>>> Hi Tejun,
>>>
>>> I've been trying for a long time to find a solution for libata error
>>> messages quite similar to those shown in this thread. Perhaps you might
>>> have some ideas about what the actual cause might be:
>>>
>>> With the latest 2.6.22-rc4-git4 kernel I still get the following error
>>> messages under high I/O load:
>>>
>>> sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
>>> sd 2:0:0:0: [sdc] Write Protect is off
>>> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
>>> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
>>> ata3.00: (port_status 0x20080000)
>>> ata3.00: cmd c8/00:08:af:91:49/00:00:00:00:00/e5 tag 0 cdb 0x0 data 4096 in
>>>          res 50/00:00:b6:91:49/00:00:11:00:00/e5 Emask 0x2 (HSM violation)
>>> ata3: soft resetting port
>>> ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
>>> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
>>> ata3.00: configured for UDMA/133
>>> ata3: EH complete
>>>
>>> ... and later in the chain ...
>>>
>>> sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
>>> sd 2:0:0:0: [sdc] Write Protect is off
>>> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
>>> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
>>> ata3.00: (port_status 0x20080000)
>>> ata3.00: cmd c8/00:08:67:74:65/00:00:00:00:00/ec tag 0 cdb 0x0 data 4096 in
>>>          res 50/00:00:6e:74:65/00:00:1b:00:00/ec Emask 0x2 (HSM violation)
>>> ata3: soft resetting port
>>> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
>>> ata3.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
>>> ata3.00: configured for UDMA/100
>>> ata3: EH complete
>>>
>>> --- This goes on until UDMA/33 has been reached
>>>       
> ...
>   
>>> and the problems relate only to Seagate 7200.10 SATA disks, never to the
>>> older 7200.7 SATA disks, all connected to the Promise Sata 300TX4 controller.
>>>       
> ...
>   
>>> PS. These problems are not specific to this single machine: a friend at
>>> work has the same Promise Sata300TX4 card with exactly the same Seagate
>>> 7200.10 SATA disks on an Intel-based P4 machine, with similar problems
>>> under I/O load.
>>>       
>
> Yes, this is familiar. Several people have reported problems with
> Seagate's 7200.10 disks in 3Gbps operation on sata_promise.
> Unfortunately the error reports don't really give a clue as to what
> the root cause is.
>
> I used to be able to forcibly trigger similar errors with their
> 7200.9 disks, but I can't seem to do that any more.
>
>   
Hello,

I'm jumping into this thread because I'm seeing the same problem on my
system with a Promise SATAII 150 TX4 (PDC40518) and a Maxtor 6L200M0
hard drive (firmware BANC1E00), with the following error:

Jun 18 01:16:03 buffy kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Jun 18 01:16:03 buffy kernel: ata1.00: (port_status 0x20080000)
Jun 18 01:16:03 buffy kernel: ata1.00: cmd c8/00:15:e1:e3:16/00:00:00:00:00/e6 tag 0 cdb 0x0 data 10752 in
Jun 18 01:16:03 buffy kernel:          res 50/00:00:f5:e3:16/00:00:00:00:00/e6 Emask 0x2 (HSM violation)
Jun 18 01:16:03 buffy kernel: ata1: soft resetting port
Jun 18 01:16:03 buffy kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 18 01:16:03 buffy kernel: ata1.00: ata_hpa_resize 1: sectors = 398297088, hpa_sectors = 398297088
Jun 18 01:16:03 buffy kernel: ata1.00: ata_hpa_resize 1: sectors = 398297088, hpa_sectors = 398297088
Jun 18 01:16:03 buffy kernel: ata1.00: configured for UDMA/133
Jun 18 01:16:03 buffy kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
Jun 18 01:16:03 buffy kernel: sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
Jun 18 01:16:03 buffy kernel: Descriptor sense data with sense descriptors (in hex):
Jun 18 01:16:03 buffy kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Jun 18 01:16:03 buffy kernel:         06 16 e3 f5
Jun 18 01:16:03 buffy kernel: sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jun 18 01:16:03 buffy kernel: end_request: I/O error, dev sda, sector 102163425
Jun 18 01:16:03 buffy kernel: ata1: EH complete
Jun 18 01:16:03 buffy kernel: sd 0:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
Jun 18 01:16:03 buffy kernel: sd 0:0:0:0: [sda] Write Protect is off
Jun 18 01:16:03 buffy kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jun 18 01:16:03 buffy kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
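The cmd/res lines can be decoded by hand. Assuming libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device (with the status and error registers in the first two slots of the res line), this Python sketch recovers the 28-bit LBA; tf_fields() and lba28() are hypothetical helpers:

```python
def tf_fields(tf: str) -> dict[str, int]:
    """Split a libata taskfile dump into named byte fields."""
    first, low, _hob, device = tf.split("/")
    # On a res line the "feature" slot holds the error register.
    names = ("feature", "nsect", "lbal", "lbam", "lbah")
    fields = dict(zip(names, (int(b, 16) for b in low.split(":"))))
    fields["first"] = int(first, 16)   # command (cmd line) or status (res line)
    fields["device"] = int(device, 16)
    return fields


def lba28(f: dict[str, int]) -> int:
    """28-bit LBA: the low nibble of the device register supplies bits 27:24."""
    return ((f["device"] & 0x0F) << 24) | (f["lbah"] << 16) | (f["lbam"] << 8) | f["lbal"]


cmd = tf_fields("c8/00:15:e1:e3:16/00:00:00:00:00/e6")
res = tf_fields("50/00:00:f5:e3:16/00:00:00:00:00/e6")
print(hex(cmd["first"]), lba28(cmd), cmd["nsect"] * 512)  # 0xc8 102163425 10752
print(hex(res["first"]), lba28(res))                      # 0x50 102163445
```

Command 0xc8 is READ DMA, 0x15 = 21 sectors gives the 10752 bytes in the log, and the start LBA 102163425 matches the end_request line. The res LBA is 20 sectors further on and matches the trailing bytes 06 16 e3 f5 of the sense descriptor.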

In normal use this error shows up only about once a week, but when
transferring a lot of data (> 100MB) to a USB stick, the error shows up
every few seconds, differing only in the data values. When transferring
data from the USB stick to the hard drive, no error shows.

Other information on my system:

smartctl -d sat -a /dev/sda
smartctl version 5.38 [i686-suse-linux] Copyright (C) 2002-7 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
Device Model:     Maxtor 6L200M0
Serial Number:    L40A4PDH
Firmware Version: BANC1E00
User Capacity:    203.928.109.056 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Mon Jun 18 19:11:50 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1562) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  81) minutes.
SCT capabilities:              (0x0021) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   206   204   063    Pre-fail  Always       -       10179
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       1502
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   246   240   187    Pre-fail  Always       -       37304
  9 Power_On_Minutes        0x0032   239   239   000    Old_age   Always       -       539h+13m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   250   250   000    Old_age   Always       -       1570
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   031   253   000    Old_age   Always       -       33
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       9263
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   239   239   000    Old_age   Offline      -       179
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0

Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1163         -
# 2  Short offline       Completed without error       00%      1163         -
# 3  Offline             Aborted by host               70%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       Maxtor 6L200M0
        Serial Number:      L40A4PDH
        Firmware Revision:  BANC1E00
Standards:
        Used: ATA/ATAPI-7 T13 1532D revision 0
        Supported: 7 6 5 4
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  398297088
        device size with M = 1024*1024:      194481 MBytes
        device size with M = 1000*1000:      203928 MBytes (203 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 0
        Advanced power management level: unknown setting (0x0000)
        Recommended acoustic management value: 192, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_VERIFY command
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Advanced Power Management feature set
                SET_MAX security extension
           *    Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    SATA-I signaling speed (1.5Gb/s)
           *    Native Command Queueing (NCQ)
                Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Data Tables (AC5)
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct

lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
00:06.0 Ethernet controller: D-Link System Inc RTL8139 Ethernet (rev 10)
00:07.0 Mass storage controller: Promise Technology, Inc. PDC20518/PDC40518 (SATAII 150 TX4) (rev 02)
00:0b.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 08)
00:0b.1 Input device controller: Creative Labs SB Live! Game Port (rev 08)
00:0c.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
01:00.0 VGA compatible controller: nVidia Corporation NV25 [GeForce4 Ti 4200] (rev a3)


Perhaps this will help to resolve the problem.

Ansgar


* Re: libata interface fatal error
  2007-06-18  7:05 libata interface fatal error Mikael Pettersson
  2007-06-18  7:13 ` Tejun Heo
  2007-06-18 17:14 ` Ansgar Knappheide
@ 2007-06-18 18:54 ` Tomi Orava
  2 siblings, 0 replies; 41+ messages in thread
From: Tomi Orava @ 2007-06-18 18:54 UTC (permalink / raw)
  Cc: htejun, florian, jeff, linux-ide, mikpe


> On Mon, 18 Jun 2007 15:28:44 +0900, Tejun Heo wrote:
>> Yeah, it seems Promise has some problem with 3G link.  Cc'ing Mikael
>> Pettersson and quoting whole body for him.  Mikael, does this look
>> familiar?
>>
>> Tomi Orava wrote:
>> > Hi Tejun,
>> >
>> > I've been trying to find a solution for a long time for quite similar
>> > libata error messages as shown in this thread. Perhaps you might have
>> > some ideas what the actual originator might be:
>> >
>> > With the latest 2.6.22-rc4-git4 kernel I still get the following error
>> > messages with high I/O load:

<snip>

>> > and the problems relate only to Seagate 7200.10 SATA-disks, never to the
>> > older 7200.7 SATA-disks, all connected to a Promise Sata300TX4-controller.
> ...
>> > PS. These problems are not specific to this single machine, as a friend
>> >       at work has the same Promise Sata300TX4 card with exactly the same
>> >       Seagate 7200.10 SATA-disks on an Intel-based P4 machine, with
>> >       similar problems under I/O load.
>
> Yes, this is familiar. Several people have reported problems with
> Seagate's 7200.10 disks in 3Gbps operation on sata_promise.
> Unfortunately the error reports don't really give a clue as to what
> the root cause is.
>
> I used to be able to forcibly trigger similar errors with their
> 7200.9 disks, but I can't seem to do that any more.

Hmm, are you really sure that this is related to 3Gbps mode?
I'm wondering about that because the problem occurs whether or not the
1.5Gbps jumper is set on the 7200.10 disks. I also retested your older
sata_promise 1.5Gbps speed-limit patch, and it did not fix the problem.
This is really strange!
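Whether a 1.5Gbps jumper or a speed-limit patch actually took effect can be checked from the SStatus value libata prints when the link comes up, e.g. "SATA link up 3.0 Gbps (SStatus 123 SControl 300)" in the original report. A minimal sketch, assuming the standard SATA SStatus layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that the value is printed in hex; `decode_sstatus` is a hypothetical helper, not part of any kernel or smartmontools utility.

```python
# Hypothetical helper (not a kernel tool): decode the hex SStatus value that
# libata prints in lines like "SATA link up 3.0 Gbps (SStatus 123 SControl 300)".
# Assumes the SATA SStatus layout: DET = bits 0-3, SPD = bits 4-7, IPM = bits 8-11.

DET = {
    0x0: "no device detected",
    0x1: "device present, no PHY communication",
    0x3: "device present, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no speed negotiated", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps", 0x3: "6.0 Gbps"}
IPM = {0x0: "no device / no communication", 0x1: "active", 0x2: "partial", 0x6: "slumber"}


def decode_sstatus(text: str) -> dict:
    """Split a hex SStatus string into its DET, SPD and IPM fields."""
    value = int(text, 16)
    det = value & 0xF          # device detection / PHY state
    spd = (value >> 4) & 0xF   # negotiated interface speed
    ipm = (value >> 8) & 0xF   # interface power management state
    return {
        "det": DET.get(det, f"reserved ({det:#x})"),
        "spd": SPD.get(spd, f"reserved ({spd:#x})"),
        "ipm": IPM.get(ipm, f"reserved ({ipm:#x})"),
    }


if __name__ == "__main__":
    print(decode_sstatus("123"))  # value from the original report
    print(decode_sstatus("113"))  # same link state, but negotiated at 1.5 Gbps
```

If the jumper is honoured, the SPD field should decode to 1.5 Gbps; if it still reads 3.0 Gbps, the limit never reached the link.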

I've now connected the two problematic 7200.10 disks to a Via VT6420
controller and the problem has been fixed for me (for now). It would be great
to figure out what the actual problem is here, though...

Regards,
Tomi Orava

-- 
Tomi.Orava@ncircle.nullnet.fi


* Re: libata interface fatal error
  2007-06-18 10:56             ` Tejun Heo
  2007-06-18 11:28               ` Florian Effenberger
@ 2007-06-24 11:32               ` Florian Effenberger
  2007-06-25  2:49                 ` Tejun Heo
  1 sibling, 1 reply; 41+ messages in thread
From: Florian Effenberger @ 2007-06-24 11:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi there,

Sorry, it seems it was all a false alarm and our mainboard was 
defective. In the end, it only turned on sometimes. To test it, we 
wanted to install Windows, which didn't work either.

Now the dealer changed the motherboard, and we are just fine with 3.0 
Gbps and Kernel 2.6.21.5.

Sorry for the big confusion, and thanks for your great help! I didn't know the 
board was defective in the first place; there had been no indications 
like that...

Florian

* Re: libata interface fatal error
  2007-06-24 11:32               ` Florian Effenberger
@ 2007-06-25  2:49                 ` Tejun Heo
  2007-06-25  8:47                   ` Florian Effenberger
  0 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2007-06-25  2:49 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: linux-ide, jeff

Florian Effenberger wrote:
> Sorry, it seems it was all a false alarm and our mainboard was
> defective. In the end, it only turned on sometimes. To test it, we
> wanted to install Windows, which didn't work either.
> 
> Now the dealer changed the motherboard, and we are just fine with 3.0
> Gbps and Kernel 2.6.21.5.
> 
> Sorry for the big confusion, and thanks for your great help! I didn't know the
> board was defective in the first place; there had been no indications
> like that...

Yeah, things like these are tricky.  SATA is usually the first to
suffer from hardware defects, including power fluctuations caused by
input power, PSU or on-board voltage regulator problems, because the
link is relatively long and runs at very high speed.  I also heard that
SATA cables should have been made more resistant to interference, but
I'm no expert in that area.

It's interesting to see how it got solved.  Thanks for another data
point to blame hardware when I don't have a clue.  :-)

-- 
tejun

* Re: libata interface fatal error
  2007-06-25  2:49                 ` Tejun Heo
@ 2007-06-25  8:47                   ` Florian Effenberger
  0 siblings, 0 replies; 41+ messages in thread
From: Florian Effenberger @ 2007-06-25  8:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, jeff

Hi Tejun,

> Yeah, things like these are tricky.  SATA is usually the first to
> suffer from hardware defects, including power fluctuations caused by
> input power, PSU or on-board voltage regulator problems, because the
> link is relatively long and runs at very high speed.  I also heard that
> SATA cables should have been made more resistant to interference, but
> I'm no expert in that area.

Me neither. I first thought of a driver issue, because the machine ran 
fine at first and only started showing mysterious symptoms some weeks later...

> It's interesting to see how it got solved.  Thanks for another data
> point to blame hardware when I don't have a clue.  :-)

Hehe, you're welcome. ;-)

Thanks for all your efforts, I really appreciate them!

Florian


end of thread

Thread overview: 41+ messages
2007-06-18  7:05 libata interface fatal error Mikael Pettersson
2007-06-18  7:13 ` Tejun Heo
2007-06-18 10:47   ` Florian Effenberger
2007-06-18 17:14 ` Ansgar Knappheide
2007-06-18 18:54 ` Tomi Orava
  -- strict thread matches above, loose matches on Subject: below --
2007-05-26  9:43 Florian Effenberger
2007-05-29  9:16 ` Tejun Heo
2007-05-29 14:16   ` Florian Effenberger
2007-06-06 21:23   ` Florian Effenberger
2007-06-07  9:50     ` Tejun Heo
2007-06-07 14:08       ` Florian Effenberger
2007-06-13 10:37       ` Florian Effenberger
2007-06-14  9:43         ` Tejun Heo
2007-06-14 11:12           ` Florian Effenberger
2007-06-14 12:25             ` Tejun Heo
2007-06-14 15:12               ` Florian Effenberger
2007-06-18  3:10                 ` Tejun Heo
2007-06-18  6:08                   ` Tomi Orava
2007-06-18  6:28                     ` Tejun Heo
2007-06-18 10:38                   ` Florian Effenberger
2007-06-18 10:44                     ` Tejun Heo
2007-06-16 10:23       ` Florian Effenberger
2007-06-18  3:13         ` Tejun Heo
2007-06-18 10:44           ` Florian Effenberger
2007-06-18 10:56             ` Tejun Heo
2007-06-18 11:28               ` Florian Effenberger
2007-06-18 11:30                 ` Tejun Heo
2007-06-18 11:32                   ` Florian Effenberger
2007-06-24 11:32               ` Florian Effenberger
2007-06-25  2:49                 ` Tejun Heo
2007-06-25  8:47                   ` Florian Effenberger
2007-05-24 13:25 Florian Effenberger
2007-05-24 13:45 ` Tejun Heo
2007-05-24 14:08   ` Florian Effenberger
2007-05-24 14:21     ` Tejun Heo
2007-05-24 14:47       ` Florian Effenberger
2007-05-24 14:53         ` Tejun Heo
2007-05-24 15:28           ` Florian Effenberger
2007-05-24 14:55         ` Greg Freemyer
2007-05-24 14:59           ` Tejun Heo
2007-05-24 15:00           ` Florian Effenberger
