linux-ide.vger.kernel.org archive mirror
* SATA timeouts on two disks
@ 2008-01-13  0:56 Jim MacBaine
  2008-01-13 12:07 ` Mikael Pettersson
  0 siblings, 1 reply; 9+ messages in thread
From: Jim MacBaine @ 2008-01-13  0:56 UTC (permalink / raw)
  To: linux-ide

Hi,

I have recently started seeing strange SATA errors on my desktop system.
The system was recently equipped with three 250 GB SATA drives from
three different manufacturers, and I'm having an identical problem on
two of them.  The drives are connected to two on-board controllers on
an Asus A8V board, both of which ran under Linux for more than
two years with older SATA disks without problems.  A hardware failure
seems unlikely to me, as the same error occurs on two brand-new disks
from two different manufacturers.  I'm running a vanilla 2.6.23.12
kernel.

The error on sdc happened about 10 times tonight; each time I could hear
the disk spin down and up again while the system was frozen for
several seconds:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
         res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

In the log I also found several identical errors on one other drive:

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
         res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5: soft resetting port
ata5.00: configured for UDMA/33
ata5: EH complete
sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Can this be the result of a hardware failure?  I've seen several
drives being added to an NCQ blacklist over the last few weeks.  Is it
possible that my drives need to be added there, too?  Or do I just
have two failing drives?

Thanks a lot for any clues,
Jim


System boot log extract:

sata_promise 0000:00:08.0: version 2.10
ACPI: PCI Interrupt 0000:00:08.0[A] -> GSI 18 (level, low) -> IRQ 18
scsi0 : sata_promise
scsi1 : sata_promise
scsi2 : sata_promise
ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x00000000 irq 18
ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x00000000 irq 18
ata3: PATA max UDMA/133 cmd 0xf882e300 ctl 0xf882e338 bmdma 0x00000000 irq 18
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: SAMSUNG HD252KJ, CM100-12, max UDMA7
ata1.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: WDC WD2500JS-55NCB1, 10.02E01, max UDMA/133
ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD252KJ  CM10 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 1:0:0:0: Direct-Access     ATA      WDC WD2500JS-55N 10.0 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb2 sdb3
sd 1:0:0:0: [sdb] Attached SCSI disk
sata_via 0000:00:0f.0: version 2.3
ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 17
sata_via 0000:00:0f.0: routed to hard irq line 10
scsi3 : sata_via
scsi4 : sata_via
ata4: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001c802 bmdma 0x0001b800 irq 17
ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x0001b808 irq 17
ata4: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: ATA-7: MAXTOR STM3250820AS, 3.AAE, max UDMA/133
ata5.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata5.00: configured for UDMA/133
scsi 4:0:0:0: Direct-Access     ATA      MAXTOR STM325082 3.AA PQ: 0 ANSI: 5
sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1 sdc2 sdc3
sd 4:0:0:0: [sdc] Attached SCSI disk


# lspci -nn
00:00.0 Host bridge [0600]: VIA Technologies, Inc. K8T800Pro Host Bridge [1106:0282]
00:00.1 Host bridge [0600]: VIA Technologies, Inc. K8T800Pro Host Bridge [1106:1282]
00:00.2 Host bridge [0600]: VIA Technologies, Inc. K8T800Pro Host Bridge [1106:2282]
00:00.3 Host bridge [0600]: VIA Technologies, Inc. K8T800Pro Host Bridge [1106:3282]
00:00.4 Host bridge [0600]: VIA Technologies, Inc. K8T800Pro Host Bridge [1106:4282]
00:00.7 Host bridge [0600]: VIA Technologies, Inc. K8T800Pro Host Bridge [1106:7282]
00:01.0 PCI bridge [0604]: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South] [1106:b188]
00:08.0 RAID bus controller [0104]: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) [105a:3373] (rev 02)
00:09.0 Multimedia video controller [0400]: Brooktree Corporation Bt848 Video Capture [109e:0350] (rev 12)
00:0a.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller [11ab:4320] (rev 13)
00:0f.0 RAID bus controller [0104]: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller [1106:3149] (rev 80)
00:0f.1 IDE interface [0101]: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE [1106:0571] (rev 06)
00:10.0 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.1 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.2 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.3 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.4 USB Controller [0c03]: VIA Technologies, Inc. USB 2.0 [1106:3104] (rev 86)
00:11.0 ISA bridge [0601]: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South] [1106:3227]
00:11.5 Multimedia audio controller [0401]: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller [1106:3059] (rev 60)
00:11.6 Communication controller [0780]: VIA Technologies, Inc. AC'97 Modem Controller [1106:3068] (rev 80)
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:00.0 VGA compatible controller [0300]: Matrox Graphics, Inc. MGA G400/G450 [102b:0525] (rev 82)


* Re: SATA timeouts on two disks
  2008-01-13  0:56 SATA timeouts on two disks Jim MacBaine
@ 2008-01-13 12:07 ` Mikael Pettersson
  2008-01-19 16:22   ` Jim MacBaine
  0 siblings, 1 reply; 9+ messages in thread
From: Mikael Pettersson @ 2008-01-13 12:07 UTC (permalink / raw)
  To: Jim MacBaine; +Cc: linux-ide

Jim MacBaine writes:
 > Hi,
 > 
 > I have recently started seeing strange SATA errors on my desktop system.
 > The system was recently equipped with three 250 GB SATA drives from

Clue #1: added drives

 > three different manufacturers, and I'm having an identical problem on
 > two of them.  The drives are connected to two on-board controllers on
 > an Asus A8V board, both of which ran under Linux for more than
 > two years with older SATA disks without problems.  A hardware failure
 > seems unlikely to me, as the same error occurs on two brand-new disks
 > from two different manufacturers.  I'm running a vanilla 2.6.23.12
 > kernel.
 > 
 > The error on sdc happened about 10 times tonight; each time I could hear
 > the disk spin down and up again while the system was frozen for
 > several seconds:
 > 
 > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
 > ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 >          res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
 > ata2: soft resetting port
 > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 > ata2.00: configured for UDMA/133
 > ata2: EH complete
 > sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
 > sd 1:0:0:0: [sdb] Write Protect is off
 > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 > 
 > In the log I also found several identical errors on one other drive:
 > 
 > ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 > ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
 >          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 > ata5: soft resetting port
 > ata5.00: configured for UDMA/33
 > ata5: EH complete
 > sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
 > sd 4:0:0:0: [sdc] Write Protect is off
 > sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
 > sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA

Clue #2: both ata2 and ata5 are having problems

 > 
 > Can this be the result of a hardware failure?  I've seen several
 > drives being added to an NCQ blacklist over the last few weeks.  Is it
 > possible that my drives need to be added there, too?  Or do I just
 > have two failing drives?
 > 
 > Thanks a lot for any clues,
 > Jim
 > 
 > 
 > System boot log extract:
 > 
 > sata_promise 0000:00:08.0: version 2.10
 > ACPI: PCI Interrupt 0000:00:08.0[A] -> GSI 18 (level, low) -> IRQ 18
 > scsi0 : sata_promise
 > scsi1 : sata_promise
 > scsi2 : sata_promise
 > ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x00000000 irq 18
 > ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x00000000 irq 18
 > ata3: PATA max UDMA/133 cmd 0xf882e300 ctl 0xf882e338 bmdma 0x00000000 irq 18
 > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 > ata1.00: ATA-8: SAMSUNG HD252KJ, CM100-12, max UDMA7
 > ata1.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
 > ata1.00: configured for UDMA/133
 > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 > ata2.00: ATA-7: WDC WD2500JS-55NCB1, 10.02E01, max UDMA/133
 > ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
 > ata2.00: configured for UDMA/133

Clue #3: ata2 is driven by sata_promise (lspci says it's a 20378, they're good)

 > scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD252KJ  CM10 PQ: 0 ANSI: 5
 > sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
 > sd 0:0:0:0: [sda] Write Protect is off
 > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 > sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
 > sd 0:0:0:0: [sda] Write Protect is off
 > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 >  sda: sda2 sda3
 > sd 0:0:0:0: [sda] Attached SCSI disk
 > scsi 1:0:0:0: Direct-Access     ATA      WDC WD2500JS-55N 10.0 PQ: 0 ANSI: 5
 > sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
 > sd 1:0:0:0: [sdb] Write Protect is off
 > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 > sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
 > sd 1:0:0:0: [sdb] Write Protect is off
 > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 >  sdb: sdb2 sdb3
 > sd 1:0:0:0: [sdb] Attached SCSI disk
 > sata_via 0000:00:0f.0: version 2.3
 > ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 17
 > sata_via 0000:00:0f.0: routed to hard irq line 10
 > scsi3 : sata_via
 > scsi4 : sata_via
 > ata4: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001c802 bmdma 0x0001b800 irq 17
 > ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x0001b808 irq 17
 > ata4: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
 > ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 > ata5.00: ATA-7: MAXTOR STM3250820AS, 3.AAE, max UDMA/133
 > ata5.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
 > ata5.00: configured for UDMA/133

Clue #4: ata5 is driven by sata_via

The fact that the problems occur on different disks on
different controllers driven by different drivers indicates
that it's not a disk, controller, or driver problem.

I strongly suspect an underdimensioned or failing PSU.
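
The cross-check behind clues #2 to #4 can be sketched in a few lines of Python. This is only an illustration, not a kernel tool: `summarize` and the regular expressions are hypothetical names, and the sketch assumes that each `ataN: SATA max ...` line belongs to the driver whose `version` banner most recently preceded it in the boot log, which holds for the excerpt quoted above but is not guaranteed in general.

```python
import re

# Hypothetical patterns for the libata log lines quoted in this thread.
DRIVER_RE = re.compile(r"^(\w+) [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]: version")
PORT_RE = re.compile(r"^(ata\d+): [SP]ATA max")
DEVICE_RE = re.compile(r"^(ata\d+)\.\d+: ATA-\d+: ([^,]+),")
ERROR_RE = re.compile(r"^(ata\d+)\.\d+: exception ")


def summarize(boot_log, error_log):
    """Return (port, driver, drive model) for every port that logged an exception."""
    driver_of, model_of, current = {}, {}, None
    for raw in boot_log.splitlines():
        line = raw.strip()
        m = DRIVER_RE.match(line)
        if m:
            current = m.group(1)  # assumption: later ports belong to this driver
            continue
        m = PORT_RE.match(line)
        if m and current:
            driver_of[m.group(1)] = current
            continue
        m = DEVICE_RE.match(line)
        if m:
            model_of[m.group(1)] = m.group(2).strip()

    failing = set()
    for raw in error_log.splitlines():
        m = ERROR_RE.match(raw.strip())
        if m:
            failing.add(m.group(1))
    return [(p, driver_of.get(p), model_of.get(p)) for p in sorted(failing)]


# Lines copied from the report above.
BOOT_LOG = """\
sata_promise 0000:00:08.0: version 2.10
ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x00000000 irq 18
ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x00000000 irq 18
ata2.00: ATA-7: WDC WD2500JS-55NCB1, 10.02E01, max UDMA/133
sata_via 0000:00:0f.0: version 2.3
ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x0001b808 irq 17
ata5.00: ATA-7: MAXTOR STM3250820AS, 3.AAE, max UDMA/133
"""
ERROR_LOG = """\
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
"""

for port, driver, model in summarize(BOOT_LOG, ERROR_LOG):
    print(port, driver, model)
```

When the failing ports come back with different drivers and different drive models, as they do here, whatever they still share (power, cabling, mounting) becomes the more likely suspect.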


* Re: SATA timeouts on two disks
  2008-01-13 12:07 ` Mikael Pettersson
@ 2008-01-19 16:22   ` Jim MacBaine
  2008-01-19 16:50     ` rgheck
  2008-01-21  7:47     ` Tejun Heo
  0 siblings, 2 replies; 9+ messages in thread
From: Jim MacBaine @ 2008-01-19 16:22 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-ide

On Jan 13, 2008 1:07 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:

> The fact that the problems occur on different disks on
> different controllers driven by different drivers indicates
> that it's not a disk, controller, or driver problem.
>
> I strongly suspect an underdimensioned or failing PSU.

Thanks a lot for your clues.

I bought a new PSU on Monday and didn't get any new disk failures for
days.  But last night the same time-outs occurred again on two disks.
I guess I will try to replace the motherboard including the two SATA
controllers next.

Regards,
Jim


* Re: SATA timeouts on two disks
  2008-01-19 16:22   ` Jim MacBaine
@ 2008-01-19 16:50     ` rgheck
  2008-01-19 16:58       ` Jim MacBaine
  2008-01-21  7:47     ` Tejun Heo
  1 sibling, 1 reply; 9+ messages in thread
From: rgheck @ 2008-01-19 16:50 UTC (permalink / raw)
  To: linux-ide; +Cc: Jim MacBaine, Mikael Pettersson

Jim MacBaine wrote:
> On Jan 13, 2008 1:07 PM, Mikael Pettersson <mikpe@it.uu.se> wrote
>> The fact that the problems occur on different disks on
>> different controllers driven by different drivers indicates
>> that it's not a disk, controller, or driver problem.
>>
>> I strongly suspect an underdimensioned or failing PSU.
>>     
> Thanks a lot for your clues.
>
> I bought a new PSU on Monday and didn't get any new disk failures for
> days.  But last night the same time-outs occurred again on two disks.
> I guess I will try to replace the motherboard including the two SATA
> controllers next.
>   
I don't know if your problems are similar to mine or not. But I have 
been having extensive problems for quite some time now. Do you get these 
timeouts when using optical drives? That's what seems to trigger it in 
my case: If I'm using the optical drives, I'll often see the errors with 
them first, and then the whole ATA subsystem seems to go down. Then I 
get journal commit errors, general read errors, etc, until the system 
basically locks up. Worst case, it all happens very suddenly, and 
there's not even anything in the logs. Just a couple messages to the 
terminal, usually a journal commit error.

In my case, the optical drives are a brand new Pioneer DVD-RW on SATA
and an old Plextor on PATA. I mostly see the errors with the latter but 
have also seen them with the former. I'd thought I'd fixed it by adding 
pnpacpi=off and pci=nomsi,nommconf to the kernel boot options, as well 
as libata noacpi=1 to modules.conf, but now I've just had the problem 
again. I'm now thinking I should try eliminating the Plextor drive. It 
may be that it's the PATA drive that is causing all the trouble. I'll 
report if so.

FYI, here are the relevant modules being loaded:
[root@rghquad rgheck]# lsmod | grep ata
pata_amd               20293  0
pata_pdc2027x          17477  0
sata_nv                25157  8
ata_generic            14405  0
libata                114673  4 pata_amd,pata_pdc2027x,sata_nv,ata_generic
scsi_mod              145657  5 sr_mod,sg,usb_storage,libata,sd_mod
The IDE interface is an nVidia MCP55, apparently, on an ASUS P5N32-E mb.

I doubt very much it's a PS issue in my case. There's not that much in 
the box.

Richard



* Re: SATA timeouts on two disks
  2008-01-19 16:50     ` rgheck
@ 2008-01-19 16:58       ` Jim MacBaine
  0 siblings, 0 replies; 9+ messages in thread
From: Jim MacBaine @ 2008-01-19 16:58 UTC (permalink / raw)
  To: rgheck; +Cc: linux-ide, Mikael Pettersson

On Jan 19, 2008 5:50 PM, rgheck <rgheck@bobjweil.com> wrote:

> I don't know if your problems are similar to mine or not. But I have
> been having extensive problems for quite some time now. Do you get these
> timeouts when using optical drives?

No, I don't see any connections to optical drives here.  I have a DVD
drive and a DVDRW drive on a PATA controller in the failing system but
I have not used them for weeks.

Regards,
Jim


* Re: SATA timeouts on two disks
  2008-01-19 16:22   ` Jim MacBaine
  2008-01-19 16:50     ` rgheck
@ 2008-01-21  7:47     ` Tejun Heo
  2008-01-24 17:31       ` Jim MacBaine
  1 sibling, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2008-01-21  7:47 UTC (permalink / raw)
  To: Jim MacBaine; +Cc: Mikael Pettersson, linux-ide

Jim MacBaine wrote:
> On Jan 13, 2008 1:07 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> 
>> The fact that the problems occur on different disks on
>> different controllers driven by different drivers indicates
>> that it's not a disk, controller, or driver problem.
>>
>> I strongly suspect an underdimensioned or failing PSU.
> 
> Thanks a lot for your clues.
> 
> I bought a new PSU on Monday and didn't get any new disk failures for
> days.  But last night the same time-outs occurred again on two disks.
> I guess I will try to replace the motherboard including the two SATA
> controllers next.

If you still have the old PSU lying around, please try to power one of
the failing drives with the old PSU.  Just leave everything else as-is,
power up the old PSU by itself as described in the following web page, and
connect only one of the failing drives to the old PSU.

  http://modtown.co.uk/mt/article2.php?id=psumod

Then see whether the problem continues and, if so, on which drives.
Connecting SATA drives to separate power is completely safe even if they
don't have a common ground, because SATA connections never connect
directly to each other.

-- 
tejun


* Re: SATA timeouts on two disks
  2008-01-21  7:47     ` Tejun Heo
@ 2008-01-24 17:31       ` Jim MacBaine
  2008-01-24 23:19         ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Jim MacBaine @ 2008-01-24 17:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Mikael Pettersson, linux-ide

Hi,

On Jan 21, 2008 8:47 AM, Tejun Heo <htejun@gmail.com> wrote:

> If you still have the old PSU lying around, please try to power one of
> the failing drives with the old PSU.  Just leave everything else as-is,
> power up the old PSU by itself as described in the following web page, and
> connect only one of the failing drives to the old PSU.
>
>   http://modtown.co.uk/mt/article2.php?id=psumod
>
> Then see whether the problem continues and, if so, on which drives.
> Connecting SATA drives to separate power is completely safe even if they
> don't have a common ground, because SATA connections never connect
> directly to each other.

Yes, I still have the old PSU lying around.

A co-worker, to whom I explained my problem, asked me whether I had
properly grounded my drives.  In fact I had not: the drives resided in
a vibration-absorbing frame through which their exterior had no
electrical contact with the grounded case.  Since I grounded the drives
two days ago, I have had no new errors.  So maybe my problem is solved.

If not, I will happily try out your suggestion.  Would you be so kind
as to explain in a few words what connecting one drive to a second
(supposedly good) PSU will show?

(Is this still on-topic on this list?)

Thanks a lot,
Jim


* Re: SATA timeouts on two disks
  2008-01-24 17:31       ` Jim MacBaine
@ 2008-01-24 23:19         ` Tejun Heo
  2008-01-25  3:24           ` rgheck
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2008-01-24 23:19 UTC (permalink / raw)
  To: Jim MacBaine; +Cc: Mikael Pettersson, linux-ide

Hello,

Jim MacBaine wrote:
> A co-worker, to whom I explained my problem, asked me whether I had
> properly grounded my drives.  In fact I had not: the drives resided in
> a vibration-absorbing frame through which their exterior had no
> electrical contact with the grounded case.  Since I grounded the drives
> two days ago, I have had no new errors.  So maybe my problem is solved.

Hmmm... Grounding..... Interesting.

> If not, I will happily try out your suggestion.  Would you be so kind
> as to explain in a few words what connecting one drive to a second
> (supposedly good) PSU will show?

It's just a good way to isolate problems.  For example, the motherboard
could be doing something strange on the 12 V rail, and the PSU could be
too sensitive, causing the whole rail to fluctuate slightly and leading
to occasional transmission errors.  SATA links are the first to be
affected by those kinds of electrical problems.
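
For reference, the SErr value in the first report already points at the link.  The following is a minimal sketch, assuming the SError bit layout from the Serial ATA specification; `SERR_BITS` and `decode_serr` are hypothetical names used only for illustration, and the descriptions are paraphrased.

```python
# Bit positions follow the SError register layout in the Serial ATA
# specification; the names are paraphrased descriptions, not kernel strings.
SERR_BITS = {
    0: "recovered data integrity error",
    1: "recovered communications error",
    8: "non-recovered transient data integrity error",
    9: "non-recovered persistent communication error",
    10: "protocol error",
    11: "internal error",
    16: "PhyRdy change",
    17: "PHY internal error",
    18: "comm wake",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS type",
    26: "device exchanged",
}


def decode_serr(value):
    """Return the names of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


print(decode_serr(0x180000))  # SErr from the ata2 exception in the first report
print(decode_serr(0x0))       # SErr from the ata5 exception
```

On ata2, 0x180000 sets bits 19 and 20, the 10b-to-8b decode and disparity errors, which are both signalling errors on the link itself.  The ata5 exception carries SErr 0x0, so this decoding says nothing about that port.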

> (Is this still on-topic on this list?)

Yeah, sure.

-- 
tejun


* Re: SATA timeouts on two disks
  2008-01-24 23:19         ` Tejun Heo
@ 2008-01-25  3:24           ` rgheck
  0 siblings, 0 replies; 9+ messages in thread
From: rgheck @ 2008-01-25  3:24 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jim MacBaine, Mikael Pettersson, linux-ide

Tejun Heo wrote:
> Hello,
>
> Jim MacBaine wrote:
>   
>> A co-worker, to whom I explained my problem, asked me whether I had
>> properly grounded my drives.  In fact I had not: the drives resided in
>> a vibration-absorbing frame through which their exterior had no
>> electrical contact with the grounded case.  Since I grounded the drives
>> two days ago, I have had no new errors.  So maybe my problem is solved.
>>     
>
> Hmmm... Grounding..... Interesting.
>
>   
Can you say more about this, Jim? This may also be my problem, or
part of it, as my drives too are mounted in such a way as not to be in 
physical contact with the case. How did you go about grounding them? I 
suppose one test would be just to remove the washers....

That said, in my case, 2.6.24 seems to make a big difference, too. I 
accidentally booted into 2.6.23 today and, boom.

Richard


