linux-ide.vger.kernel.org archive mirror
* Re: SATA disks resets in a md setup
       [not found] <200905081739.46206.v.virvilis@biovista.com>
@ 2009-05-09  7:35 ` Jeff Garzik
  2009-05-09 16:41   ` v.virvilis
  2009-05-11 10:24   ` Vassilis Virvilis
  0 siblings, 2 replies; 4+ messages in thread
From: Jeff Garzik @ 2009-05-09  7:35 UTC
  To: v.virvilis; +Cc: linux-kernel, Linux IDE mailing list

Vassilis Virvilis wrote:
> [ 9351.377961] ata2: SError: { PHYRdyChg PHYInt 10B8B Dispar }
> [ 9351.377983] ata2.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
> [ 9351.377985]          res 50/00:00:b6:46:6a/00:00:13:00:00/e0 Emask 0x10 (ATA bus error)
[...]
> [10665.354196] ata2: SError: { UnrecovData Handshk }
> [10665.354196] ata2.00: cmd 35/00:00:27:ae:7a/00:04:01:00:00/e0 tag 0 dma 524288 out
> [10665.354196]          res 50/00:00:26:ae:7a/00:00:01:00:00/e0 Emask 0x10 (ATA bus error)
[...]
> and my filesystem is dead. /dev/sdb is deleted from /dev. I have to reboot and even then linux can't find the ata2 /dev/sdb.
> I have to remove power for 1-2 min for the disk to become accessible again.
> 
> Do you think the disk is bad or something?


For hardware details, see 
http://ata.wiki.kernel.org/index.php/Libata_error_messages

The ATA bus is the cable connection, so an ATA bus error typically means

- problem with your cable, or
- your motherboard's SATA port, or
- your drive's SATA port, or
- "dirty power" supply, or
- some other cause for cable interference
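
As a rough aid for reading the quoted "cmd" lines, the sketch below splits
the taskfile dump into named registers. The field order (command, then
feature:nsect:lbal:lbam:lbah, then the same five high-order bytes, then
device) is the layout libata prints; the helper names parse_taskfile,
describe and sector_count_48 and the two small lookup tables are
hypothetical and cover only the commands seen in this thread.

```python
# Hypothetical helpers, not part of libata: they split the taskfile dump
# that libata prints on "cmd" lines into named register values.
KNOWN_COMMANDS = {0xB0: "SMART", 0x35: "WRITE DMA EXT"}  # deliberately tiny
SMART_FEATURES = {0xD5: "SMART READ LOG"}                # deliberately tiny

REG_NAMES = ("feature", "nsect", "lbal", "lbam", "lbah")


def parse_taskfile(line):
    """Return the registers from a libata 'cmd' line as a dict of ints."""
    # The taskfile token is the only word on the line with three slashes.
    token = next(t for t in line.split() if t.count("/") == 3)
    command, regs, hob, device = token.split("/")
    fields = {"command": int(command, 16), "device": int(device, 16)}
    for name, value in zip(REG_NAMES, regs.split(":")):
        fields[name] = int(value, 16)
    for name, value in zip(REG_NAMES, hob.split(":")):
        fields["hob_" + name] = int(value, 16)  # high-order bytes
    return fields


def describe(line):
    """Name the command, and the SMART subcommand where one applies."""
    tf = parse_taskfile(line)
    name = KNOWN_COMMANDS.get(tf["command"], "unknown")
    if tf["command"] == 0xB0:
        name += " / " + SMART_FEATURES.get(tf["feature"], "unknown subcommand")
    return name


def sector_count_48(tf):
    """Sector count of a 48-bit command: high byte from the HOB register."""
    return (tf["hob_nsect"] << 8) | tf["nsect"]


if __name__ == "__main__":
    smart = "ata2.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in"
    write = "ata2.00: cmd 35/00:00:27:ae:7a/00:04:01:00:00/e0 tag 0 dma 524288 out"
    print(describe(smart))
    print(describe(write), sector_count_48(parse_taskfile(write)) * 512)
```

Read this way, the first failing command is a SMART log read rather than
ordinary filesystem I/O, and the second is a 1024-sector write, which
agrees with the "dma 524288" byte count on the same line.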

Regards,

	Jeff





* Re: SATA disks resets in a md setup
  2009-05-09  7:35 ` SATA disks resets in a md setup Jeff Garzik
@ 2009-05-09 16:41   ` v.virvilis
  2009-05-11 10:24   ` Vassilis Virvilis
  1 sibling, 0 replies; 4+ messages in thread
From: v.virvilis @ 2009-05-09 16:41 UTC
  To: Jeff Garzik; +Cc: linux-kernel, Linux IDE mailing list

On Sat, 09 May 2009 03:35:05 -0400, Jeff Garzik <jeff@garzik.org> wrote:

> 
> For hardware details, see 
> http://ata.wiki.kernel.org/index.php/Libata_error_messages

Thanks for the link.
> 
> The ATA bus is the cable connection, so an ATA bus error typically means
> 
> - problem with your cable, or
> - your motherboard's SATA port, or
> - your drive's SATA port, or
> - "dirty power" supply, or
> - some other cause for cable interference
> 

I have changed SATA cables twice and I added a better PSU. The problem
persists.

Do you have any insight into the sector count mismatch I mentioned in the
first mail?
The disk is 250GB but it looks like it is searching for a 500GB disk (that
is, md0 = sda + sdb).

Is it possible for the SATA reset to trigger an md bug? I am totally
guessing here...

Regards
            .bill


* Re: SATA disks resets in a md setup
  2009-05-09  7:35 ` SATA disks resets in a md setup Jeff Garzik
  2009-05-09 16:41   ` v.virvilis
@ 2009-05-11 10:24   ` Vassilis Virvilis
  2009-05-12  8:24     ` Tejun Heo
  1 sibling, 1 reply; 4+ messages in thread
From: Vassilis Virvilis @ 2009-05-11 10:24 UTC
  To: Jeff Garzik; +Cc: linux-kernel, Linux IDE mailing list

On Saturday 09 May 2009, Jeff Garzik wrote:
> 
> For hardware details, see 
> http://ata.wiki.kernel.org/index.php/Libata_error_messages

thanks for the link

> 
> The ATA bus is the cable connection, so an ATA bus error typically means
> 
> - problem with your cable, or
> - your motherboard's SATA port, or
> - your drive's SATA port, or
> - "dirty power" supply, or
> - some other cause for cable interference
> 

Ok I changed
	M/B,
	PSU
	and cables.

Now the stress test survives only one SATA reset, instead of 3 or 4, before the fatal one.


[ 1804.915319] ata1.01: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
[ 1804.915319] ata1.01: ST-ATA: DRQ=1 with device error, dev_stat 0x0
[ 1804.915319] ata1: SError: { PHYRdyChg }
[ 1804.915319] ata1.01: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/10 tag 0 pio 512 in
[ 1804.915319]          res 00/00:01:09:4f:c2/00:00:00:00:00/10 Emask 0x212 (ATA bus error)
[ 1804.915319] ata1: hard resetting link
[ 1810.279540] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1810.305230] ata1.00: configured for UDMA/133
[ 1810.314698] ata1.01: configured for UDMA/133
[ 1810.314698] ata1: EH complete
[ 1810.318713] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 1810.318713] sd 0:0:0:0: [sda] Write Protect is off
[ 1810.318713] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1810.318713] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1810.322654] sd 0:0:1:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
[ 1810.326655] sd 0:0:1:0: [sdb] Write Protect is off
[ 1810.326655] sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 1810.326655] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1810.330758] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 1810.330758] sd 0:0:0:0: [sda] Write Protect is off
[ 1810.330758] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1810.330758] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1810.334656] sd 0:0:1:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
[ 1810.334656] sd 0:0:1:0: [sdb] Write Protect is off
[ 1810.334656] sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 1810.334656] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
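
For reference, the hex SErr value on the exception line and the names on
the SError line are the same information. The sketch below maps one to the
other. The bit-to-name table is my reading of the names libata prints and
should be checked against the libata source of the kernel in use; the
function name decode_serror is hypothetical.

```python
# Bit positions in the SATA SError register and the short names libata
# prints for them. Assumption: table reproduced from memory of libata's
# error report code; verify against your kernel before relying on it.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of all bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    # SErr 0x10000 from the exception line above
    print(decode_serror(0x10000))
```

With this table, SErr 0x10000 has only bit 16 set, which is the single
PHYRdyChg flag shown in the log above.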

 Regards

   .bill


* Re: SATA disks resets in a md setup
  2009-05-11 10:24   ` Vassilis Virvilis
@ 2009-05-12  8:24     ` Tejun Heo
  0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2009-05-12  8:24 UTC
  To: v.virvilis; +Cc: Jeff Garzik, linux-kernel, Linux IDE mailing list

Vassilis Virvilis wrote:
> Ok I changed
> 	M/B,
> 	PSU
> 	and cables.
> 
> Now the stress test survives only one SATA reset, instead of 3 or 4, before the fatal one.
> 
> 
> [ 1804.915319] ata1.01: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
> [ 1804.915319] ata1.01: ST-ATA: DRQ=1 with device error, dev_stat 0x0
> [ 1804.915319] ata1: SError: { PHYRdyChg }
> [ 1804.915319] ata1.01: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/10 tag 0 pio 512 in
> [ 1804.915319]          res 00/00:01:09:4f:c2/00:00:00:00:00/10 Emask 0x212 (ATA bus error)
> [ 1804.915319] ata1: hard resetting link
> [ 1810.279540] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

PHYRdyChg under load is very symptomatic of an inadequate power supply.
If you run "smartctl -a" on the device before and after the error,
what counters change?
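
One way to make that comparison less error-prone is to save the two
"smartctl -a" outputs to files and diff only the attribute table. The
sketch below assumes the usual ten-column attribute layout (ID#,
ATTRIBUTE_NAME, FLAG, ..., RAW_VALUE); the function names and the file
names before.txt and after.txt are hypothetical.

```python
import sys


def parse_attributes(text):
    """Map attribute name -> raw value from the text of 'smartctl -a'."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID, carry a 0x... FLAG in the
        # third column, and keep the raw value in the tenth column onward.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x")):
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def changed_attributes(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    old = parse_attributes(before)
    new = parse_attributes(after)
    return {name: (old.get(name), raw)
            for name, raw in new.items() if old.get(name) != raw}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python smartdiff.py before.txt after.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        changes = changed_attributes(f_before.read(), f_after.read())
    for name, (old_raw, new_raw) in sorted(changes.items()):
        print(f"{name}: {old_raw} -> {new_raw}")
```

The two input files would come from something like
"smartctl -a /dev/sdb > before.txt" before the stress test and the same
command redirected to after.txt once the error has occurred.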

If you have two PSUs around, one thing worth trying is to power up the
second PSU separately, put half of the drives on it, and see whether
the problem goes away or the pattern of failures changes.  A PSU can
easily be powered up without a motherboard.

  http://modtown.co.uk/mt/article2.php?id=psumod

-- 
tejun

