libata error/reset

* libata error/reset
From: Dan Noé @ 2008-09-05 18:47 UTC
  To: linux-ide

Just after midnight last night, during an rsync job which copies a lot 
of data onto my backup disk (half of a Linux software RAID 1), I 
received the following:

-- SNIP --
ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
ata3.00: irq_stat 0x00400000, PHY RDY changed
ata3: SError: { PHYRdyChg }
ata3.00: cmd 35/00:10:3f:00:34/00:00:22:00:00/e0 tag 0 dma 8192 out
          res 50/00:00:4e:01:18/00:00:22:00:00/e0 Emask 0x10 (ATA bus error)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: link is slow to respond, please be patient (ready=0)
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting link
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/100
ata3: EH complete
sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA
-- SNIP --
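
For readers decoding the log above: the SErr and SStatus values are bit fields. What follows is a minimal Python sketch, not part of the original report. It assumes the SError bit positions and the SStatus field layout (DET, SPD, IPM) from the SATA specification and uses the short names libata prints. The function names are made up for illustration, and only a subset of SError bits is listed.

```python
# Subset of SError bits, keyed by bit position, using the short names
# libata prints in "SError: { ... }". Assumed from the SATA spec layout.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
}


def decode_serror(value):
    """Return the names of the known SError bits set in value."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def decode_sstatus(value):
    """Split SStatus into its three 4-bit fields.

    det: device detection, spd: negotiated speed, ipm: power state.
    """
    return {
        "det": value & 0xF,
        "spd": (value >> 4) & 0xF,
        "ipm": (value >> 8) & 0xF,
    }
```

libata prints both registers in hex, so "SErr 0x10000" is passed as 0x10000 and "SStatus 113" as 0x113. decode_serror(0x10000) gives ["PHYRdyChg"], and decode_sstatus(0x113) gives det=3 (device present, communication established), spd=1 (1.5 Gbps) and ipm=1 (active), which agrees with the "SATA link up 1.5 Gbps" line.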

The system seems to be working fine now, and there was not even a RAID 
failure reported by the md system.  Is this something I should be 
concerned about? Hardware issue, software bug?

Linux colobus 2.6.26.3 #1 SMP Thu Aug 21 10:15:38 EDT 2008 i686 Intel(R) 
Pentium(R) 4 CPU 3.20GHz GenuineIntel GNU/Linux

00:1f.2 SATA controller: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) 
SATA Controller (rev 03)

I am using the libata ahci driver.  There are four drives crammed into a 
1U with hotplug trays, but AFAIK no one was poking around the system.

Thanks much,
Dan

-- 
Dan Noé
Software Engineer
Lime Brokerage LLC
781-370-2518


* Re: libata error/reset
From: Tejun Heo @ 2008-09-09 11:28 UTC
  To: Dan Noé; +Cc: linux-ide

Dan Noé wrote:
> Just after midnight last night, during an rsync job which copies a lot
> of data onto my backup disk (half of a Linux software RAID 1), I
> received the following:
> 
> -- SNIP --
> ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
> ata3.00: irq_stat 0x00400000, PHY RDY changed
> ata3: SError: { PHYRdyChg }
> ata3.00: cmd 35/00:10:3f:00:34/00:00:22:00:00/e0 tag 0 dma 8192 out
>          res 50/00:00:4e:01:18/00:00:22:00:00/e0 Emask 0x10 (ATA bus error)
> ata3.00: status: { DRDY }
> ata3: hard resetting link
> ata3: link is slow to respond, please be patient (ready=0)
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata3.00: qc timeout (cmd 0xec)
> ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
> ata3.00: revalidation failed (errno=-5)
> ata3: failed to recover some devices, retrying in 5 secs
> ata3: hard resetting link
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata3.00: configured for UDMA/100
> ata3: EH complete
> sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
> sd 2:0:0:0: [sdc] Write Protect is off
> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> -- SNIP --
> 
> The system seems to be working fine now, and there was not even a RAID
> failure reported by the md system.  Is this something I should be
> concerned about? Hardware issue, software bug?
> 
> Linux colobus 2.6.26.3 #1 SMP Thu Aug 21 10:15:38 EDT 2008 i686 Intel(R)
> Pentium(R) 4 CPU 3.20GHz GenuineIntel GNU/Linux
> 
> 00:1f.2 SATA controller: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW)
> SATA Controller (rev 03)
> 
> I am using the libata ahci driver.  There are four drives crammed into a
> 1U with hotplug trays, but AFAIK no one was poking around the system.

Transmission errors do occur occasionally on perfectly healthy
machines, so if it doesn't happen regularly, you can just ignore it and
the kernel will do the right thing.  However, there have been a
non-negligible number of cases where a poor power supply fails to
maintain voltage under high IO load and makes the hard drive go offline
briefly, which would also show up as PHYRdyChg.  In these cases, you
can usually hear the drive doing an emergency unload (clicking), and
smartctl -a is likely to show increased values for start/stop count
and/or emergency unload count.  When that happens, the drive loses the
data in its buffer and the filesystem gets corrupted, so you really
should get a better power supply.
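
The smartctl check described above can be sketched in Python as a before/after comparison of two saved "smartctl -a" outputs. This is only an illustration under stated assumptions: that the drive reports the usual ten-column SMART attribute table, that ID 4 is the start/stop count, and that ID 192 is the power-off retract (emergency unload) count. Attribute names and IDs vary by vendor, and the function names here are hypothetical.

```python
import re

# Attribute IDs assumed to matter here; check them against your drive's
# own smartctl output, since vendors differ.
WATCHED_IDS = {
    4: "start/stop count",
    192: "power-off retract (emergency unload) count",
}

# One row of the attribute table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\S+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_watched(text):
    """Map watched attribute ID to its raw value, from smartctl -a text."""
    result = {}
    for line in text.splitlines():
        match = ATTR_LINE.match(line)
        if match and int(match.group(1)) in WATCHED_IDS:
            result[int(match.group(1))] = int(match.group(3))
    return result


def increased(before, after):
    """Return {attribute ID: delta} for watched counters that went up."""
    return {
        attr_id: after[attr_id] - before[attr_id]
        for attr_id in after
        if attr_id in before and after[attr_id] > before[attr_id]
    }
```

One way to use it: save the output of "smartctl -a /dev/sdc" before and after a heavy IO job such as the rsync run, feed both texts to parse_watched(), and pass the results to increased(). A non-empty result after a PHYRdyChg event would point toward the power supply case rather than a one-off transmission error.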

Thanks.

-- 
tejun
