What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT)

* What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT)
@ 2008-06-11 10:14 Justin Piszcz
  2008-06-11 11:33 ` Justin Piszcz
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Justin Piszcz @ 2008-06-11 10:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-raid

Never had a single error so far, powered down my host, powered it back up,
and now with kernel 2.6.25.6:

Jun 11 05:23:24 p34 kernel: [   67.118632] mtrr: no more MTRRs available
Jun 11 05:46:23 p34 kernel: [ 1445.288619] ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Jun 11 05:46:23 p34 kernel: [ 1445.288626] ata12.00: irq_stat 0x00060002, device error via D2H FIS
Jun 11 05:46:23 p34 kernel: [ 1445.288632] ata12.00: cmd 35/00:f8:47:dc:35/00:03:02:00:00/e0 tag 0 dma 520192 out
Jun 11 05:46:23 p34 kernel: [ 1445.288634]          res 51/84:f8:47:dc:35/00:03:02:00:00/e0 Emask 0x10 (ATA bus error)
Jun 11 05:46:23 p34 kernel: [ 1445.288637] ata12.00: status: { DRDY ERR }
Jun 11 05:46:23 p34 kernel: [ 1445.288639] ata12.00: error: { ICRC ABRT }
Jun 11 05:46:23 p34 kernel: [ 1445.288649] ata12: hard resetting link
Jun 11 05:46:25 p34 kernel: [ 1447.419983] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Jun 11 05:46:25 p34 kernel: [ 1447.429612] ata12.00: configured for UDMA/100
Jun 11 05:46:25 p34 kernel: [ 1447.429628] ata12: EH complete
Jun 11 05:46:25 p34 kernel: [ 1447.813910] sd 11:0:0:0: [sdl] Write Protect is off
Jun 11 05:46:25 p34 kernel: [ 1447.813912] sd 11:0:0:0: [sdl] Mode Sense: 00 3a 00 00
Jun 11 05:46:25 p34 kernel: [ 1447.813928] sd 11:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 11 06:00:32 p34 kernel: [ 2293.491350] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 11 06:00:32 p34 kernel: [ 2293.491360] ata1.00: cmd 35/00:02:43:90:7d/00:00:12:00:00/e0 tag 0 dma 1024 out
Jun 11 06:00:32 p34 kernel: [ 2293.491362]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 11 06:00:32 p34 kernel: [ 2293.491365] ata1.00: status: { DRDY }
Jun 11 06:00:32 p34 kernel: [ 2293.794295] ata1: soft resetting link
Jun 11 06:00:32 p34 kernel: [ 2293.947277] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 11 06:00:32 p34 kernel: [ 2294.614206] ata1.00: configured for UDMA/133
Jun 11 06:00:32 p34 kernel: [ 2294.614227] ata1: EH complete
Jun 11 06:00:32 p34 kernel: [ 2294.335647] sd 0:0:0:0: [sda] Write Protect is off
Jun 11 06:00:32 p34 kernel: [ 2294.335650] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jun 11 06:00:32 p34 kernel: [ 2294.348472] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Nothing was broken in any of the arrays, and everything seems to be
functioning now, albeit at lower speeds, as you can see above (UDMA/100
and UDMA/133).  Could there be a bug with the new VelociRaptors and the
drivers in the kernel?  I never saw this happen with my old Raptor 150s
or 74s.  Also, I stress-tested all of these drives for 8+ hours and they
never had a problem before, which makes this rather peculiar.

# cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sdb2[1] sda2[0]
       136448 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
       276109056 blocks [2/2] [UU]

md3 : active raid5 sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
       2637296640 blocks level 5, 1024k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
       16787776 blocks [2/2] [UU]

unused devices: <none>

I am using the same cables and configuration, just new disks.  The SMART
self-tests also pass.  Is this a kernel problem?

/dev/sda:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       108         -
# 2  Short offline       Completed without error       00%       103         -
# 3  Short offline       Completed without error       00%        79         -
# 4  Short offline       Completed without error       00%        56         -
# 5  Extended offline    Completed without error       00%        32         -
# 6  Short offline       Completed without error       00%         8         -

SMART Error Log Version: 1
No Errors Logged

/dev/sdl:

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       111         -
# 2  Short offline       Completed without error       00%       107         -
# 3  Short offline       Completed without error       00%        83         -
# 4  Short offline       Completed without error       00%        59         -
# 5  Extended offline    Completed without error       00%        36         -
# 6  Short offline       Completed without error       00%        11         -

Does the kernel handle the ATA v8 protocol properly?
ATA Version is:   8

Justin.

* Re: What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT)
  2008-06-11 10:14 What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT) Justin Piszcz
@ 2008-06-11 11:33 ` Justin Piszcz
  2008-06-12  2:50 ` Jeff Garzik
  2008-06-16  3:52 ` Tejun Heo
  2 siblings, 0 replies; 4+ messages in thread
From: Justin Piszcz @ 2008-06-11 11:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-raid



On Wed, 11 Jun 2008, Justin Piszcz wrote:

> Never had a single error so far, powered down my host, powered it back up,
> and now with kernel 2.6.25.6:
>

I will replace, reconnect, and check the cables and connectors.  A long
test on each disk just passed as well, but there was a single (1) CRC
error, which could be down to the cables or connectors.  I will verify
later today.
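
A minimal Python sketch (not part of the original message) for tracking such a CRC count over time. It assumes the error is counted in SMART attribute 199 (UDMA_CRC_Error_Count), which many drives use for interface CRC errors; the post does not say where the count was read from. The function name crc_error_count is hypothetical, and the input is expected to be the text printed by `smartctl -A`.

```python
def crc_error_count(smartctl_attr_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_attr_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None
```

If the value keeps climbing after the cables are reseated or replaced, that would point at the link rather than the disk.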

Justin.


* Re: What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT)
  2008-06-11 10:14 What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT) Justin Piszcz
  2008-06-11 11:33 ` Justin Piszcz
@ 2008-06-12  2:50 ` Jeff Garzik
  2008-06-16  3:52 ` Tejun Heo
  2 siblings, 0 replies; 4+ messages in thread
From: Jeff Garzik @ 2008-06-12  2:50 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, linux-raid

Justin Piszcz wrote:
> Never had a single error so far, powered down my host, powered it back up,
> and now with kernel 2.6.25.6:

http://ata.wiki.kernel.org/index.php/Libata_error_messages

In particular, timeouts may be solved by booting with one of these kernel
parameters: acpi=off, noapic, pci=nomsi, or pci=biosirq.


> Does the kernel handle the ATA v8 protocol properly?
> ATA Version is:   8

Yes.  ATA is always backward-compatible.

	Jeff

* Re: What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT)
  2008-06-11 10:14 What do these SATA errors mean / kernel 2.6.25.6 (DRDY ERR/ICRC ABRT) Justin Piszcz
  2008-06-11 11:33 ` Justin Piszcz
  2008-06-12  2:50 ` Jeff Garzik
@ 2008-06-16  3:52 ` Tejun Heo
  2 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2008-06-16  3:52 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, linux-raid

Justin Piszcz wrote:
> Never had a single error so far, powered down my host, powered it back up,
> Jun 11 05:23:24 p34 kernel: [   67.118632] mtrr: no more MTRRs available
> Jun 11 05:46:23 p34 kernel: [ 1445.288619] ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> Jun 11 05:46:23 p34 kernel: [ 1445.288626] ata12.00: irq_stat 0x00060002, device error via D2H FIS
> Jun 11 05:46:23 p34 kernel: [ 1445.288632] ata12.00: cmd 35/00:f8:47:dc:35/00:03:02:00:00/e0 tag 0 dma 520192 out
> Jun 11 05:46:23 p34 kernel: [ 1445.288634]          res 51/84:f8:47:dc:35/00:03:02:00:00/e0 Emask 0x10 (ATA bus error)
> Jun 11 05:46:23 p34 kernel: [ 1445.288637] ata12.00: status: { DRDY ERR }
> Jun 11 05:46:23 p34 kernel: [ 1445.288639] ata12.00: error: { ICRC ABRT }

That's your drive reporting that it saw a transmission error on the wire.
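
As an illustration only (this is not part of the original reply), the "res" line can be decoded by hand: the first byte is the ATA status register and the second is the error register. The Python sketch below maps a subset of those bits to the names that appear in the log. The helper names decode_bits and decode_res are hypothetical.

```python
# Subset of ATA status and error register bits, using the names seen in the log.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

def decode_bits(value, names):
    """List the names of all bits from `names` that are set in `value`."""
    return [name for bit, name in names.items() if value & bit]

def decode_res(res):
    """Decode the status/error bytes of a 'res' field such as '51/84:f8:...'."""
    status_hex, rest = res.split("/", 1)
    error_hex = rest.split(":", 1)[0]
    status = decode_bits(int(status_hex, 16), STATUS_BITS)
    error = decode_bits(int(error_hex, 16), ERROR_BITS)
    return status, error
```

For the quoted line, decode_res("51/84:f8:47:dc:35/00:03:02:00:00/e0") gives (['DRDY', 'ERR'], ['ICRC', 'ABRT']), matching the "status" and "error" lines the kernel printed.  ICRC is the interface CRC error bit, which is why this points at the link rather than the media.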


> Jun 11 06:00:32 p34 kernel: [ 2293.491350] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Jun 11 06:00:32 p34 kernel: [ 2293.491360] ata1.00: cmd 35/00:02:43:90:7d/00:00:12:00:00/e0 tag 0 dma 1024 out
> Jun 11 06:00:32 p34 kernel: [ 2293.491362]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jun 11 06:00:32 p34 kernel: [ 2293.491365] ata1.00: status: { DRDY }
> Jun 11 06:00:32 p34 kernel: [ 2293.794295] ata1: soft resetting link
> Jun 11 06:00:32 p34 kernel: [ 2293.947277] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

And a write command timed out, which is also often caused by transmission
problems.
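
To see which command that was, the "cmd" field can be unpacked.  The Python sketch below is an illustration, not part of the original reply.  It assumes the field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and 512-byte sectors; the function name decode_taskfile is hypothetical.

```python
def decode_taskfile(tf):
    """Unpack a libata 'cmd' field into opcode, sector count, bytes and LBA."""
    command, low, high, _device = tf.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    # 48-bit commands carry a 16-bit sector count; 0 means 65536 sectors.
    sectors = ((hob_nsect << 8) | nsect) or 65536
    # The 48-bit LBA is split across the "high order" and current bytes.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "opcode": int(command, 16),
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte logical sectors
        "lba": lba,
    }
```

Opcode 0x35 is WRITE DMA EXT.  For the ata1 command, decode_taskfile("35/00:02:43:90:7d/00:00:12:00:00/e0") gives 2 sectors (1024 bytes), which agrees with "dma 1024 out" in the log; the ata12 command decodes to 1016 sectors, or 520192 bytes.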

> Nothing was broken in any of the arrays and all seems to be functioning
> now but albeit at lower speeds as you see above UDMA/100 and UDMA/133.

No, according to the log, there was no slowdown: both links came back up
at 3.0 Gbps after the reset.  Transmission speed is lowered only after
some number of errors have accumulated.
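
One way to check this on a longer log is to collect the negotiated speed from every "SATA link up" line and flag ports whose latest speed is below their best.  This is a rough Python sketch, not part of the original reply; the names link_speeds and slowed_ports are hypothetical, and it only looks at the "SATA link up N Gbps" message format quoted in this thread.

```python
import re

LINK_UP = re.compile(r"(ata\d+): SATA link up ([\d.]+) Gbps")

def link_speeds(log_text):
    """Map each port to the list of link speeds it negotiated, in log order."""
    speeds = {}
    for port, gbps in LINK_UP.findall(log_text):
        speeds.setdefault(port, []).append(float(gbps))
    return speeds

def slowed_ports(log_text):
    """Return ports whose most recent link speed is below their earlier best."""
    return sorted(
        port for port, seen in link_speeds(log_text).items() if seen[-1] < max(seen)
    )
```

Run against the log in this thread, slowed_ports() returns an empty list, since both ata1 and ata12 report 3.0 Gbps after the reset.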

> Could there be a bug with the new VelociRaptors and the drivers in the
> kernel?  I never saw this happen/occur with my old Raptor 150s or 74s.
> Also, I stress tested all of these drives for 8hours+ and they never had
> a problem before so it makes the problem rather peculiar.

For SATA drives, occasional transmission problems are expected even on
otherwise pretty healthy systems.  No need to worry about it too much
unless the problem repeats itself a lot.

-- 
tejun
