From: Peter Rabbitson <rabbit+list@rabbit.us>
To: linux-raid@vger.kernel.org
Subject: Help to decipher kernel io error log
Date: Thu, 28 Aug 2008 12:03:07 +0200
Message-ID: <48B677DB.4010306@rabbit.us>
Greetings,

This is not strictly a RAID question, but this is the best list I know
of for this type of question. Two days ago my server ground to a halt
for no apparent reason. There were tons of processes in D state, with
no sign of any significant work being done. I attributed it to resource
starvation (the server is pretty heavily loaded), rebooted and went on
with my life.

Yesterday I received the log messages included at the bottom of this
email. Since I am running a --level=10 --raid-devices=4 --layout=f3
array I am not that worried about losing data, and decided to
investigate. I removed (mdadm -r) the devices in question from the
arrays, power cycled the server, and executed a full
badblocks -svw /dev/sda run. It passed with flying colors.

So here is my question: what does the log below signify (there are no
omissions, this is all I got)? Is my controller dying? Or is there
indeed a well-masked hard drive failure? Should I change the drive, the
controller, or both?

Thank you for your thoughts!

Peter
====================
=== Hardware setup
Intel SE7210 TP1-E board
(http://www.intel.com/support/motherboards/server/se7210tp1-e/index.htm)
4 identical 250GB Maxtor 7Y250M0 hard drives
- two of them attached to the on board SATA controller:
00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller
(rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: Intel Corporation Device 342f
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: I/O ports at e400 [size=8]
Region 1: I/O ports at e000 [size=4]
Region 2: I/O ports at dc00 [size=8]
Region 3: I/O ports at d800 [size=4]
Region 4: I/O ports at d400 [size=16]
Kernel driver in use: ata_piix
- two of them attached to a RocketRaid 1820A controller
(http://www.highpoint-tech.com/USA/rr1820a.htm)
02:04.0 SCSI storage controller: Marvell Technology Group Ltd.
MV88SX5081 8-port SATA I PCI-X Controller (rev 03)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 20
Region 0: Memory at fc480000 (64-bit, non-prefetchable) [size=512K]
Capabilities: [40] Power Management version 2
Flags: PMEClk+ DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0
Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [60] PCI-X non-bridge device
Command: DPERE- ERO- RBC=512 OST=4
Status: Dev=ff:1f.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512
DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
====================
=== Kernel error log
Aug 27 02:27:02 Arzamas kernel: IAL: COMPLETION ERROR, adapter 0,
channel 2, flags=104 lba 6dc0b7 sectors 10 cmd 20
Aug 27 02:27:02 Arzamas kernel: ATA regs: error 40, sector count 0, LBA
low b7, LBA mid c0, LBA high 6d, device 40, status 51
Aug 27 02:27:02 Arzamas kernel: --- RR182x: Channel [0/2] State Dump ---
Aug 27 02:27:02 Arzamas kernel: pending commands:
Aug 27 02:27:02 Arzamas kernel: EDMA registers:
Aug 27 02:27:02 Arzamas kernel: [26000] = 00000100 [26004] = A63D8198
Aug 27 02:27:02 Arzamas kernel: [26008] = 00000000 [2600C] = 00000118
Aug 27 02:27:02 Arzamas kernel: [26010] = 00000000 [26014] = 37CDCC00
Aug 27 02:27:02 Arzamas kernel: [26018] = 00000000 [2601C] = 00000000
Aug 27 02:27:02 Arzamas kernel: [26020] = 00000000 [26024] = 031DB300
Aug 27 02:27:02 Arzamas kernel: [26028] = 00000000 [2602C] = 00000000
Aug 27 02:27:02 Arzamas kernel: [26030] = 0000003E [26034] = 000000BC
Aug 27 02:27:02 Arzamas kernel: Device registers:
Aug 27 02:27:02 Arzamas kernel: [26100] = 00000000 [26104] = 00000001
Aug 27 02:27:02 Arzamas kernel: [26108] = 00000001 [2610C] = 00000001
Aug 27 02:27:02 Arzamas kernel: [26110] = 00000000 [26114] = 00000000
Aug 27 02:27:02 Arzamas kernel: [26118] = 00000000 [2611C] = 00000050
Aug 27 02:27:02 Arzamas kernel: [26120] = 00000050 [26124] = 00000000
Aug 27 02:27:02 Arzamas kernel: SATA Bridge registers:
Aug 27 02:27:02 Arzamas kernel: [20300] = 00000113
Aug 27 02:27:02 Arzamas kernel: [20304] = 00000000
Aug 27 02:27:02 Arzamas kernel: [20308] = 00000000
Aug 27 02:27:02 Arzamas kernel: [2030C] = 00500001
Aug 27 02:27:02 Arzamas kernel: [2033C] = 40000000
Aug 27 02:27:02 Arzamas kernel: [20374] = 05EAC880
Aug 27 02:27:03 Arzamas kernel: channel 2: perform recalibrate command
Aug 27 02:27:03 Arzamas kernel: Retry on channel(2)
Aug 27 02:27:05 Arzamas kernel: IAL: COMPLETION ERROR, adapter 0,
channel 2, flags=104 lba 6dc0b7 sectors 10 cmd 20
Aug 27 02:27:05 Arzamas kernel: ATA regs: error 40, sector count 0, LBA
low b7, LBA mid c0, LBA high 6d, device 40, status 51
Aug 27 02:27:05 Arzamas kernel: --- RR182x: Channel [0/2] State Dump ---
Aug 27 02:27:05 Arzamas kernel: pending commands:
Aug 27 02:27:05 Arzamas kernel: EDMA registers:
Aug 27 02:27:05 Arzamas kernel: [26000] = 00000100 [26004] = A63D8401
Aug 27 02:27:05 Arzamas kernel: [26008] = 00000000 [2600C] = 00000118
Aug 27 02:27:05 Arzamas kernel: [26010] = 00000000 [26014] = 37CDCC00
Aug 27 02:27:05 Arzamas kernel: [26018] = 00000000 [2601C] = 00000000
Aug 27 02:27:05 Arzamas kernel: [26020] = 00000000 [26024] = 031DB300
Aug 27 02:27:05 Arzamas kernel: [26028] = 00000000 [2602C] = 00000000
Aug 27 02:27:05 Arzamas kernel: [26030] = 0000003F [26034] = 000000BC
Aug 27 02:27:05 Arzamas kernel: Device registers:
Aug 27 02:27:05 Arzamas kernel: [26100] = 00000000 [26104] = 00000001
Aug 27 02:27:05 Arzamas kernel: [26108] = 00000001 [2610C] = 00000001
Aug 27 02:27:05 Arzamas kernel: [26110] = 00000000 [26114] = 00000000
Aug 27 02:27:05 Arzamas kernel: [26118] = 00000000 [2611C] = 00000050
Aug 27 02:27:05 Arzamas kernel: [26120] = 00000050 [26124] = 00000000
Aug 27 02:27:05 Arzamas kernel: SATA Bridge registers:
Aug 27 02:27:05 Arzamas kernel: [20300] = 00000113
Aug 27 02:27:05 Arzamas kernel: [20304] = 00000000
Aug 27 02:27:05 Arzamas kernel: [20308] = 00000000
Aug 27 02:27:05 Arzamas kernel: [2030C] = 00500001
Aug 27 02:27:05 Arzamas kernel: [2033C] = 40000000
Aug 27 02:27:05 Arzamas kernel: [20374] = 05EAC880
Aug 27 02:27:05 Arzamas kernel: channel 2: perform recalibrate command
Aug 27 02:27:05 Arzamas kernel: Retry on channel(2)
Aug 27 02:27:07 Arzamas kernel: IAL: COMPLETION ERROR, adapter 0,
channel 2, flags=104 lba 6dc0b7 sectors 10 cmd 20
Aug 27 02:27:07 Arzamas kernel: ATA regs: error 40, sector count 0, LBA
low b7, LBA mid c0, LBA high 6d, device 40, status 51
Aug 27 02:27:07 Arzamas kernel: --- RR182x: Channel [0/2] State Dump ---
Aug 27 02:27:07 Arzamas kernel: pending commands:
Aug 27 02:27:07 Arzamas kernel: EDMA registers:
Aug 27 02:27:07 Arzamas kernel: [26000] = 00000100 [26004] = A63D8669
Aug 27 02:27:07 Arzamas kernel: [26008] = 00000000 [2600C] = 00000118
Aug 27 02:27:07 Arzamas kernel: [26010] = 00000000 [26014] = 37CDCC00
Aug 27 02:27:07 Arzamas kernel: [26018] = 00000000 [2601C] = 00000000
Aug 27 02:27:07 Arzamas kernel: [26020] = 00000000 [26024] = 031DB300
Aug 27 02:27:07 Arzamas kernel: [26028] = 00000000 [2602C] = 00000000
Aug 27 02:27:07 Arzamas kernel: [26030] = 0000003F [26034] = 000000BC
Aug 27 02:27:07 Arzamas kernel: Device registers:
Aug 27 02:27:07 Arzamas kernel: [26100] = 00000000 [26104] = 00000001
Aug 27 02:27:07 Arzamas kernel: [26108] = 00000001 [2610C] = 00000001
Aug 27 02:27:07 Arzamas kernel: [26110] = 00000000 [26114] = 00000000
Aug 27 02:27:07 Arzamas kernel: [26118] = 00000000 [2611C] = 00000050
Aug 27 02:27:07 Arzamas kernel: [26120] = 00000050 [26124] = 00000000
Aug 27 02:27:07 Arzamas kernel: SATA Bridge registers:
Aug 27 02:27:07 Arzamas kernel: [20300] = 00000113
Aug 27 02:27:07 Arzamas kernel: [20304] = 00000000
Aug 27 02:27:07 Arzamas kernel: [20308] = 00000000
Aug 27 02:27:07 Arzamas kernel: [2030C] = 00500001
Aug 27 02:27:07 Arzamas kernel: [2033C] = 40000000
Aug 27 02:27:07 Arzamas kernel: [20374] = 05EAC880
Aug 27 02:27:07 Arzamas kernel: channel 2: perform recalibrate command
Aug 27 02:27:07 Arzamas kernel: Retry on channel(2)
Aug 27 02:27:08 Arzamas kernel: IAL: COMPLETION ERROR, adapter 0,
channel 2, flags=104 lba 6dc0b7 sectors 10 cmd 20
Aug 27 02:27:08 Arzamas kernel: ATA regs: error 40, sector count 0, LBA
low b7, LBA mid c0, LBA high 6d, device 40, status 51
Aug 27 02:27:08 Arzamas kernel: --- RR182x: Channel [0/2] State Dump ---
Aug 27 02:27:08 Arzamas kernel: pending commands:
Aug 27 02:27:08 Arzamas kernel: EDMA registers:
Aug 27 02:27:08 Arzamas kernel: [26000] = 00000100 [26004] = A63D88D1
Aug 27 02:27:08 Arzamas kernel: [26008] = 00000000 [2600C] = 00000118
Aug 27 02:27:08 Arzamas kernel: [26010] = 00000000 [26014] = 37CDCC00
Aug 27 02:27:08 Arzamas kernel: [26018] = 00000000 [2601C] = 00000000
Aug 27 02:27:08 Arzamas kernel: [26020] = 00000000 [26024] = 031DB300
Aug 27 02:27:08 Arzamas kernel: [26028] = 00000000 [2602C] = 00000000
Aug 27 02:27:08 Arzamas kernel: [26030] = 0000003F [26034] = 000000BC
Aug 27 02:27:08 Arzamas kernel: Device registers:
Aug 27 02:27:08 Arzamas kernel: [26100] = 00000000 [26104] = 00000001
Aug 27 02:27:08 Arzamas kernel: [26108] = 00000001 [2610C] = 00000001
Aug 27 02:27:08 Arzamas kernel: [26110] = 00000000 [26114] = 00000000
Aug 27 02:27:08 Arzamas kernel: [26118] = 00000000 [2611C] = 00000050
Aug 27 02:27:08 Arzamas kernel: [26120] = 00000050 [26124] = 00000000
Aug 27 02:27:08 Arzamas kernel: SATA Bridge registers:
Aug 27 02:27:08 Arzamas kernel: [20300] = 00000113
Aug 27 02:27:08 Arzamas kernel: [20304] = 00000000
Aug 27 02:27:08 Arzamas kernel: [20308] = 00000000
Aug 27 02:27:08 Arzamas kernel: [2030C] = 00500001
Aug 27 02:27:08 Arzamas kernel: [2033C] = 40000000
Aug 27 02:27:08 Arzamas kernel: [20374] = 05EAC880
Aug 27 02:27:08 Arzamas kernel: RR182x [0,2]: Reset more than 3 times,
disconnect it
Aug 27 02:27:08 Arzamas kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x05
driverbyte=0x25
Aug 27 02:27:08 Arzamas kernel: end_request: I/O error, dev sda, sector
7192759
Aug 27 02:27:08 Arzamas kernel: raid1: sda1: rescheduling sector 7192696
Aug 27 02:27:08 Arzamas kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x00
driverbyte=0x00
Aug 27 02:27:08 Arzamas kernel: end_request: I/O error, dev sda, sector
12000319
Aug 27 02:27:08 Arzamas kernel: md: super_written gets error=-5, uptodate=0
Aug 27 02:27:08 Arzamas kernel: raid1: Disk failure on sda1, disabling
device.
Aug 27 02:27:08 Arzamas kernel: Operation continuing on 3 devices
Aug 27 02:27:08 Arzamas kernel: RAID1 conf printout:
Aug 27 02:27:08 Arzamas kernel: --- wd:3 rd:4
Aug 27 02:27:08 Arzamas kernel: disk 0, wo:1, o:0, dev:sda1
Aug 27 02:27:08 Arzamas kernel: disk 1, wo:0, o:1, dev:sdb1
Aug 27 02:27:08 Arzamas kernel: disk 2, wo:0, o:1, dev:sdd1
Aug 27 02:27:08 Arzamas kernel: disk 3, wo:0, o:1, dev:sde1
Aug 27 02:27:08 Arzamas kernel: RAID1 conf printout:
Aug 27 02:27:08 Arzamas kernel: --- wd:3 rd:4
Aug 27 02:27:08 Arzamas kernel: disk 1, wo:0, o:1, dev:sdb1
Aug 27 02:27:08 Arzamas kernel: disk 2, wo:0, o:1, dev:sdd1
Aug 27 02:27:08 Arzamas kernel: disk 3, wo:0, o:1, dev:sde1
Aug 27 02:27:08 Arzamas kernel: raid1: sdd1: redirecting sector 7192696
to another mirror
Aug 27 02:27:15 Arzamas kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x00
driverbyte=0x00
Aug 27 02:27:15 Arzamas kernel: end_request: I/O error, dev sda, sector
488166955
Aug 27 02:27:15 Arzamas kernel: md: super_written gets error=-5, uptodate=0
Aug 27 02:27:15 Arzamas kernel: raid10: Disk failure on sda2, disabling
device.
Aug 27 02:27:15 Arzamas kernel: Operation continuing on 3 devices
Aug 27 02:27:16 Arzamas kernel: RAID10 conf printout:
Aug 27 02:27:16 Arzamas kernel: --- wd:3 rd:4
Aug 27 02:27:16 Arzamas kernel: disk 0, wo:1, o:0, dev:sda2
Aug 27 02:27:16 Arzamas kernel: disk 1, wo:0, o:1, dev:sdb2
Aug 27 02:27:16 Arzamas kernel: disk 2, wo:0, o:1, dev:sdd2
Aug 27 02:27:16 Arzamas kernel: disk 3, wo:0, o:1, dev:sde2
Aug 27 02:27:16 Arzamas kernel: RAID10 conf printout:
Aug 27 02:27:16 Arzamas kernel: --- wd:3 rd:4
Aug 27 02:27:16 Arzamas kernel: disk 1, wo:0, o:1, dev:sdb2
Aug 27 02:27:16 Arzamas kernel: disk 2, wo:0, o:1, dev:sdd2
Aug 27 02:27:16 Arzamas kernel: disk 3, wo:0, o:1, dev:sde2
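
As a reading aid for the "ATA regs" lines in the log above, here is a
minimal sketch that decodes the status and error bytes and reassembles
the LBA. It assumes the driver prints the raw task-file registers in
hex and that the classic ATA bit names apply (newer revisions of the
standard rename or retire some of these bits). The helper names
decode_bits and lba28 are made up for this illustration and are not
part of any driver or tool.

```python
# Classic ATA task-file bit names (assumption: the log values are hex
# dumps of these registers; later ATA revisions rename some bits).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC/BBK", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TK0NF", 0x01: "AMNF"}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in names.items() if value & mask]


def lba28(low, mid, high, device=0):
    """Rebuild a 28-bit LBA; the low nibble of device holds bits 27:24."""
    return ((device & 0x0F) << 24) | (high << 16) | (mid << 8) | low


# Values copied from the log: status 51, error 40,
# LBA low b7, LBA mid c0, LBA high 6d, device 40.
print(decode_bits(0x51, STATUS_BITS))   # ['DRDY', 'DSC', 'ERR']
print(decode_bits(0x40, ERROR_BITS))    # ['UNC']
print(lba28(0xB7, 0xC0, 0x6D, 0x40))    # 7192759
```

Read this way, the drive reported an uncorrectable data error (UNC)
for the command, and the rebuilt LBA equals the sector number in the
first "end_request: I/O error, dev sda" line, which supports the hex
interpretation. It does not by itself settle whether the drive or the
controller is at fault.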