* array sync looping
@ 2006-12-08 5:19 linux-raid-kernel-org
2006-12-13 22:42 ` Frequent SATA errors / port timeouts in 2.6.18.3? Patrik Jonsson
0 siblings, 1 reply; 3+ messages in thread
From: linux-raid-kernel-org @ 2006-12-08 5:19 UTC (permalink / raw)
To: linux-raid
Hi -
I recently upgraded to the 2.6.17-2-686 SMP kernel image (debian).
Now, when I hot-add a disk (mdadm /dev/md1 --add /dev/hdc3) the array
resyncs to 100%, a bunch of errors appear and then the array resync
starts from 0% again.
mdadm --zero-superblock on the "new" disk doesn't help.
Any suggestions on what might be happening or how to debug/fix would
be welcome.
Thanks -
Leni
$ dmesg # output truncated
...
md: md1: sync done.
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=390716106,
high=23, low=4840138, sector=390716042
ide: failed opcode was: unknown
end_request: I/O error, dev hda, sector 390716042
...
raid1: hda: unrecoverable I/O read error for block 388499072
RAID1 conf printout:
--- wd:1 rd:3
disk 0, wo:1, o:1, dev:hdc3
disk 2, wo:0, o:1, dev:hda3
RAID1 conf printout:
--- wd:1 rd:3
disk 2, wo:0, o:1, dev:hda3
RAID1 conf printout:
--- wd:1 rd:3
disk 0, wo:1, o:1, dev:hdc3
disk 2, wo:0, o:1, dev:hda3
md: syncing RAID array md
...
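
In the excerpt above, the I/O error is reported on hda rather than on the newly added hdc3, and it shows up right after "sync done". One way to check whether every resync pass trips over the same spot is to pull the failing sectors out of saved dmesg text. The following is a minimal Python sketch; the function name `failing_sectors` and the sample string are illustrative only, and the pattern assumes nothing beyond the `end_request: I/O error, dev X, sector N` format shown in the log.

```python
import re
from collections import defaultdict

# Matches lines such as: "end_request: I/O error, dev hda, sector 390716042"
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(dmesg_text):
    """Return {device: sorted list of distinct sectors that reported I/O errors}."""
    found = defaultdict(set)
    for match in IO_ERROR_RE.finditer(dmesg_text):
        device, sector = match.group(1), int(match.group(2))
        found[device].add(sector)
    return {dev: sorted(sectors) for dev, sectors in found.items()}


# Sample text copied from the log excerpt above (illustrative input only).
sample = """\
md: md1: sync done.
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
end_request: I/O error, dev hda, sector 390716042
raid1: hda: unrecoverable I/O read error for block 388499072
"""

print(failing_sectors(sample))
```

If the same sector on hda turns up after every pass, that would point at a bad spot on the existing disk rather than at the disk being added, though the truncated log alone does not prove it.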
* Frequent SATA errors / port timeouts in 2.6.18.3?
2006-12-08 5:19 array sync looping linux-raid-kernel-org
@ 2006-12-13 22:42 ` Patrik Jonsson
2006-12-14 8:40 ` David Greaves
0 siblings, 1 reply; 3+ messages in thread
From: Patrik Jonsson @ 2006-12-13 22:42 UTC (permalink / raw)
Cc: linux-raid
Hi all,
this may not be the best list for this question, but I figure that the
number of disks connected to users here should be pretty big...
I upgraded from 2.6.17-rc4 to 2.6.18.3 about a week ago, and I've since
had 3 drives kicked out of my 10-drive RAID5 array. Previously, I had no
kicks over almost a year. The kernel message is:
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata7.00: (BMDMA stat 0x20)
ata7.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)
ata7: EH complete
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata7.00: (BMDMA stat 0x20)
ata7.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata7: port is slow to respond, please be patient
ata7: port failed to respond (30 secs)
ata7: soft resetting port
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x2)
ata7.00: revalidation failed (errno=-5)
ata7: failed to recover some devices, retrying in 5 secs
ata7: hard resetting port
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x2)
ata7.00: revalidation failed (errno=-5)
ata7: failed to recover some devices, retrying in 5 secs
ata7: hard resetting port
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x2)
ata7.00: revalidation failed (errno=-5)
ata7.00: disabled
ata7: EH complete
First I thought it was a cabling or card issue, because the same drive
got kicked twice. That drive was connected to a 2-port SIG sata_sil24
card. However, I just had another drive kicked that's connected to
sata_nv, which leads me to suspect that the upgraded kernel might have
something to do with it. A quick googling seems to indicate that others
are seeing this with 2.6.18, too, so I was wondering if anyone here
knows more.
I did at the same time also install an Areca ARC1260 controller and
connected a bunch of drives to it, so another idea I had was cable
interference or something (there are now 18 drives in the machine).
Any ideas or thoughts would be appreciated,
/Patrik
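
For reading the libata lines quoted in this message, it can help to decode the `cmd`, `stat` and `err` values into the standard ATA register bit names. The sketch below is a hedged illustration in Python: the helper names `decode` and `parse_libata_line` are made up for this example, the command table only covers the two opcodes that appear in the log, and the bit names follow the classic ATA task-file layout (some bits have been renamed or retired across ATA revisions).

```python
import re

# ATA status register bits (classic task-file names).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
# ATA error register bits; 0x80 is ICRC for DMA commands (BBK on older drives).
ERROR_BITS = {0x80: "ICRC/BBK", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}
# Only the opcodes seen in the log above; not a complete command table.
COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

LINE_RE = re.compile(
    r"cmd (0x[0-9a-f]+) Emask (0x[0-9a-f]+) stat (0x[0-9a-f]+) err (0x[0-9a-f]+)")


def decode(value, table):
    """Return the names of all bits set in `value`, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


def parse_libata_line(line):
    """Decode one 'tag N cmd ... stat ... err ...' line, or return None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    cmd, _emask, stat, err = (int(group, 16) for group in match.groups())
    return {
        "command": COMMANDS.get(cmd, hex(cmd)),
        "status": decode(stat, STATUS_BITS),
        "error": decode(err, ERROR_BITS),
    }


print(parse_libata_line(
    "ata7.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)"))
print(parse_libata_line(
    "ata7.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)"))
```

Read this way, the first exception is a READ DMA that the drive aborted (DRDY and ERR set, ABRT in the error register), while the second is a WRITE DMA that timed out with no error bits set.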
* Re: Frequent SATA errors / port timeouts in 2.6.18.3?
2006-12-13 22:42 ` Frequent SATA errors / port timeouts in 2.6.18.3? Patrik Jonsson
@ 2006-12-14 8:40 ` David Greaves
0 siblings, 0 replies; 3+ messages in thread
From: David Greaves @ 2006-12-14 8:40 UTC (permalink / raw)
To: Patrik Jonsson; +Cc: linux-raid
Patrik Jonsson wrote:
> Hi all,
> this may not be the best list for this question, but I figure that the
> number of disks connected to users here should be pretty big...
>
> I upgraded from 2.6.17-rc4 to 2.6.18.3 about a week ago, and I've since
> had 3 drives kicked out of my 10-drive RAID5 array. Previously, I had no
> kicks over almost a year. The kernel message is:
>
> ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata7.00: (BMDMA stat 0x20)
> ata7.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)
> ata7: EH complete
> Any ideas or thoughts would be appreciated,
SMART?
Read the manpage and then try running:
smartctl -d ata -S on /dev/...
and
smartctl -d ata -s on /dev/...
Then look at your smartd timing and see if it's related; possibly just do a
manual smartd poll.
I've had SMART/libata problems (well, no, glitches) for about 2 years now, in
that the IRQ handler occasionally says "no one cared" ;)
It may well not be your problem but...
David
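
The suggestion above to compare smartd timing with the errors can be sketched as a small check: given the time of one known smartd poll and the polling interval, see which error timestamps land close to a poll tick. This is only a rough Python illustration. The function name `near_poll`, the tolerance, and all timestamps are hypothetical; the 1800-second default matches smartd's documented default interval, but real polls drift, so a match is a hint rather than proof.

```python
def near_poll(error_times, first_poll, interval=1800, tolerance=30):
    """Return the error times within `tolerance` seconds of a poll tick.

    All times are in seconds on the same clock (for example, epoch seconds
    taken from syslog). Poll ticks are assumed to fall at
    first_poll + k * interval for integer k.
    """
    hits = []
    for t in error_times:
        offset = (t - first_poll) % interval
        # Distance to the nearest tick, whether just before or just after it.
        distance = min(offset, interval - offset)
        if distance <= tolerance:
            hits.append(t)
    return hits


# Hypothetical timestamps, not taken from the thread.
errors = [2805, 5000, 6410]
print(near_poll(errors, first_poll=1000))
```

If most of the ATA exceptions fall near poll ticks, temporarily stopping smartd, or triggering a poll by hand while watching the kernel log, would be a reasonable next experiment.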