array sync looping

linux-raid.vger.kernel.org archive mirror

* array sync looping
@ 2006-12-08  5:19 linux-raid-kernel-org
  2006-12-13 22:42 ` Frequent SATA errors / port timeouts in 2.6.18.3? Patrik Jonsson
From: linux-raid-kernel-org @ 2006-12-08  5:19 UTC (permalink / raw)
  To: linux-raid

Hi -

I recently upgraded to the 2.6.17-2-686 SMP kernel image (Debian).

Now, when I hot-add a disk (mdadm /dev/md1 --add /dev/hdc3), the array
resyncs to 100%, a bunch of errors appear, and then the resync starts
again from 0%.

mdadm --zero-superblock on the "new" disk doesn't help.

Any suggestions on what might be happening or how to debug/fix would 
be welcome.

Thanks -

Leni

$ dmesg  # output truncated
...
md: md1: sync done.
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=390716106, high=23, low=4840138, sector=390716042
ide: failed opcode was: unknown
end_request: I/O error, dev hda, sector 390716042
...
raid1: hda: unrecoverable I/O read error for block 388499072
RAID1 conf printout:
  --- wd:1 rd:3
  disk 0, wo:1, o:1, dev:hdc3
  disk 2, wo:0, o:1, dev:hda3
RAID1 conf printout:
  --- wd:1 rd:3
  disk 2, wo:0, o:1, dev:hda3
RAID1 conf printout:
  --- wd:1 rd:3
  disk 0, wo:1, o:1, dev:hdc3
  disk 2, wo:0, o:1, dev:hda3
md: syncing RAID array md
...
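Two lines in this log can be decoded by hand. The following is a minimal Python sketch, assuming (from the kernel sources, not from the post) that the IDE driver prints a 48-bit LBA as two 24-bit halves, so `LBAsect == (high << 24) | low`, and that raid1's conf printout reports `wo:1` for a member that is not in sync and `o:1` for one that is not faulty. The helper names `lba_from_halves` and `parse_conf_printout` are hypothetical.

```python
# Sketch: decode the IDE LBA halves and the RAID1 conf printout from the log.
# Assumptions: LBAsect == (high << 24) | low; in "wo:X, o:Y", wo:1 means the
# member is not yet in sync and o:1 means it is not marked faulty.
import re


def lba_from_halves(high: int, low: int) -> int:
    """Recombine the high and low 24-bit halves of a 48-bit LBA."""
    return (high << 24) | low


def parse_conf_printout(text: str) -> dict:
    """Map device name -> (in_sync, operational) for each 'disk N' line."""
    disks = {}
    for m in re.finditer(r"disk (\d+), wo:(\d), o:(\d), dev:(\S+)", text):
        _slot, wo, o, dev = m.groups()
        disks[dev] = (wo == "0", o == "1")
    return disks


if __name__ == "__main__":
    lba = lba_from_halves(23, 4840138)
    print(lba)               # 390716106, matching LBAsect
    print(lba - 390716042)   # 64 sectors past the start of the failed request
    conf = (" --- wd:1 rd:3\n"
            " disk 0, wo:1, o:1, dev:hdc3\n"
            " disk 2, wo:0, o:1, dev:hda3\n")
    print(parse_conf_printout(conf))
```

On this reading, hda3 is the only in-sync member (`wd:1`) while hdc3 is still being rebuilt. That would be consistent with raid1 calling the read error at LBA 390716106 on hda "unrecoverable": there is no other in-sync mirror to read that block from.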


* Frequent SATA errors / port timeouts in 2.6.18.3?
  2006-12-08  5:19 array sync looping linux-raid-kernel-org
@ 2006-12-13 22:42 ` Patrik Jonsson
  2006-12-14  8:40   ` David Greaves
From: Patrik Jonsson @ 2006-12-13 22:42 UTC (permalink / raw)
  Cc: linux-raid


Hi all,
this may not be the best list for this question, but I figure that the
number of disks connected to users here should be pretty big...

I upgraded from 2.6.17-rc4 to 2.6.18.3 about a week ago, and I've since
had 3 drives kicked out of my 10-drive RAID5 array. Previously, I had no
kicks over almost a year. The kernel message is:

ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata7.00: (BMDMA stat 0x20)
ata7.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)
ata7: EH complete
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata7.00: (BMDMA stat 0x20)
ata7.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata7: port is slow to respond, please be patient
ata7: port failed to respond (30 secs)
ata7: soft resetting port
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x2)
ata7.00: revalidation failed (errno=-5)
ata7: failed to recover some devices, retrying in 5 secs
ata7: hard resetting port
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x2)
ata7.00: revalidation failed (errno=-5)
ata7: failed to recover some devices, retrying in 5 secs
ata7: hard resetting port
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x2)
ata7.00: revalidation failed (errno=-5)
ata7.00: disabled
ata7: EH complete
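The `cmd`, `stat` and `err` fields in these libata lines are raw ATA register values. A minimal Python sketch for decoding them, assuming the traditional bit names from the Linux IDE register definitions (`<linux/hdreg.h>`); the command table is deliberately partial and covers only the two opcodes seen here. The names `decode` and `bits` are hypothetical helpers.

```python
# Sketch: decode ATA command/status/error bytes from libata messages such as
#   "tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)".
# Bit names follow the classic IDE register definitions; the command table
# only lists the opcodes that appear in this log.
import re

COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TK0NF", 0x01: "AMNF"}


def bits(value, table):
    """Names of the bits set in value, most significant first."""
    return [name for mask, name in table.items() if value & mask]


def decode(line):
    """Return the decoded cmd/stat/err fields, or None if the line lacks them."""
    m = re.search(r"cmd (0x[0-9a-f]+) .*stat (0x[0-9a-f]+) err (0x[0-9a-f]+)", line)
    if m is None:
        return None
    cmd, stat, err = (int(g, 16) for g in m.groups())
    return {
        "cmd": COMMANDS.get(cmd, hex(cmd)),
        "stat": bits(stat, STATUS_BITS),
        "err": bits(err, ERROR_BITS),
    }


if __name__ == "__main__":
    print(decode("ata7.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)"))
    print(decode("ata7.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)"))
```

Read this way, the first event is a READ DMA that the drive aborted (`DRDY ERR`, `ABRT`), and the second is a WRITE DMA that timed out with no error bits set. The same tables also reproduce the decoding in the earlier hda message: `bits(0x51, STATUS_BITS)` gives DRDY, DSC, ERR ("DriveReady SeekComplete Error"), and `bits(0x40, ERROR_BITS)` gives UNC ("UncorrectableError").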

At first I thought it was a cabling or card issue, because the same drive
got kicked twice. That drive was connected to a 2-port SIIG sata_sil24
card. However, I just had another drive kicked that's connected to
sata_nv, which leads me to suspect that the kernel upgrade might have
something to do with it. A quick Google search suggests that others
are seeing this with 2.6.18 too, so I was wondering if anyone here
knows more.

At the same time, I also installed an Areca ARC1260 controller and
connected a bunch of drives to it, so another possibility is cable
interference or something similar (there are now 18 drives in the machine).

Any ideas or thoughts would be appreciated,

/Patrik



* Re: Frequent SATA errors / port timeouts in 2.6.18.3?
  2006-12-13 22:42 ` Frequent SATA errors / port timeouts in 2.6.18.3? Patrik Jonsson
@ 2006-12-14  8:40   ` David Greaves
From: David Greaves @ 2006-12-14  8:40 UTC (permalink / raw)
  To: Patrik Jonsson; +Cc: linux-raid

Patrik Jonsson wrote:
> Hi all,
> this may not be the best list for this question, but I figure that the
> number of disks connected to users here should be pretty big...
> 
> I upgraded from 2.6.17-rc4 to 2.6.18.3 about a week ago, and I've since
> had 3 drives kicked out of my 10-drive RAID5 array. Previously, I had no
> kicks over almost a year. The kernel message is:
> 
> ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata7.00: (BMDMA stat 0x20)
> ata7.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)
> ata7: EH complete

> Any ideas or thoughts would be appreciated,
SMART?

Read the manpage and then try running:
smartctl -d ata -S on /dev/...
and
smartctl -d ata -s on /dev/...

Then look at your smartd timing and see if it's related; possibly just do a
manual smartd poll.
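One way to check whether the kicks are related to smartd timing is to compare the time of each error with the time of the most recent smartd poll. A minimal Python sketch, assuming you have already extracted both sets of timestamps (in seconds) from syslog; the function name `errors_near_polls`, the sample times, and the 60-second window are all hypothetical. The 1800-second spacing in the example mirrors smartd's default 30-minute check interval.

```python
# Sketch: flag errors that occur shortly after a smartd poll.
# Inputs are hypothetical timestamps in seconds (e.g. parsed from syslog).
from bisect import bisect_right


def errors_near_polls(polls, errors, window=60.0):
    """Return the error times that fall within `window` seconds after a poll."""
    polls = sorted(polls)
    hits = []
    for t in errors:
        i = bisect_right(polls, t)  # number of polls at or before t
        if i and t - polls[i - 1] <= window:
            hits.append(t)
    return hits


if __name__ == "__main__":
    polls = [0, 1800, 3600, 5400]   # hypothetical: polls every 30 minutes
    errors = [1805, 2500, 3630]     # hypothetical error times
    near = errors_near_polls(polls, errors)
    print(f"{len(near)}/{len(errors)} errors within 60 s of a poll: {near}")
```

If most errors land inside the window while polls are far more frequent than errors, that points toward a smartd/libata interaction. Errors spread evenly between polls would suggest looking elsewhere.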

I've had smart/libata problems (well, no, glitches) for about 2 years now, but, as
the irq handler occasionally says, "nobody cared" ;)

It may well not be your problem but...

David
