Linux RAID subsystem development
* Problems with RAID 6 across 15 disks
From: Max Eaves @ 2010-04-01 13:23 UTC
  To: linux-raid

Hi there,

I hope this gets through; this is my first posting to this list.

I am running CentOS 5.4 with a 2.6.18-164.15.1.el5 (x86_64) kernel on a 
rather "homebrew" Backblaze-style system (http://blog.backblaze.com/).

The mdadm version is: mdadm - v2.6.9 - 10th March 2009

It uses a number of Silicon Image 3124 (SiI 3124) cards and a number of 
port multiplier cards (SiI3132) to attach a large number of disks.

I have 45 disks arranged into 3 mdadm RAID sets of 15 disks.  Each set 
of 15 disks is configured as RAID 6.

The problem I have is this:

At random times, the RAID decides that it needs to resynchronise 
/dev/md10, /dev/md11 and /dev/md12.  There is no error or log event in 
/var/log/messages; the first thing I notice is that the performance 
of the RAID arrays drops, and "cat /proc/mdstat" shows all three 
arrays resynchronising.

ARRAY /dev/md0 level=raid1 num-devices=2 
uuid=7d7b19e6:56cc90cc:3cb166bd:b8086f29 (system boot) (not a problem)
ARRAY /dev/md1 level=raid1 num-devices=2 
uuid=3782d93d:a491ffd4:f32c1014:94a2b3f7 (system LVM) (not a problem)
ARRAY /dev/md10 level=raid6 num-devices=15 
uuid=5ca86e2a-3b86-4c0b-9a7a-59143bdcd0f1 (partition 1) (problem)
ARRAY /dev/md11 level=raid6 num-devices=15 
uuid=61188c90-4825-44c5-8fac-9bc82a5799fe (partition 2) (problem)
ARRAY /dev/md12 level=raid6 num-devices=15 
uuid=fa939816-1d0f-4eaa-98dd-c131449c3921 (partition 3) (problem)
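
The check of /proc/mdstat described above could be scripted so that the start of a resync is noticed and timestamped rather than spotted by chance. The following is a minimal sketch, assuming the usual mdstat layout in which each array starts a line with its name and an active sync shows a progress line such as "resync = 5.2%". The function name find_syncing and the SAMPLE text are made up for illustration; SAMPLE is not output from the system in this report.

```python
import re

# An array header line looks like "md10 : active raid6 sdb1[0] ..."
ARRAY_RE = re.compile(r"^(md\w+)\s*:")
# A progress line looks like "[=>...]  resync =  5.2% (52/1000) finish=..."
PROGRESS_RE = re.compile(r"(resync|recovery|check|reshape)\s*=\s*([\d.]+)%")


def find_syncing(mdstat_text):
    """Return {array_name: (action, percent_done)} for arrays with a progress line.

    Entries such as "resync=DELAYED" carry no percentage and are not matched.
    """
    active = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        progress = PROGRESS_RE.search(line)
        if progress and current:
            active[current] = (progress.group(1), float(progress.group(2)))
    return active


# Illustrative text only (hypothetical devices and sizes).
SAMPLE = """\
Personalities : [raid1] [raid6]
md10 : active raid6 sdb1[0] sdc1[1] sdd1[2] sde1[3]
      2000 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  resync =  5.2% (52/1000) finish=10.0min speed=100K/sec
md0 : active raid1 sda1[0] sdp1[1]
      104320 blocks [2/2] [UU]
unused devices: <none>
"""
```

On a live system the same function could be fed the real file, for example find_syncing(open("/proc/mdstat").read()), from a cron job that logs the time whenever the result is non-empty.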

These re-synchronisation events take about a week to complete (each 
array is 18TB).
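
To put "about a week" in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch under several assumptions that are not stated in the report: that 18TB is the usable capacity in decimal terabytes, that all 15 members are the same size, that "about a week" means exactly 7 days, and that a resync passes over each member device once so that its duration depends on the per-device size. The function name is hypothetical.

```python
def per_device_rate_mb_s(usable_tb, n_disks, n_parity, days):
    """Average per-member sync rate in MB/s (decimal) implied by a sync duration."""
    data_disks = n_disks - n_parity          # RAID 6 spends two disks' worth on parity
    device_bytes = usable_tb * 1e12 / data_disks
    seconds = days * 24 * 60 * 60
    return device_bytes / seconds / 1e6


# Assumed inputs: 18 TB usable, 15 disks, 2 parity, 7 days.
rate = per_device_rate_mb_s(usable_tb=18, n_disks=15, n_parity=2, days=7)
print(f"{rate:.2f} MB/s per member device")  # roughly 2.3 MB/s
```

Under those assumptions the implied rate is only a few MB/s per disk, which is far below what a single SATA drive can normally sustain and may be a useful number to compare against the speed that /proc/mdstat reports during a resync.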

I know that the performance of this system is not great, but I wonder if 
this resynchronisation is occurring because of some I/O time-out.
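
One way to narrow down what kind of activity is running is the md sysfs attribute sync_action, which reports values such as idle, resync, recover, check or repair for each array. The sketch below reads it for the three arrays named in this report. It assumes the attribute lives at /sys/block/<array>/md/sync_action on this kernel; the function name and the sys_block parameter are hypothetical and only there to make the example testable.

```python
from pathlib import Path


def read_sync_actions(arrays, sys_block="/sys/block"):
    """Return {array_name: sync_action string, or None if it cannot be read}."""
    actions = {}
    for name in arrays:
        path = Path(sys_block) / name / "md" / "sync_action"
        try:
            actions[name] = path.read_text().strip()
        except OSError:
            # Array missing, or attribute not available on this kernel.
            actions[name] = None
    return actions


if __name__ == "__main__":
    for name, action in read_sync_actions(["md10", "md11", "md12"]).items():
        print(name, action)
```

If the value were "check" rather than "resync", that would point towards a requested consistency check instead of a recovery from an I/O problem, though this is only a suggestion for what to look at and not a diagnosis.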

Oddly enough, a restart of the server fixes the problem for a couple of 
days, and then the problem occurs again (hmm, not good).

I'm happy to post logs etc....just let me know what you need.

Thanks

Max
