Re: Problems with RAID 6 across 15 disks

Linux RAID subsystem development

From: Neil Brown <neilb@suse.de>
To: max@maxeaves.co.uk
Cc: linux-raid@vger.kernel.org, Doug Ledford <dledford@redhat.com>
Subject: Re: Problems with RAID 6 across 15 disks
Date: Fri, 2 Apr 2010 07:43:25 +1100
Message-ID: <20100402074325.3ce34e8f@notabene.brown>
In-Reply-To: <4BB4A89F.7030707@maxeaves.co.uk>

On Thu, 01 Apr 2010 15:07:27 +0100
Max Eaves <max@maxeaves.co.uk> wrote:

> Doug,
> 
> Thank you very much for that; a great relief off my shoulders.
> 
> You are right - there is a config file located in 
> /etc/sysconfig/raid-check.  I've changed ENABLED to no.

However, there is real value in doing that check, at least occasionally.  It
catches latent read errors.

You might want to run it only every couple of months, and you might want to
wind down one or both of the /proc/sys/dev/raid/speed_limit_* numbers so
there is minimal impact on your system.
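
As a rough illustration of the speed_limit_* tuning mentioned above, here is a
minimal Python sketch that reads and lowers the two values.  The function
names, the sysctl_dir parameter (there only so the code can be pointed at a
scratch directory) and the figure of 5000 in the usage note are assumptions
made for the example, not recommendations.  The limits are per-device rates
in KiB/s as far as I know, and writing to the real files needs root.

```python
from pathlib import Path

# Standard location of the md speed limits; override sysctl_dir to experiment safely.
RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")


def read_speed_limits(sysctl_dir=RAID_SYSCTL_DIR):
    """Return the current (speed_limit_min, speed_limit_max) as integers."""
    d = Path(sysctl_dir)
    return (
        int((d / "speed_limit_min").read_text()),
        int((d / "speed_limit_max").read_text()),
    )


def set_speed_limits(min_kbps=None, max_kbps=None, sysctl_dir=RAID_SYSCTL_DIR):
    """Write new limits; a value of None leaves that limit untouched.

    Returns the limits as read back after the write.
    """
    d = Path(sysctl_dir)
    cur_min, cur_max = read_speed_limits(d)
    new_min = cur_min if min_kbps is None else int(min_kbps)
    new_max = cur_max if max_kbps is None else int(max_kbps)
    if new_min > new_max:
        raise ValueError("speed_limit_min would exceed speed_limit_max")
    if min_kbps is not None:
        (d / "speed_limit_min").write_text(f"{new_min}\n")
    if max_kbps is not None:
        (d / "speed_limit_max").write_text(f"{new_max}\n")
    return read_speed_limits(d)
```

For example, set_speed_limits(max_kbps=5000) would cap a check at an
illustrative 5000 KiB/s per device while leaving speed_limit_min alone.  The
setting does not survive a reboot unless it is also put somewhere like
/etc/sysctl.conf.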

But not scrubbing at all is not advisable.
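
To make "every couple of months" concrete, here is a small Python sketch,
under stated assumptions, of a script that could be run from cron and only
starts a check when enough time has passed.  It uses the md sysfs file
/sys/block/<array>/md/sync_action, which accepts the word "check".  The
helper names, the 60-day default and the sysfs_root parameter are
hypothetical choices for the example, and how the date of the last check is
stored is left out.

```python
from datetime import date
from pathlib import Path


def check_is_due(last_check, today, interval_days=60):
    """True if no check has been recorded or the last one is old enough."""
    return last_check is None or (today - last_check).days >= interval_days


def start_check(array, sysfs_root="/sys/block"):
    """Ask md to start a consistency check on e.g. 'md10'.

    Does nothing and returns False if the array is not idle (already
    checking, resyncing or recovering).  Needs root on a real system.
    """
    action = Path(sysfs_root) / array / "md" / "sync_action"
    if action.read_text().strip() != "idle":
        return False
    action.write_text("check\n")
    return True
```

A caller would do something like: if check_is_due(last, date.today()), call
start_check("md10") and record today's date.  Starting the arrays one at a
time rather than all together may also reduce the impact on a busy system.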

NeilBrown


> 
> Amazing - I've learnt something today.
> 
> Thanks once again.
> 
> Max
> 
> On 01/04/10 14:49, Doug Ledford wrote:
> > On 04/01/2010 09:23 AM, Max Eaves wrote:
> >    
> >> Hi there,
> >>
> >> I hope this gets through....my first posting on this dist.list.
> >>
> >> I am running CentOS 5.4 with a 2.6.18-164.15.1.el5 (x86_64) kernel
> >> on a rather "homebrew" backblaze system
> >> (http://blog.backblaze.com/).
> >>
> >> The mdadm version is: mdadm - v2.6.9 - 10th March 2009
> >>
> >> It uses a number of Silicon Image 3124 (sIL 3124) cards and a number of
> >> multiplier port cards (sIL3132) to read a large number of disks.
> >>
> >> I have 45 disks arranged into 3 mdadm raid sets of 15 disks.  These 15
> >> disks are raided using RAID6.
> >>
> >> The problem I have is this:
> >>
> >> At random times, the RAID decides that it needs to resynchronise
> >> /dev/md10, /dev/md11 and /dev/md12.  There is no error or log event in
> >> /var/log/messages, but the first thing I notice is that the performance
> >> of the RAID array drops, and checking "cat /proc/mdstat" shows all
> >> three RAID sets resynchronising themselves.
> >>
> >> ARRAY /dev/md0 level=raid1 num-devices=2
> >> uuid=7d7b19e6:56cc90cc:3cb166bd:b8086f29 (system boot) (not a problem)
> >> ARRAY /dev/md1 level=raid1 num-devices=2
> >> uuid=3782d93d:a491ffd4:f32c1014:94a2b3f7 (system LVM) (not a problem)
> >> ARRAY /dev/md10 level=raid6 num-devices=15
> >> uuid=5ca86e2a-3b86-4c0b-9a7a-59143bdcd0f1 (partition 1) (problem)
> >> ARRAY /dev/md11 level=raid6 num-devices=15
> >> uuid=61188c90-4825-44c5-8fac-9bc82a5799fe (partition 2) (problem)
> >> ARRAY /dev/md12 level=raid6 num-devices=15
> >> uuid=fa939816-1d0f-4eaa-98dd-c131449c3921 (partition 3) (problem)
> >>
> >> These re-synchronisation events take about a week to complete (the RAID
> >> is 18TB a pop)
> >>
> >> I know that the performance of this system is not great, but I wonder if
> >> this resynchronisation is occurring because of some I/O time-out.
> >>
> >> Oddly enough, a restart of the server fixes the problem for a couple of
> >> days, and then the problem occurs again (hmm - not good).
> >>
> >> I'm happy to post logs etc....just let me know what you need.
> >>      
> > Disable /etc/cron.weekly/99-raid-check.  They aren't resynchronizing,
> > they are actually just checking themselves for consistency, but because
> > the 2.6.18 kernel didn't have a different word for it in the output of
> > /proc/mdstat it just looks that way.  I can't remember if the version of
> > mdadm in centos 5.4 has the /etc/sysconfig/raid-check config file, but
> > if it does, it's easy to disable the weekly check there.
> >
> >
> >    
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
