* Problems with RAID 6 across 15 disks
From: Max Eaves @ 2010-04-01 13:23 UTC
To: linux-raid

Hi there,

I hope this gets through....my first posting on this dist.list.

I am running CentOS 5.4 with a 2.6.18-164.15.1.el5 (x86_64) kernel on a rather "homebrew" backblaze system (http://blog.backblaze.com/).

The mdadm version is: mdadm - v2.6.9 - 10th March 2009

It uses a number of Silicon Image 3124 (SiI 3124) cards and a number of multiplier port cards (SiI3132) to read a large number of disks.

I have 45 disks arranged into 3 mdadm raid sets of 15 disks. Each set of 15 disks is raided using RAID6.

The problem I have is this:

At random times, the RAID decides that it needs to resynchronise /dev/md10, /dev/md11 and /dev/md12. There is no error or log event in /var/log/messages, but the first thing I notice is that the performance of the RAID array drops, and checking out "cat /proc/mdstat" shows all three RAID arrays resynchronising themselves.

ARRAY /dev/md0 level=raid1 num-devices=2 uuid=7d7b19e6:56cc90cc:3cb166bd:b8086f29 (system boot) (not a problem)
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=3782d93d:a491ffd4:f32c1014:94a2b3f7 (system LVM) (not a problem)
ARRAY /dev/md10 level=raid6 num-devices=15 uuid=5ca86e2a-3b86-4c0b-9a7a-59143bdcd0f1 (partition 1) (problem)
ARRAY /dev/md11 level=raid6 num-devices=15 uuid=61188c90-4825-44c5-8fac-9bc82a5799fe (partition 2) (problem)
ARRAY /dev/md12 level=raid6 num-devices=15 uuid=fa939816-1d0f-4eaa-98dd-c131449c3921 (partition 3) (problem)

These re-synchronisation events take about a week to complete (the RAID is 18TB a pop).

I know that the performance of this system is not great, but I wonder if this resynchronisation is occurring because of some I/O time-out.

Oddly enough, a restart of the server fixes the problem for a couple of days, and then the problem occurs again (humm - not good).

I'm happy to post logs etc....just let me know what you need.

Thanks
Max
* Re: Problems with RAID 6 across 15 disks
From: Doug Ledford @ 2010-04-01 13:49 UTC
To: max; +Cc: linux-raid

On 04/01/2010 09:23 AM, Max Eaves wrote:
> At random times, the RAID decides that it needs to resynchronise
> /dev/md10 /dev/md11 and /dev/md12.
> [...]
> I'm happy to post logs etc....just let me know what you need.

Disable /etc/cron.weekly/99-raid-check. They aren't resynchronizing, they are actually just checking themselves for consistency, but because the 2.6.18 kernel didn't have a different word for it in the output of /proc/mdstat it just looks that way. I can't remember if the version of mdadm in CentOS 5.4 has the /etc/sysconfig/raid-check config file, but if it does, it's easy to disable the weekly check there.

--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
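For anyone scripting around the point above, here is a minimal Python sketch that pulls the operation name and progress out of /proc/mdstat text. The helper name `arrays_in_progress` is hypothetical, and the sample text is made up for illustration (it is not output from Max's system). As noted in the reply, an older kernel may print "resync" even for a requested check, so the word alone is not conclusive. If the running kernel exposes /sys/block/mdX/md/sync_action, reading that file may be a way to tell the two apart, but that is an assumption and not something confirmed in this thread.

```python
import re

# Progress line that md prints under a busy array in /proc/mdstat.
PROGRESS = re.compile(r"(resync|recovery|check|reshape)\s*=\s*([\d.]+)%")
# Array header line, e.g. "md10 : active raid6 ..."
ARRAY = re.compile(r"^(md\d+)\s*:")


def arrays_in_progress(mdstat_text):
    """Return {array_name: (operation, percent_done)} for arrays that are busy."""
    busy = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        progress = PROGRESS.search(line)
        if progress and current:
            busy[current] = (progress.group(1), float(progress.group(2)))
    return busy


# Made-up sample in the general shape of /proc/mdstat (illustration only).
SAMPLE = """\
Personalities : [raid1] [raid6]
md10 : active raid6 sdb1[0] sdc1[1] sdd1[2] sde1[3]
      2930271744 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  resync =  7.5% (109885440/1465135872) finish=420.0min speed=53760K/sec

md0 : active raid1 sdf1[0] sdg1[1]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

print(arrays_in_progress(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mdstat instead of `SAMPLE`.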
* Re: Problems with RAID 6 across 15 disks
From: Max Eaves @ 2010-04-01 14:07 UTC
To: linux-raid; +Cc: Doug Ledford

Doug,

Thank you very much for that; a great relief off my shoulders.

You are right - there is a config file located in /etc/sysconfig/raid-check. I've changed ENABLED to no.

Amazing - I've learnt something today.

Thanks once again.

Max

On 01/04/10 14:49, Doug Ledford wrote:
> Disable /etc/cron.weekly/99-raid-check. They aren't resynchronizing,
> they are actually just checking themselves for consistency
> [...]
* Re: Problems with RAID 6 across 15 disks
From: Neil Brown @ 2010-04-01 20:43 UTC
To: max; +Cc: linux-raid, Doug Ledford

On Thu, 01 Apr 2010 15:07:27 +0100 Max Eaves <max@maxeaves.co.uk> wrote:
> You are right - there is a config file located in
> /etc/sysconfig/raid-check. I've changed ENABLED to no.

However there is real value in doing that check, at least occasionally. It catches latent read errors.

You might want to run it only every couple of months, and you might want to wind down one or both of the /proc/sys/dev/raid/speed_limit_* numbers so there is minimal impact on your system.

But not scrubbing at all is not advisable.

NeilBrown
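The throttling suggestion above could be scripted roughly as follows. This is a sketch under assumptions, not a recommended setting: the helper names `read_limits` and `set_limit` are hypothetical, the value 20000 is purely illustrative, and the unit is taken to be KB/sec per device as the md documentation describes it. The demo at the bottom runs against a scratch directory so nothing on a live system is touched.

```python
import tempfile
from pathlib import Path

# Location of the md throttle knobs named in the message above.
RAID_SYSCTL = Path("/proc/sys/dev/raid")
KNOBS = ("speed_limit_min", "speed_limit_max")


def read_limits(root=RAID_SYSCTL):
    """Return the current limits as {knob: value}."""
    return {knob: int((Path(root) / knob).read_text()) for knob in KNOBS}


def set_limit(knob, value, root=RAID_SYSCTL):
    """Write one limit. Needs root privileges when pointed at the real /proc."""
    if knob not in KNOBS:
        raise ValueError(f"unknown knob: {knob}")
    if value <= 0:
        raise ValueError("limit must be a positive integer")
    (Path(root) / knob).write_text(f"{value}\n")


# Dry run against a scratch directory standing in for /proc/sys/dev/raid.
with tempfile.TemporaryDirectory() as scratch:
    set_limit("speed_limit_min", 1000, root=scratch)
    set_limit("speed_limit_max", 20000, root=scratch)  # illustrative value only
    demo_limits = read_limits(root=scratch)

print(demo_limits)
```

Calling `set_limit("speed_limit_max", ...)` without the `root` argument would write to the real knob; a lower maximum should make a check take longer in exchange for less impact on other I/O.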
* Re: Problems with RAID 6 across 15 disks
From: Piergiorgio Sartor @ 2010-04-01 22:46 UTC
To: Neil Brown; +Cc: max, linux-raid, Doug Ledford

Hi,

> However there is real value in doing that check, at least occasionally. It
> catches latent read errors.

but since it is not possible to correct those errors, there is no point in doing it... :-)

Sorry, I couldn't resist... ;-)

bye,

--
piergiorgio
* Re: Problems with RAID 6 across 15 disks
From: Jools Wills @ 2010-04-01 22:58 UTC
To: Piergiorgio Sartor; +Cc: Neil Brown, max, linux-raid, Doug Ledford

On Fri, 2010-04-02 at 00:46 +0200, Piergiorgio Sartor wrote:
> > However there is real value in doing that check, at least occasionally. It
> > catches latent read errors.
>
> but since it is not possible to correct those errors,
> there is no point in doing it... :-)

Well it can. It can try and rewrite the block based on the data from the other disks, and if the drive needs to, it can remap the bad block.

Best Regards
Jools

Jools Wills -- IT Consultant
Oxford Inspire - http://www.oxfordinspire.co.uk
* Re: Problems with RAID 6 across 15 disks
From: Piergiorgio Sartor @ 2010-04-01 23:04 UTC
To: Jools Wills; +Cc: Piergiorgio Sartor, Neil Brown, max, linux-raid, Doug Ledford

Hi,

> > but since it is not possible to correct those errors,
> > there is no point in doing it... :-)
>
> Well it can. It can try and rewrite the block based on the data from the
> other disks, and if the drive needs to, it can remap the bad block.

you might be unaware of the repeated, never-ending discussions about this topic.

It is *possible* to do it, but, as of today, it cannot do it.

I mean, there is no functionality, in the RAID-6, to detect and correct those errors using the available double parity.

Consider that the RAID check returns only how many mismatches are present, not where they are, i.e. on which disks.

bye,

--
piergiorgio
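The "only a count" point can be seen by reading the counter md keeps per array. The sketch below assumes the running kernel exposes /sys/block/mdX/md/mismatch_cnt, which is not confirmed for the 2.6.18 kernel in this thread. The helper names are hypothetical, and the demo builds a fake directory tree with made-up numbers rather than reading a real array.

```python
import tempfile
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def mismatch_count(array, root=SYSFS_BLOCK):
    """Read md's mismatch counter for one array, e.g. array='md10'."""
    return int((Path(root) / array / "md" / "mismatch_cnt").read_text())


def report(arrays, root=SYSFS_BLOCK):
    """Return {array: mismatch count}. Note there is no per-disk breakdown."""
    return {array: mismatch_count(array, root) for array in arrays}


# Dry run: a fake sysfs tree with made-up counter values.
with tempfile.TemporaryDirectory() as scratch:
    for name, count in (("md10", 0), ("md11", 128)):
        md_dir = Path(scratch) / name / "md"
        md_dir.mkdir(parents=True)
        (md_dir / "mismatch_cnt").write_text(f"{count}\n")
    demo_report = report(["md10", "md11"], root=scratch)

print(demo_report)
```

A single integer per array is all this interface gives back, which matches the complaint that it does not say which disk holds the bad data.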
* Re: Problems with RAID 6 across 15 disks
From: Michael Evans @ 2010-04-01 23:46 UTC
To: Piergiorgio Sartor; +Cc: Jools Wills, Neil Brown, max, linux-raid, Doug Ledford

On Thu, Apr 1, 2010 at 4:04 PM, Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:
> I mean, there is no functionality, in the RAID-6, to
> detect and correct those errors using the available
> double parity.
>
> Consider that the RAID check returns only how many
> mismatch are present, not where they are, i.e. on
> which disks.

You are correct in that /silent/ errors cannot be detected. However, drives typically do not verify writes, and if for whatever reason a sector that was written cannot be read back, the drive will /eventually/ return an error. At this point a re-write is issued based on the data recovered from the other drives. Only if that fails is the drive kicked from the array.
* Re: Problems with RAID 6 across 15 disks
From: Jools Wills @ 2010-04-02 1:40 UTC
To: Piergiorgio Sartor; +Cc: Neil Brown, max, linux-raid, Doug Ledford

On Fri, 2010-04-02 at 01:04 +0200, Piergiorgio Sartor wrote:
> you might be unaware of the repeated neverending
> discussions about this topic.

yup :)

> It is *possible* to do it, but, as of today, it
> cannot do it.
> I mean, there is no functionality, in the RAID-6, to
> detect and correct those errors using the available
> double parity.

Is this the same for raid 5, or specifically a raid 6 issue on linux?

I had assumed that with my raid5 array, if the raid check finds an error it will attempt to rewrite back to the disk, and then read again, and carry on if everything is ok.

Best Regards
Jools

Jools Wills -- IT Consultant
Oxford Inspire - http://www.oxfordinspire.co.uk
* Re: Problems with RAID 6 across 15 disks
From: Neil Brown @ 2010-04-02 5:03 UTC
To: jools; +Cc: Piergiorgio Sartor, max, linux-raid, Doug Ledford

On Fri, 02 Apr 2010 02:40:13 +0100 Jools Wills <jools@oxfordinspire.co.uk> wrote:
> Is this the same for raid 5 or specifically a raid 6 issue on linux ?
>
> I had assumed that with my raid5 array, if the raid check finds an error
> it will attempt to rewrite back to the disk, and then read again, and
> carry on if everything is ok.

Piergiorgio is confusing you. Maybe he is confused himself.

The most likely cause of error on modern drives is a media problem. Maybe the data wasn't stored well, or maybe the charge in the media decayed. When you have trillions of bytes on a drive, the chance of something going wrong becomes quite significant.

When this happens the drive will notice while reading and will report an error (after trying a few times). It detects an error because an error-detecting code (CRC?) reported an error.

When this happens on a non-degraded array (RAID 1,10,4,5,6) md will recover the data from elsewhere and write out good data, which will normally fix the problem.

Of course md cannot do this if it never reads the data, and on a terabyte drive there is probably lots of data that won't be read often. So a regular check pass to 'scrub' the device is a good idea, as it will find these sleeping bad blocks by reading every single block.

It doesn't have to be weekly, or even monthly. But regular is important. You need to find a frequency and speed that matches your storage size and throughput requirements, and how cautious you feel.

The situation which Piergiorgio is referring to is quite different. It is conceivably possible for wrong data to be written and a matching CRC to be written with it. In this case the drive doesn't notice so md doesn't notice.

If you know the source of the error, or catch it before any write happens on the same stripe, then it is possible on RAID6 or RAID1 with >2 drives to work out with high probability which block has wrong data, and to fix it.

This sort of problem is much more rare, and is very likely to be accompanied by other errors that could well lead to general system failure. Bad memory, bit flips on a bus that is not ECC protected, things like that.

As I said, it only makes sense to attempt to 'correct' this if you know that the stripe has not been written to since the error occurred. You can only really know this if you check for errors before every write. We don't do that and it would be a significant performance impact (I expect) to do so.

It does not make sense to try to fix these extremely rare possible errors on a regular scan. It does make sense to report them with more detail than we currently do. Patches always welcome.

http://neil.brown.name/blog/20100211050355

NeilBrown
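The two cases separated in the message above can be illustrated with a toy stripe. This is a simplification and not md's actual code: it uses single XOR parity (RAID5-style arithmetic), the block contents are made up, and RAID6's second syndrome is not modelled at all. Case 1 is a read error the drive reports, so the location is known and the block can be rebuilt. Case 2 is silent corruption, where a parity check can show that the stripe is inconsistent but single parity alone cannot say which block is wrong.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR across equal-length blocks (single-parity arithmetic)."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# One toy stripe: three data blocks plus their XOR parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: the drive holding block 1 reports a read error.
# The location is known, so the block is rebuilt from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Case 2: block 1 was silently written wrong and no drive reports an error.
# Recomputing parity shows the stripe is inconsistent, but not where.
silently_wrong = [data[0], b"BXBB", data[2]]
stripe_consistent = xor_blocks(silently_wrong) == parity
print(stripe_consistent)
```

In case 2 the same mismatch would appear whether a data block or the parity block were the wrong one, which is why a scrub under this simplified model can only count the inconsistency.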
* Re: Problems with RAID 6 across 15 disks
From: Piergiorgio Sartor @ 2010-04-02 8:22 UTC
To: Neil Brown; +Cc: jools, Piergiorgio Sartor, max, linux-raid, Doug Ledford

Hi,

as usual very precise... :-)

It's only that I like the topic, maybe someday someone will provide some patches, if there is a regular "scrubbing"... ;-)

bye,

--
piergiorgio
* Re: Problems with RAID 6 across 15 disks
From: Max Eaves @ 2010-04-02 10:21 UTC
To: linux-raid

Dear all,

Thank you all very much for your replies over the past 24 hours on this. It did make me chuckle how I seem to have wandered into a hornets' nest and given it a jolly good stir.

So - I have decided that what I will do is make the checking script a bi-monthly process (it runs every other month), in a new folder on my server called /etc/cron.bimonthly, and reference it in /etc/crontab.

I feel what should really happen is a more sensible checking of the raid arrays, instead of scanning every single RAID array at the same time (not good for my I/O). I have a slow PCI-X 133MHz bus here that my RAID cards are connected to, so I feel that this is the way forward. I'll see what I can do in that direction.

Thanks
Max
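One possible shape for "not every array at the same time" is sketched below: start a check on one array, wait for it to go idle, then move to the next. It assumes the kernel accepts the word "check" written to /sys/block/mdX/md/sync_action and reports "idle" when finished; that interface is not confirmed for this system anywhere in the thread. The function names are hypothetical, and the demo uses a fake directory tree and a fake sleep so it finishes immediately.

```python
import tempfile
import time
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def sync_action_file(array, root=SYSFS_BLOCK):
    """Path of the md control file for one array, e.g. array='md10'."""
    return Path(root) / array / "md" / "sync_action"


def check_one_at_a_time(arrays, root=SYSFS_BLOCK, poll_seconds=60, sleep=time.sleep):
    """Start a check on each array in turn, waiting for idle before the next."""
    started = []
    for array in arrays:
        action = sync_action_file(array, root)
        action.write_text("check\n")
        started.append(array)
        while action.read_text().strip() != "idle":
            sleep(poll_seconds)
    return started


# Dry run: fake sysfs tree; fake_sleep stands in for the kernel finishing a check.
with tempfile.TemporaryDirectory() as scratch:
    names = ["md10", "md11", "md12"]
    for name in names:
        sync_action_file(name, scratch).parent.mkdir(parents=True)
        sync_action_file(name, scratch).write_text("idle\n")

    busy_seen = []

    def fake_sleep(_seconds):
        # Record how many arrays are checking right now, then mark all idle.
        busy_seen.append(sum(
            1 for n in names
            if sync_action_file(n, scratch).read_text().strip() == "check"
        ))
        for n in names:
            sync_action_file(n, scratch).write_text("idle\n")

    demo_order = check_one_at_a_time(names, root=scratch, sleep=fake_sleep)

print(demo_order, busy_seen)
```

Run from a cron entry such as the bi-monthly one described above, a script along these lines would keep only one array scrubbing at any moment, at the cost of a longer total run.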
* responsiveness during raid check (Was: Problems with RAID 6 across 15 disks)
From: Luca Berra @ 2010-04-02 5:55 UTC
To: linux-raid

On Fri, Apr 02, 2010 at 07:43:25AM +1100, Neil Brown wrote:
>However there is real value in doing that check, at least occasionally. It
>catches latent read errors.
>
>You might want to run it only every couple of months, and you might want to
>wind down one or both of the /proc/sys/dev/raid/speed_limit_* numbers so
>there is minimal impact on your system.

Sorry if I am hijacking, but I got a report from one user that the scheduled scrubbing is severely impacting responsiveness. Lowering the speed_limits seems to help a bit, but he reports it is still sluggish. I always believed the check should use idle time, and not impact performance that much. Could it be scheduler related?

regards,
L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.