* raid5 software vs hardware: parity calculations?
@ 2007-01-11 22:44 James Ralston
From: James Ralston @ 2007-01-11 22:44 UTC (permalink / raw)
To: linux-raid

I'm having a discussion with a coworker concerning the cost of md's
raid5 implementation versus hardware raid5 implementations.
Specifically, he states:
> The performance [of raid5 in hardware] is so much better with the
> write-back caching on the card and the offload of the parity, it
> seems to me that the minor increase in work of having to upgrade the
> firmware if there's a buggy one is a highly acceptable trade-off to
> the increased performance. The md driver still commits you to
> longer run queues since IO calls to disk, parity calculator and the
> subsequent kflushd operations are non-interruptible in the CPU. A
> RAID card with write-back cache releases the IO operation virtually
> instantaneously.
It would seem that his comments have merit, as there appears to be
work underway to move stripe operations outside of the spinlock:
http://lwn.net/Articles/184102/
What I'm curious about is this: for real-world situations, how much
does this matter? In other words, how hard do you have to push md
raid5 before doing dedicated hardware raid5 becomes a real win?
James

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-12 17:39 ` dean gaudet
From: dean gaudet @ 2007-01-12 17:39 UTC (permalink / raw)
To: James Ralston; +Cc: linux-raid

On Thu, 11 Jan 2007, James Ralston wrote:

> I'm having a discussion with a coworker concerning the cost of md's
> raid5 implementation versus hardware raid5 implementations.
>
> Specifically, he states:
>
> > The performance [of raid5 in hardware] is so much better with the
> > write-back caching on the card and the offload of the parity, it
> > seems to me that the minor increase in work of having to upgrade the
> > firmware if there's a buggy one is a highly acceptable trade-off to
> > the increased performance.  The md driver still commits you to
> > longer run queues since IO calls to disk, parity calculator and the
> > subsequent kflushd operations are non-interruptible in the CPU.  A
> > RAID card with write-back cache releases the IO operation virtually
> > instantaneously.
>
> It would seem that his comments have merit, as there appears to be
> work underway to move stripe operations outside of the spinlock:
>
> http://lwn.net/Articles/184102/
>
> What I'm curious about is this: for real-world situations, how much
> does this matter?  In other words, how hard do you have to push md
> raid5 before doing dedicated hardware raid5 becomes a real win?

hardware with battery backed write cache is going to beat the software
at small write traffic latency essentially all the time, but it's got
nothing to do with the parity computation.

-dean

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-12 20:34 ` James Ralston
From: James Ralston @ 2007-01-12 20:34 UTC (permalink / raw)
To: linux-raid

On 2007-01-12 at 09:39-08 dean gaudet <dean@arctic.org> wrote:

> On Thu, 11 Jan 2007, James Ralston wrote:
>
> > I'm having a discussion with a coworker concerning the cost of
> > md's raid5 implementation versus hardware raid5 implementations.
> > [...]
> > What I'm curious about is this: for real-world situations, how
> > much does this matter?  In other words, how hard do you have to
> > push md raid5 before doing dedicated hardware raid5 becomes a
> > real win?
>
> hardware with battery backed write cache is going to beat the
> software at small write traffic latency essentially all the time but
> it's got nothing to do with the parity computation.

I'm not convinced that's true.  What my coworker is arguing is that md
raid5 code spinlocks while it is performing this sequence of
operations:

1. executing the write
2. reading the blocks necessary for recalculating the parity
3. recalculating the parity
4. updating the parity block

My [admittedly cursory] read of the code, coupled with the link above,
leads me to believe that my coworker is correct, which is why I was
trolling for [informed] opinions about how much of a performance hit
the spinlock causes.

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-13  9:20 ` Dan Williams
From: Dan Williams @ 2007-01-13 9:20 UTC (permalink / raw)
To: James Ralston; +Cc: linux-raid

On 1/12/07, James Ralston <qralston+ml.linux-raid@andrew.cmu.edu> wrote:
> On 2007-01-12 at 09:39-08 dean gaudet <dean@arctic.org> wrote:
>
> > hardware with battery backed write cache is going to beat the
> > software at small write traffic latency essentially all the time
> > but it's got nothing to do with the parity computation.
>
> I'm not convinced that's true.

No, it's true.  md implements a write-through cache to ensure that
data reaches the disk.

> What my coworker is arguing is that md raid5 code spinlocks while it
> is performing this sequence of operations:
>
> 1. executing the write

not performed under the lock

> 2. reading the blocks necessary for recalculating the parity

not performed under the lock

> 3. recalculating the parity
> 4. updating the parity block
>
> My [admittedly cursory] read of the code, coupled with the link
> above, leads me to believe that my coworker is correct, which is why
> I was trolling for [informed] opinions about how much of a
> performance hit the spinlock causes.

The spinlock is not a source of performance loss; the reason for
moving parity calculations outside the lock is to maximize the benefit
of using asynchronous xor+copy engines.

The hardware vs software raid trade-offs are well documented here:
http://linux.yyz.us/why-software-raid.html

Regards,
Dan

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-13 17:32 ` Bill Davidsen
From: Bill Davidsen @ 2007-01-13 17:32 UTC (permalink / raw)
To: Dan Williams; +Cc: James Ralston, linux-raid

Dan Williams wrote:
> [...]
> The spinlock is not a source of performance loss; the reason for
> moving parity calculations outside the lock is to maximize the
> benefit of using asynchronous xor+copy engines.
>
> The hardware vs software raid trade-offs are well documented here:
> http://linux.yyz.us/why-software-raid.html

There have been several recent threads on the list regarding software
RAID-5 performance.  The reference might be updated to reflect the
poor write performance of RAID-5 until/unless significant tuning is
done.  Read that as tuning obscure parameters and throwing a lot of
memory into the stripe cache.  The reasons for hardware RAID should
include "performance of RAID-5 writes is usually much better than
software RAID-5 with default tuning."

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-13 23:23 ` Robin Bowes
From: Robin Bowes @ 2007-01-13 23:23 UTC (permalink / raw)
To: linux-raid

Bill Davidsen wrote:
>
> There have been several recent threads on the list regarding software
> RAID-5 performance.  The reference might be updated to reflect the
> poor write performance of RAID-5 until/unless significant tuning is
> done.  Read that as tuning obscure parameters and throwing a lot of
> memory into the stripe cache.  The reasons for hardware RAID should
> include "performance of RAID-5 writes is usually much better than
> software RAID-5 with default tuning."

Could you point me at a source of documentation describing how to
perform such tuning?

Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X
8-port SATA card configured as a single RAID6 array (~3TB available
space).

Thanks,

R.

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-14  3:16 ` dean gaudet
From: dean gaudet @ 2007-01-14 3:16 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid

On Sat, 13 Jan 2007, Robin Bowes wrote:

> Could you point me at a source of documentation describing how to
> perform such tuning?
>
> Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X
> 8-port SATA card configured as a single RAID6 array (~3TB available
> space).

linux sw raid6 small write performance is bad because it reads the
entire stripe, merges the small write, and writes back the changed
disks.  unlike raid5, where a small write can get away with a partial
stripe read (i.e. the smallest raid5 write will read the target disk,
read the parity, write the target, and write the updated parity)...
afaik this optimization hasn't been implemented in raid6 yet.
depending on your use model you might want to go with raid5+spare.
benchmark if you're not sure.

for raid5/6 i always recommend experimenting with moving your fs
journal to a raid1 device instead (on separate spindles -- such as
your root disks).

if this is for a database or fs requiring lots of small writes then
raid5/6 are generally a mistake... raid10 is the only way to get
performance.  (hw raid5/6 with nvram support can help a bit in this
area, but you just can't beat raid10 if you need lots of writes/s.)

beyond those config choices you'll want to become friendly with
/sys/block and all the myriad of subdirectories and options under
there.  in particular:

    /sys/block/*/queue/scheduler
    /sys/block/*/queue/read_ahead_kb
    /sys/block/*/queue/nr_requests
    /sys/block/mdX/md/stripe_cache_size

for * = any of the component disks or the mdX itself... some systems
have an /etc/sysfs.conf you can place these settings in to have them
take effect on reboot.  (sysfsutils package on debian/ubuntu)

-dean
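
To make the knobs listed above concrete, here is a minimal sketch of
setting them by hand; the device names (sda..sdh, md0) and the values
are illustrative assumptions, not recommendations, and need to be
benchmarked against the actual workload:

    # illustrative values only; benchmark before keeping any of them
    for d in /sys/block/sd[a-h]; do
        echo deadline > $d/queue/scheduler
        echo 512      > $d/queue/read_ahead_kb
        echo 256      > $d/queue/nr_requests
    done

    # raid5/6 stripe cache, counted in pages (4KB) per member device
    echo 4096 > /sys/block/md0/md/stripe_cache_size

To keep such settings across reboots with sysfsutils, the equivalent
line in /etc/sysfs.conf would be, for example,
"block/md0/md/stripe_cache_size = 4096".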

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-15 11:48 ` Michael Tokarev
From: Michael Tokarev @ 2007-01-15 11:48 UTC (permalink / raw)
To: dean gaudet; +Cc: Robin Bowes, linux-raid

dean gaudet wrote:
[]
> if this is for a database or fs requiring lots of small writes then
> raid5/6 are generally a mistake... raid10 is the only way to get
> performance.  (hw raid5/6 with nvram support can help a bit in this
> area, but you just can't beat raid10 if you need lots of writes/s.)

A small nitpick.  At least some databases never do "small"-sized I/O,
at least not against the datafiles.  For example, Oracle uses a fixed
I/O block size, specified at database (or tablespace) creation time --
by default it's 4Kb or 8Kb, but it may be 16Kb or 32Kb as well.

Now, if you make your raid array's stripe size match the block size of
the database, *and* ensure the files are aligned on disk properly, it
will just work, without needless reads to calculate parity blocks
during writes.

But the problem is that this is nearly impossible to do.

First, even if the db writes in 32Kb blocks, the stripe size has to be
32Kb, which is only achievable with a 3-disk raid5 using a 16Kb chunk
size, or a 5-disk raid5 using an 8Kb chunk size (this last variant is
quite bad, because an 8Kb chunk is too small).  In other words, only a
very limited set of configurations is more-or-less good.

And second, most filesystems used for databases don't care about
"correct" file placement.  For example, ext[23]fs, with a maximum
block size of 4Kb, will align files to 4Kb, not to the stripe size --
so a 32Kb database block can land with its first 4Kb on one stripe and
the remaining 28Kb on the next, which means both stripes need the full
read-modify-write cycle to update their parity blocks -- the very
thing we tried to avoid by choosing the sizes in the previous step.
Only xfs so far (of the filesystems I've checked) pays attention to
the stripe size and tries to align files to it.  (Yes, I know about
mke2fs's stride=xxx parameter, but it only affects metadata, not
data.)

That's why all of the above is a "small nitpick" -- i.e., in theory it
IS possible to use raid5 for a database workload in certain cases, but
due to all the gory details it's nearly impossible to get right.

/mjt
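
As a sketch of the alignment described above (the 3-disk raid5 with a
16Kb chunk is Michael's own example, giving 32Kb of data per stripe;
the device names are hypothetical), creation might look like:

    # 3-disk raid5, 16Kb chunk => 2 data disks x 16Kb = 32Kb per stripe
    mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=16 \
          /dev/sda1 /dev/sdb1 /dev/sdc1

    # xfs can be told the geometry so file data starts on stripe boundaries
    mkfs.xfs -d su=16k,sw=2 /dev/md0

    # ext2/3's stride option, by contrast, only spreads metadata:
    #   mke2fs -b 4096 -E stride=4 /dev/md0    (stride = chunk / block size)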

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-15 15:29 ` Bill Davidsen
From: Bill Davidsen @ 2007-01-15 15:29 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid

Robin Bowes wrote:
> Bill Davidsen wrote:
> > [...]
>
> Could you point me at a source of documentation describing how to
> perform such tuning?

No.  There has been a lot of discussion of this topic on this list,
and a trip through the archives of the last 60 days or so will let you
pull out a number of tuning tips which allow very good performance.
My concern was writing large blocks of data, 1MB per write, to RAID-5,
and didn't involve the overhead of small blocks at all; that leads
through other code and behavior.  I suppose while it's fresh in my
mind I should write a script to rerun the whole write test suite and
generate some graphs, lists of parameters, etc.

If you are writing a LOT of data, you may find that tuning the dirty_*
parameters results in better system response, perhaps at the cost of a
little total write throughput, although I didn't notice anything
significant when I tried them.

> Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X
> 8-port SATA card configured as a single RAID6 array (~3TB available
> space).

No hot spare(s)?

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-15 16:22 ` Robin Bowes
From: Robin Bowes @ 2007-01-15 16:22 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-raid

Bill Davidsen wrote:
> Robin Bowes wrote:
> > Could you point me at a source of documentation describing how to
> > perform such tuning?
>
> No.  There has been a lot of discussion of this topic on this list,
> and a trip through the archives of the last 60 days or so will let
> you pull out a number of tuning tips which allow very good
> performance.  My concern was writing large blocks of data, 1MB per
> write, to RAID-5, and didn't involve the overhead of small blocks at
> all; that leads through other code and behavior.

Actually Bill, I'm running RAID6 (my mistake for not mentioning it
explicitly before) - I found some material relating to RAID5 but
nothing on RAID6.

Are the concepts similar, or is RAID6 a different beast altogether?

> > Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X
> > 8-port SATA card configured as a single RAID6 array (~3TB
> > available space).
>
> No hot spare(s)?

I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
where a drive has failed in a RAID5+1 array and a second has failed
during the rebuild after the hot-spare had kicked in.

R.

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-15 17:37 ` Bill Davidsen
From: Bill Davidsen @ 2007-01-15 17:37 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid

Robin Bowes wrote:
> Actually Bill, I'm running RAID6 (my mistake for not mentioning it
> explicitly before) - I found some material relating to RAID5 but
> nothing on RAID6.
>
> Are the concepts similar, or is RAID6 a different beast altogether?

You mentioned that before, and I think the concepts covered in the
RAID-5 discussion apply to RAID-6 as well.  I don't have enough unused
drives to really test anything beyond RAID-5, so I have no particular
tuning information to share.  Testing on system drives introduces too
much jitter to trust the results.

> I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
> where a drive has failed in a RAID5+1 array and a second has failed
> during the rebuild after the hot-spare had kicked in.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-15 21:25 ` dean gaudet
From: dean gaudet @ 2007-01-15 21:25 UTC (permalink / raw)
To: Robin Bowes; +Cc: Bill Davidsen, linux-raid

On Mon, 15 Jan 2007, Robin Bowes wrote:

> I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
> where a drive has failed in a RAID5+1 array and a second has failed
> during the rebuild after the hot-spare had kicked in.

if the failures were read errors without losing the entire disk (the
typical case) then new kernels are much better -- on a read error md
will reconstruct the sectors from the other disks and attempt to write
them back.

you can also run monthly "checks"...

    echo check >/sys/block/mdX/md/sync_action

it'll read the entire array (parity included) and correct read errors
as they're discovered.

-dean
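
For a hypothetical /dev/md0, the check cycle above looks like this
sketch; the scrub runs in the background and its progress is visible
in /proc/mdstat:

    echo check > /sys/block/md0/md/sync_action   # start a background scrub
    cat /proc/mdstat                             # progress while it runs
    cat /sys/block/md0/md/sync_action            # "check" while running, "idle" when done
    echo idle  > /sys/block/md0/md/sync_action   # abort early if needed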

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-15 21:32 ` Gordon Henderson
From: Gordon Henderson @ 2007-01-15 21:32 UTC (permalink / raw)
To: linux-raid

On Mon, 15 Jan 2007, dean gaudet wrote:

> you can also run monthly "checks"...
>
>     echo check >/sys/block/mdX/md/sync_action
>
> it'll read the entire array (parity included) and correct read errors
> as they're discovered.

A-Ha ... I've not been keeping up with the list for a bit - what's the
minimum kernel version for this to work?

Cheers,

Gordon

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-16  0:35 ` berk walker
From: berk walker @ 2007-01-16 0:35 UTC (permalink / raw)
To: dean gaudet; +Cc: Robin Bowes, Bill Davidsen, linux-raid

dean gaudet wrote:
> if the failures were read errors without losing the entire disk (the
> typical case) then new kernels are much better -- on a read error md
> will reconstruct the sectors from the other disks and attempt to
> write them back.
>
> you can also run monthly "checks"...
>
>     echo check >/sys/block/mdX/md/sync_action
>
> it'll read the entire array (parity included) and correct read errors
> as they're discovered.
>
> -dean

Could I get a pointer as to how I can do this "check" in my FC5 [BLAG]
system?  I can find no appropriate "check", nor "md", available to me.
It would be a "good thing" if I were able to find potentially weak
spots, rewrite them to good, and know that it might be time for a new
drive.

All of my arrays have drives of approximately the same manufacture
date, so the possibility of more than one showing bad at the same time
cannot be ignored.

Thanks,
b-

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-16  0:48 ` dean gaudet
From: dean gaudet @ 2007-01-16 0:48 UTC (permalink / raw)
To: berk walker; +Cc: Robin Bowes, Bill Davidsen, linux-raid

On Mon, 15 Jan 2007, berk walker wrote:

> dean gaudet wrote:
> >     echo check >/sys/block/mdX/md/sync_action
> >
> > it'll read the entire array (parity included) and correct read
> > errors as they're discovered.
>
> Could I get a pointer as to how I can do this "check" in my FC5
> [BLAG] system?  I can find no appropriate "check", nor "md",
> available to me.

it should just be:

    echo check >/sys/block/mdX/md/sync_action

if you don't have a /sys/block/mdX/md/sync_action file then your
kernel is too old... or you don't have /sys mounted... (or you didn't
replace X with the raid number :)

iirc there were kernel versions which had the sync_action file but
didn't yet support the "check" action (i think possibly even as recent
as 2.6.17 had a small bug initiating one of the sync_actions, but i
forget which one).  if you can upgrade to 2.6.18.x it should work.

debian unstable (and i presume etch) will do this for all your arrays
automatically once a month.

-dean
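
The monthly automation mentioned above amounts to something like the
following sketch run from cron; Debian's actual checkarray script is
more careful, this is only the idea:

    #!/bin/sh
    # start a scrub on every md array that is currently idle
    for f in /sys/block/md*/md/sync_action; do
        [ -f "$f" ] && [ "$(cat "$f")" = "idle" ] && echo check > "$f"
    done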

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-16  3:41 ` Mr. James W. Laferriere
From: Mr. James W. Laferriere @ 2007-01-16 3:41 UTC (permalink / raw)
To: dean gaudet; +Cc: linux-raid maillist

Hello Dean,

On Mon, 15 Jan 2007, dean gaudet wrote:
...snip...
> it should just be:
>
>     echo check >/sys/block/mdX/md/sync_action
>
> if you don't have a /sys/block/mdX/md/sync_action file then your
> kernel is too old... or you don't have /sys mounted... (or you
> didn't replace X with the raid number :)
>
> debian unstable (and i presume etch) will do this for all your
> arrays automatically once a month.
>
> -dean

Being able to run a 'check' is a good thing (tm).  But without a
method to get status and data back from the check, it seems rather
bland.  Is there a tool or file to poll where that data and status can
be acquired?

Tia,
JimL

-- 
+-----------------------------------------------------------------+
| James W. Laferriere    | System Techniques   | Give me VMS      |
| Network Engineer       | 663 Beaumont Blvd   |  Give me Linux   |
| babydr@baby-dragons.com | Pacifica, CA. 94044 | only on AXP     |
+-----------------------------------------------------------------+

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-16  4:16 ` dean gaudet
From: dean gaudet @ 2007-01-16 4:16 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist

On Mon, 15 Jan 2007, Mr. James W. Laferriere wrote:

> Being able to run a 'check' is a good thing (tm).  But without a
> method to get status and data back from the check, it seems rather
> bland.  Is there a tool or file to poll where that data and status
> can be acquired?

i'm not 100% certain what you mean, but i generally just monitor dmesg
for the md read error message (mind you, the message pre-2.6.19 or .20
isn't very informative, but it's obvious enough).

there is also a file mismatch_cnt in the same directory as
sync_action... the Documentation/md.txt (in 2.6.18) refers to it
incorrectly as mismatch_count... but anyhow, why don't i just repaste
the relevant portion of md.txt.

-dean

...

Active md devices for levels that support data redundancy (1,4,5,6)
also have

   sync_action
     a text file that can be used to monitor and control the rebuild
     process.  It contains one word which can be one of:
       resync  - redundancy is being recalculated after unclean
                 shutdown or creation
       recover - a hot spare is being built to replace a
                 failed/missing device
       idle    - nothing is happening
       check   - A full check of redundancy was requested and is
                 happening.  This reads all blocks and checks them.  A
                 repair may also happen for some raid levels.
       repair  - A full check and repair is happening.  This is
                 similar to 'resync', but was requested by the user,
                 and the write-intent bitmap is NOT used to optimise
                 the process.

     This file is writable, and each of the strings that could be read
     are meaningful for writing.

      'idle' will stop an active resync/recovery etc.  There is no
      guarantee that another resync/recovery may not be automatically
      started again, though some event will be needed to trigger this.
      'resync' or 'recovery' can be used to restart the corresponding
      operation if it was stopped with 'idle'.
      'check' and 'repair' will start the appropriate process provided
      the current state is 'idle'.

   mismatch_count
     When performing 'check' and 'repair', and possibly when
     performing 'resync', md will count the number of errors that are
     found.  The count in 'mismatch_cnt' is the number of sectors that
     were re-written, or (for 'check') would have been re-written.  As
     most raid levels work in units of pages rather than sectors, this
     may be larger than the number of actual errors by a factor of the
     number of sectors in a page.
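
Putting those files together, a rough status poll for a hypothetical
md0 might look like the sketch below (mismatch_cnt is the actual file
name; the md.txt heading above is the documentation's slip):

    sys=/sys/block/md0/md
    echo check > $sys/sync_action
    while [ "$(cat $sys/sync_action)" != "idle" ]; do
        grep -A 2 '^md0' /proc/mdstat    # progress bar and speed
        sleep 60
    done
    echo "mismatch_cnt after check: $(cat $sys/mismatch_cnt)"
    dmesg | grep -i 'read error'         # md's read-error messages (wording varies by kernel)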

* Re: raid5 software vs hardware: parity calculations?
@ 2007-01-16  5:06 ` Bill Davidsen
From: Bill Davidsen @ 2007-01-16 5:06 UTC (permalink / raw)
To: berk walker; +Cc: dean gaudet, Robin Bowes, linux-raid

berk walker wrote:
> dean gaudet wrote:
> > you can also run monthly "checks"...
> >
> >     echo check >/sys/block/mdX/md/sync_action
> >
> > it'll read the entire array (parity included) and correct read
> > errors as they're discovered.
>
> Could I get a pointer as to how I can do this "check" in my FC5
> [BLAG] system?  I can find no appropriate "check", nor "md",
> available to me.  It would be a "good thing" if I were able to find
> potentially weak spots, rewrite them to good, and know that it might
> be time for a new drive.

Grab a recent mdadm source; it's a part of that.

> All of my arrays have drives of approximately the same manufacture
> date, so the possibility of more than one showing bad at the same
> time cannot be ignored.

Never can, but it is highly unlikely, given the MTBF of modern drives.
And when you consider total failures as opposed to bad sectors, it
gets even smaller.  There is no perfect way to avoid ever losing data,
just ways to reduce the chance, balancing the cost of data loss
against hardware.  Current Linux will rewrite bad sectors; whole-drive
failures are an argument for spares.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979