* Using Video cards (CUDA) for RAID parity
@ 2013-12-12 10:27 Pieter De Wit
2013-12-12 11:44 ` Benjamin ESTRABAUD
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Pieter De Wit @ 2013-12-12 10:27 UTC (permalink / raw)
To: linux-raid
Hi List,
Given the recent work done with techs like CUDA etc. - has the idea been
floated to use the video card for RAID parity calculations vs the CPU ?
Bitcoin and plenty others have shown the true speed of these cards. This
might be a cheaper version of a RAID card.
Cheers,
Pieter
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Using Video cards (CUDA) for RAID parity
  2013-12-12 10:27 Using Video cards (CUDA) for RAID parity Pieter De Wit
@ 2013-12-12 11:44 ` Benjamin ESTRABAUD
  2013-12-16 16:07   ` Wolfgang Denk
  2013-12-12 11:52 ` David Brown
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Benjamin ESTRABAUD @ 2013-12-12 11:44 UTC (permalink / raw)
To: Pieter De Wit; +Cc: linux-raid

On several of the more RAID-specific SoCs (like the PowerPC 440x/460x),
an XOR engine is included to speed up RAID calculations (effectively a
hardware RAID engine).

I think Intel Atom CPUs were planned to get, or did get, a similar
engine, but with the advances in CPU efficiency and frequency these XOR
engines are practically obsolete: you will barely use 5% of your CPU (if
it is a high-end CPU) for the parity calculation on RAID5.

These days it seems that the overhead of the parity calculations is so
small that it becomes insignificant.

On the other hand, this is just based on my limited experience; maybe
there are some cases where we would benefit from that.

Regards,
Ben.

On 12/12/13 10:27, Pieter De Wit wrote:
> Hi List,
>
> Given the recent work done with techs like CUDA etc. - has the idea been
> floated to use the video card for RAID parity calculations vs the CPU ?
> Bitcoin and plenty others have shown the true speed of these cards. This
> might be a cheaper version of a RAID card.
>
> Cheers,
>
> Pieter

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Using Video cards (CUDA) for RAID parity
  2013-12-12 11:44 ` Benjamin ESTRABAUD
@ 2013-12-16 16:07   ` Wolfgang Denk
  0 siblings, 0 replies; 9+ messages in thread
From: Wolfgang Denk @ 2013-12-16 16:07 UTC (permalink / raw)
To: Benjamin ESTRABAUD; +Cc: Pieter De Wit, linux-raid

Dear Benjamin,

In message <52A9A1B3.5080706@mpstor.com> you wrote:
> On several more RAID specific SoC (like the PowerPC 440x/460x) a XOR
> engine is included to speed up RAID calculations (effectively a hardware
> RAID engine).
>
> I think Intel Atom CPUs were planned/got a similar engine added, but
> with the advance in CPU efficiencies and frequencies, these XOR engines
> are practically obsolete: You will barely use 5% of your CPU (if a high
> end CPU) for the parity calculation on RAID5.
>
> These days it seems that the overhead from the parity calculations are
> so small that they become insignificant.

I fully agree here. Even on a not so fresh desktop CPU (Intel Core2
Quad CPU Q9550 at 2.83GHz) I see this in the kernel logs:

...
[ 15.944466] xor: measuring software checksum speed
[ 15.983008] prefetch64-sse: 11592.000 MB/sec
[ 16.020008] generic_sse: 10232.000 MB/sec
[ 16.045623] xor: using function: prefetch64-sse (11592.000 MB/sec)
...

So for each percent of CPU bandwidth I offer, I get about 115 MB/s of
bandwidth for parity calculations. None of the RAID arrays I have
running would ever need even close to 10 % of my CPU (in theory, at
least).

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I can't say I've ever been lost, but I was bewildered once for three
days. - Daniel Boone (Attributed)

^ permalink raw reply [flat|nested] 9+ messages in thread
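The "MB/s per percent of CPU" estimate in the message above can be checked with a few lines of arithmetic. This is only a minimal sketch: the XOR speed is taken from the quoted kernel log, while the 500 MB/s array throughput and the function name `cpu_share_for_parity` are hypothetical values chosen for illustration, and the model ignores memory-copy and I/O overhead.

```python
# XOR benchmark figure from the quoted kernel log:
# "xor: using function: prefetch64-sse (11592.000 MB/sec)"
XOR_SPEED_MB_S = 11592.0


def cpu_share_for_parity(array_mb_s: float, xor_mb_s: float = XOR_SPEED_MB_S) -> float:
    """Fraction of one core needed to XOR `array_mb_s` MB of data per second."""
    return array_mb_s / xor_mb_s


# Parity bandwidth bought by one percent of a core.
mb_per_percent = XOR_SPEED_MB_S / 100

# Hypothetical array writing 500 MB/s (an assumption, not a figure from the thread).
share = cpu_share_for_parity(500.0)

print(f"{mb_per_percent:.1f} MB/s per percent of CPU")
print(f"{share * 100:.1f}% of one core for a 500 MB/s array")
```

Under these assumptions even a fairly fast array stays well below the 10 % mark mentioned in the message.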
* Re: Using Video cards (CUDA) for RAID parity
  2013-12-12 10:27 Using Video cards (CUDA) for RAID parity Pieter De Wit
  2013-12-12 11:44 ` Benjamin ESTRABAUD
@ 2013-12-12 11:52 ` David Brown
  2013-12-12 16:57   ` Pieter De Wit
  2013-12-12 17:30 ` Chris Green
  2013-12-12 17:51 ` joystick
  3 siblings, 1 reply; 9+ messages in thread
From: David Brown @ 2013-12-12 11:52 UTC (permalink / raw)
To: Pieter De Wit, linux-raid

On 12/12/13 11:27, Pieter De Wit wrote:
> Hi List,
>
> Given the recent work done with techs like CUDA etc. - has the idea been
> floated to use the video card for RAID parity calculations vs the CPU ?
> Bitcoin and plenty others have shown the true speed of these cards. This
> might be a cheaper version of a RAID card.
>
> Cheers,
>
> Pieter

I am almost certain that you /could/ use a graphics card to do parity
calculations faster than a cpu core. However, even the newly proposed
multi-parity calculations are not a big challenge for a modern cpu. A
bigger issue is getting optimal threading so that multiple cores (or at
least threads) can be used at the same time, and this work is well under
way already. Once that work is completed, my guess is that I/O, cache
or memory bandwidth will be the bottleneck for big raid arrays rather
than cpu power - and using graphics cards will not help there.

David

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Using Video cards (CUDA) for RAID parity
  2013-12-12 11:52 ` David Brown
@ 2013-12-12 16:57   ` Pieter De Wit
  2013-12-12 17:13     ` Benjamin ESTRABAUD
  0 siblings, 1 reply; 9+ messages in thread
From: Pieter De Wit @ 2013-12-12 16:57 UTC (permalink / raw)
To: linux-raid

On 13/12/2013 00:52, David Brown wrote:
> On 12/12/13 11:27, Pieter De Wit wrote:
>> Hi List,
>>
>> Given the recent work done with techs like CUDA etc. - has the idea been
>> floated to use the video card for RAID parity calculations vs the CPU ?
>> Bitcoin and plenty others have shown the true speed of these cards. This
>> might be a cheaper version of a RAID card.
>>
>> Cheers,
>>
>> Pieter
> I am almost certain that you /could/ use a graphics card to do parity
> calculations faster than a cpu core. However, even the newly proposed
> multi-parity calculations are not a big challenge for a modern cpu. A
> bigger issue is getting optimal threading so that multiple cores (or at
> least threads) can be used at the same time, and this work is well under
> way already. Once that work is completed, my guess is that I/O, cache
> or memory bandwidth will be the bottleneck for big raid arrays rather
> than cpu power - and using graphics cards will not help there.
>
> David

Ah - I see - I also thought it was multi-threaded, but, tbh, I never
looked that hard into it. My question comes from the fact that I now
have access to 32x750gig (and more if needed) drives on a fiber array.
The downside is that I only have *old* CPUs driving the array. RAID5's
sync speed (15 disks) is 8 meg/second. Change the array to RAID10 and
the sync speed is above 100 meg/second.

I, naively perhaps, assumed the bottleneck to be the Intel CPUs, which
sparked this idea.

What about block level hashing ? (Unless this is already done and I just
never knew it :) )

Cheers,

Pieter

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Using Video cards (CUDA) for RAID parity
  2013-12-12 16:57   ` Pieter De Wit
@ 2013-12-12 17:13     ` Benjamin ESTRABAUD
       [not found]       ` <ff87dc745635b18a71b98ce36356eea7@insync.za.net>
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin ESTRABAUD @ 2013-12-12 17:13 UTC (permalink / raw)
To: Pieter De Wit; +Cc: linux-raid

On 12/12/13 16:57, Pieter De Wit wrote:
> On 13/12/2013 00:52, David Brown wrote:
>> On 12/12/13 11:27, Pieter De Wit wrote:
>>> Hi List,
>>>
>>> Given the recent work done with techs like CUDA etc. - has the idea been
>>> floated to use the video card for RAID parity calculations vs the CPU ?
>>> Bitcoin and plenty others have shown the true speed of these cards. This
>>> might be a cheaper version of a RAID card.
>>>
>>> Cheers,
>>>
>>> Pieter
>> I am almost certain that you /could/ use a graphics card to do parity
>> calculations faster than a cpu core. However, even the newly proposed
>> multi-parity calculations are not a big challenge for a modern cpu. A
>> bigger issue is getting optimal threading so that multiple cores (or at
>> least threads) can be used at the same time, and this work is well under
>> way already. Once that work is completed, my guess is that I/O, cache
>> or memory bandwidth will be the bottleneck for big raid arrays rather
>> than cpu power - and using graphics cards will not help there.
>>
>> David
> Ah - I see - I also thought it was multi-threaded, but, tbh, I never
> looked that hard into it. My question comes from the fact that I now
> have access to 32x750gig (and more if needed) drives on a fiber array.
> The down side is that I only have *old* CPUs driving the array. RAID5's
> sync speed (15 disks) is 8meg/second. Change the array to RAID10 and the
> sync speed is above 100meg/second.

When resyncing a RAID5, is the CPU running at 100%? Even old CPUs should
be able to resync faster than that. What CPU is that, if I may ask?

>
> I, naively perhaps, assumed the bottleneck to be the Intel CPU's which
> sparked this idea.
>
> What about block level hashing ? (Unless this is already done and I just
> never knew it :) )

Like keeping a checksum of each RAID chunk or stripe for data
consistency checks? RAID doesn't deal with that. It reconstructs data in
the event of a drive failure but doesn't guarantee data consistency.
Therefore, if some bits on a drive were "flipped", the next RAID "scrub"
(repair) would detect a mismatch between the data chunks and the parity
and would simply update the parity from the available data (not
"repairing" the corrupted data). As far as I know, that's because there
is no way for the array to know whether the data or the parity was
written wrong or corrupted.

Regards,
Ben.

>
> Cheers,
>
> Pieter

^ permalink raw reply [flat|nested] 9+ messages in thread
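The point about scrubbing can be illustrated with a toy single-parity stripe. The sketch below is not the md implementation; the chunk contents and the helper name `xor_parity` are made up for illustration. It shows that XOR parity can rebuild a chunk whose position is known to be missing, but that a flipped bit in a data chunk and a flipped bit in the parity produce the same mismatch, so the array cannot tell which one is wrong.

```python
from functools import reduce


def xor_parity(chunks):
    """Byte-wise XOR of equally sized chunks (single parity, RAID5-style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Toy stripe: three data chunks plus one parity chunk (illustrative values).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Drive failure: the missing position is known, so the chunk can be rebuilt
# by XOR-ing the surviving chunks with the parity.
rebuilt = xor_parity([data[0], data[2], parity])

# Silent corruption, case A: one bit flips in data chunk 0.
bad_data = [bytes([data[0][0] ^ 0x01]) + data[0][1:], data[1], data[2]]
mismatch_a = xor_parity(bad_data + [parity])

# Silent corruption, case B: the same bit flips in the parity chunk instead.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
mismatch_b = xor_parity(data + [bad_parity])

# A consistent stripe XORs to all zeros; both corruptions give the same
# non-zero pattern, so a scrub sees a mismatch but not its source.
print(rebuilt == data[1], mismatch_a == mismatch_b, mismatch_a.hex())
```

Telling the two cases apart needs extra information, such as a per-chunk checksum, which is what the "block level hashing" question was getting at.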
[parent not found: <ff87dc745635b18a71b98ce36356eea7@insync.za.net>]
* Re: Using Video cards (CUDA) for RAID parity
       [not found]       ` <ff87dc745635b18a71b98ce36356eea7@insync.za.net>
@ 2013-12-12 18:57         ` Benjamin ESTRABAUD
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin ESTRABAUD @ 2013-12-12 18:57 UTC (permalink / raw)
To: Pieter De Wit; +Cc: linux-raid

On 12/12/13 18:11, Pieter De Wit wrote:
>> Hi,
>>
>> <snip>
>>
>> When resyncing a RAID5, is the CPU running 100%? Even old CPUs should be
>> able to resync faster than that. What CPU is that if I may ask?
>>
>> I just confirmed they are *not* at 100% so the bottleneck is somewhere
>> else. Here is the last cpu in cpuinfo:
>>
>> processor : 3
>> vendor_id : GenuineIntel
>> cpu family : 15
>> model : 2
>> model name : Intel(R) Xeon(TM) CPU 2.80GHz
>> stepping : 7
>> microcode : 0x38
>> cpu MHz : 2791.161
>> cache size : 512 KB
>> physical id : 3
>> siblings : 1
>> core id : 0
>> cpu cores : 0
>> apicid : 7
>> initial apicid : 7
>> fdiv_bug : no
>> hlt_bug : no
>> f00f_bug : no
>> coma_bug : no
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 2
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid
>> xtpr
>> bogomips : 5582.28
>> clflush size : 64
>> cache_alignment : 128
>> address sizes : 36 bits physical, 32 bits virtual
>> power management:
>>
>> This is rebuilding of RAID6:
>>
>> [>....................] resync = 0.1% (1215876/732443136)
>> finish=4444.2min speed=2741K/sec
>>
>> # tail sync_speed sync_speed_max sync_speed_min
>>
>> ==> sync_speed <==
>> 2788
>>
>> ==> sync_speed_max <==
>> 200000 (system)
>>
>> ==> sync_speed_min <==
>> 200000 (local)
>>
>> # tail sync_speed sync_speed_max sync_speed_min
>> ==> sync_speed <==
>> 2788
>>
>> ==> sync_speed_max <==
>> 200000 (system)
>>
>> ==> sync_speed_min <==
>> 200000 (local)
>>
>> # dmesg | grep "raid6: using algorithm"
>> [ 2.244060] raid6: using algorithm mmxx2 (1916 MB/s)
>>

We have similar CPUs here and we get good resync speed. How many disks
are in your array, and what's their size? Maybe one of your disks is not
performing well; you could try to profile their individual speeds.

>>> I, naively perhaps, assumed the bottleneck to be the Intel CPU's
>>> which sparked this idea. What about block level hashing ? (Unless
>>> this is already done and I just never knew it :) )
>> Like keeping a checksum of each RAID chunk or stripe for data
>> consistency checks? RAID doesn't deal with that, it deals with
>> reconstructing data in the event of a drive failure but doesn't
>> guarantee data consistency. Therefore, if some bits on a drive were to
>> be "flipped", the next RAID "scrub" (repair) would detect a mismatch
>> between the data chunks and the parity and would simply update the
>> parity from the available data (not "repairing" the corrupted data), as
>> far as I know that's because there's no way for the array to know
>> whether the data or the parity was written wrong or corrupted.
>>
>> Regards,
>> Ben.
>>
>> Is there parity in RAID10/RAID1 ? It seems my idea won't work either
>> way as there isn't the bus bandwidth to support it, this is now more
>> an educational thing :)
>>

RAID1 is a mirror (a full copy). RAID10 is different from RAID1, but it
still does not use parity, only chunk copies.

>> Cheers,
>>
>> Pieter

^ permalink raw reply [flat|nested] 9+ messages in thread
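The numbers in the quoted resync output are enough to see that the CPU is not the limit. The following is a rough sketch using only figures from the quoted log; it assumes the progress counters and the speed are in the same block unit (so the unit cancels out of the time estimate) and treats "K/sec" and "MB/s" as directly comparable with a factor of 1024, which is an approximation.

```python
# Figures copied from the quoted resync output.
done_blocks = 1215876
total_blocks = 732443136
speed_blocks_per_s = 2741      # "speed=2741K/sec"
raid6_algo_mb_s = 1916         # "raid6: using algorithm mmxx2 (1916 MB/s)"

# Remaining time at the current rate; the log itself reports
# finish=4444.2min, the small difference being the fluctuating speed.
remaining_min = (total_blocks - done_blocks) / speed_blocks_per_s / 60

# How much faster the benchmarked parity routine is than the observed
# resync rate (units treated loosely, see the assumptions above).
headroom = raid6_algo_mb_s * 1024 / speed_blocks_per_s

print(f"about {remaining_min:.0f} minutes remaining")
print(f"parity routine is roughly {headroom:.0f}x faster than the resync")
```

With the parity code several hundred times faster than the observed rate, the slow resync has to come from somewhere else, which is consistent with the suggestion to profile the individual disks.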
* RE: Using Video cards (CUDA) for RAID parity
  2013-12-12 10:27 Using Video cards (CUDA) for RAID parity Pieter De Wit
  2013-12-12 11:44 ` Benjamin ESTRABAUD
  2013-12-12 11:52 ` David Brown
@ 2013-12-12 17:30 ` Chris Green
  2013-12-12 17:51 ` joystick
  3 siblings, 0 replies; 9+ messages in thread
From: Chris Green @ 2013-12-12 17:30 UTC (permalink / raw)
To: 'Pieter De Wit', linux-raid@vger.kernel.org

At least for the raid5 case, that would seem to be a loss
performance-wise:

- AFAIK, even a single good x86 core can XOR memory faster than the
  DRAM bandwidth of the system.
- High-end GPUs have more memory bandwidth than the x86 and could
  perform large XOR operations between buffers at a faster rate. But the
  bandwidth over the bus to the gfx card is lower than the main memory
  DRAM bandwidth, so you would have a net loss from having to move the
  data between the GPU and CPU.

-----Original Message-----
From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Pieter De Wit
Sent: Thursday, December 12, 2013 2:28 AM
To: linux-raid@vger.kernel.org
Subject: Using Video cards (CUDA) for RAID parity

Hi List,

Given the recent work done with techs like CUDA etc. - has the idea been
floated to use the video card for RAID parity calculations vs the CPU ?
Bitcoin and plenty others have shown the true speed of these cards. This
might be a cheaper version of a RAID card.

Cheers,

Pieter

^ permalink raw reply [flat|nested] 9+ messages in thread
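The argument in the message above amounts to a "slowest link wins" model. The sketch below makes that explicit. Every bandwidth figure in it is a hypothetical placeholder chosen only to show the shape of the comparison, not a measurement from this thread or from any particular hardware, and the function names are invented for the example.

```python
def cpu_parity_rate(dram_gb_s: float, core_xor_gb_s: float) -> float:
    """On the CPU, parity is limited by DRAM bandwidth or by the XOR loop."""
    return min(dram_gb_s, core_xor_gb_s)


def gpu_parity_rate(bus_gb_s: float, gpu_mem_gb_s: float) -> float:
    """On the GPU, every byte must first cross the bus to the card."""
    return min(bus_gb_s, gpu_mem_gb_s)


# Hypothetical placeholder figures (assumptions, in GB/s).
DRAM_GB_S = 20.0       # host memory bandwidth
CORE_XOR_GB_S = 25.0   # one core XOR-ing buffers already in memory
BUS_GB_S = 8.0         # host-to-card bus bandwidth
GPU_MEM_GB_S = 200.0   # on-card memory bandwidth

cpu_rate = cpu_parity_rate(DRAM_GB_S, CORE_XOR_GB_S)
gpu_rate = gpu_parity_rate(BUS_GB_S, GPU_MEM_GB_S)

print(f"CPU path: {cpu_rate} GB/s, GPU path: {gpu_rate} GB/s")
```

As long as the bus to the card is slower than host DRAM, the GPU path loses however fast the card's own memory is, which is the net loss described above.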
* Re: Using Video cards (CUDA) for RAID parity
  2013-12-12 10:27 Using Video cards (CUDA) for RAID parity Pieter De Wit
  ` (2 preceding siblings ...)
  2013-12-12 17:30 ` Chris Green
@ 2013-12-12 17:51 ` joystick
  3 siblings, 0 replies; 9+ messages in thread
From: joystick @ 2013-12-12 17:51 UTC (permalink / raw)
To: Pieter De Wit; +Cc: linux-raid

On 12/12/2013 11:27, Pieter De Wit wrote:
> Hi List,
>
> Given the recent work done with techs like CUDA etc. - has the idea
> been floated to use the video card for RAID parity calculations vs the
> CPU ?

Sending the XOR computation to the GPU is like shooting a fly with a
cannon.

The bandwidth to the GPU would be the bottleneck by 2 orders of
magnitude if you try to do this. XOR is far too simple an operation.
Even if it were a stream of double * double multiplications, the
bottleneck would lie in the bandwidth to/from the GPU. You can gain
something only if you do a matrix multiplication where each float or
double is uploaded only once but reused many times in all the
row x column multiplications. The best performers on the GPU are
self-contained applications, which operate autonomously and communicate
very little with the CPU for a very long time.

The XOR computation is WAY fast enough on modern processors. There is a
benchmark at boot about this:

dmesg | grep "raid6: using algorithm"

returns:

[ 5.072162] raid6: using algorithm sse2x4 (7556 MB/s)

7.5 GB/sec, and that's raid6, not even XOR. Probably even
single-threaded. (This probably does not include the memory-copy
overhead.)

^ permalink raw reply [flat|nested] 9+ messages in thread
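The contrast between XOR and matrix multiplication drawn above can be expressed as operations performed per byte moved across the bus. This is a minimal sketch under simple counting assumptions (one byte-wise XOR counts as one operation, a dense n x n multiply as 2*n^3 operations, and every input and output crosses the bus exactly once); the function names and the example sizes are illustrative.

```python
def xor_ops_per_byte(n_data_chunks: int) -> float:
    """XOR parity over n chunks of B bytes: (n - 1) * B byte-XORs,
    with n * B bytes sent to the card and B bytes of parity sent back."""
    return (n_data_chunks - 1) / (n_data_chunks + 1)


def matmul_ops_per_byte(n: int, elem_size: int = 8) -> float:
    """Dense n x n multiply: 2 * n**3 operations, with two input matrices
    and one output matrix of n * n elements each crossing the bus."""
    return (2 * n**3) / (3 * n * n * elem_size)


# 14 data chunks, as in a 15-disk RAID5 stripe; 1024 x 1024 doubles.
xor_intensity = xor_ops_per_byte(14)
matmul_intensity = matmul_ops_per_byte(1024)

print(f"XOR parity: {xor_intensity:.2f} ops per byte transferred")
print(f"matmul:     {matmul_intensity:.1f} ops per byte transferred")
```

XOR parity never reaches one operation per transferred byte no matter how many chunks are involved, while the matrix multiply reuses each uploaded value many times, so only the latter gives the card enough work to hide the transfer cost.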
Thread overview: 9+ messages
2013-12-12 10:27 Using Video cards (CUDA) for RAID parity Pieter De Wit
2013-12-12 11:44 ` Benjamin ESTRABAUD
2013-12-16 16:07 ` Wolfgang Denk
2013-12-12 11:52 ` David Brown
2013-12-12 16:57 ` Pieter De Wit
2013-12-12 17:13 ` Benjamin ESTRABAUD
[not found] ` <ff87dc745635b18a71b98ce36356eea7@insync.za.net>
2013-12-12 18:57 ` Benjamin ESTRABAUD
2013-12-12 17:30 ` Chris Green
2013-12-12 17:51 ` joystick