* RAID5
From: Kaushal Shriyan @ 2010-04-19  3:46 UTC
To: linux-raid

Hi,

I am a newbie to RAID. Are stripe size and block size the same thing?
How is the stripe size calculated, is it 64 KB by default, and what
should it be set to?

I have referred to
http://en.wikipedia.org/wiki/Raid5#RAID_5_parity_handling. How is
parity handled in the case of RAID 5?

Please explain with an example.

Thanks and Regards,

Kaushal
* Re: RAID5
From: Michael Evans @ 2010-04-19  4:21 UTC
To: Kaushal Shriyan; +Cc: linux-raid

On Sun, Apr 18, 2010 at 8:46 PM, Kaushal Shriyan
<kaushalshriyan@gmail.com> wrote:
> Hi,
>
> I am a newbie to RAID. Are stripe size and block size the same thing?
> How is the stripe size calculated, is it 64 KB by default, and what
> should it be set to?
>
> I have referred to
> http://en.wikipedia.org/wiki/Raid5#RAID_5_parity_handling. How is
> parity handled in the case of RAID 5?
>
> Please explain with an example.
>
> Thanks and Regards,
>
> Kaushal

You already have one good resource.

I wrote this a while ago, and the preface may answer some questions
you have about the terminology used:

http://wiki.tldp.org/LVM-on-RAID

However, the question you're asking is more or less borderline
off-topic for this mailing list. If the linked information is
insufficient, I suggest using the Wikipedia article's links to learn
more.
* Re: RAID5
From: Bill Davidsen @ 2010-04-21 13:32 UTC
To: Michael Evans; +Cc: Kaushal Shriyan, linux-raid

Michael Evans wrote:
> On Sun, Apr 18, 2010 at 8:46 PM, Kaushal Shriyan
> <kaushalshriyan@gmail.com> wrote:
[]
> You already have one good resource.
>
> I wrote this a while ago, and the preface may answer some questions
> you have about the terminology used:
>
> http://wiki.tldp.org/LVM-on-RAID
>
> However, the question you're asking is more or less borderline
> off-topic for this mailing list. If the linked information is
> insufficient, I suggest using the Wikipedia article's links to learn
> more.

I have some recent experience with this, gained the hard way by looking
for a problem rather than out of curiosity. My experience with LVM on
RAID is that, at least for RAID-5, write performance sucks. I created
two partitions on each of three drives, and two RAID-5 arrays using
those partitions: same block size, same tuning for stripe-cache, etc. I
dropped an ext4 on one array and LVM on the other, put ext4 on the LVM
volume, and copied 500 GB to each. LVM had a 50% performance penalty;
it took twice as long. I repeated this with four drives (all I could
spare) and found that writing through LVM was roughly 3x slower than
writing directly to the array.

I did not look into it further. I know why the performance is bad, I
don't have the hardware to change things right now, so I live with it.
When I get back from a trip I will change that.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein
* Re: RAID5
From: Michael Evans @ 2010-04-21 19:43 UTC
To: Bill Davidsen; +Cc: Kaushal Shriyan, linux-raid

On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
> Michael Evans wrote:
[]
> I have some recent experience with this, gained the hard way by looking
> for a problem rather than out of curiosity. My experience with LVM on
> RAID is that, at least for RAID-5, write performance sucks. I created
> two partitions on each of three drives, and two RAID-5 arrays using
> those partitions: same block size, same tuning for stripe-cache, etc. I
> dropped an ext4 on one array and LVM on the other, put ext4 on the LVM
> volume, and copied 500 GB to each. LVM had a 50% performance penalty;
> it took twice as long. I repeated this with four drives (all I could
> spare) and found that writing through LVM was roughly 3x slower than
> writing directly to the array.

This issue sounds very likely to be write-barrier related. Were you
using an external journal on a write-barrier-honoring device?
* Re: RAID5
From: Michael Tokarev @ 2010-04-23 14:26 UTC
To: Michael Evans; +Cc: Bill Davidsen, Kaushal Shriyan, linux-raid

Michael Evans wrote:
> On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
[]
>> I have some recent experience with this, gained the hard way by looking
>> for a problem rather than out of curiosity. My experience with LVM on
>> RAID is that, at least for RAID-5, write performance sucks. I created
>> two partitions on each of three drives, and two RAID-5 arrays using
>> those partitions: same block size, same tuning for stripe-cache, etc. I
>> dropped an ext4 on one array and LVM on the other, put ext4 on the LVM
>> volume, and copied 500 GB to each. LVM had a 50% performance penalty;
>> it took twice as long. I repeated this with four drives (all I could
>> spare) and found that writing through LVM was roughly 3x slower than
>> writing directly to the array.
>
> This issue sounds very likely to be write-barrier related. Were you
> using an external journal on a write-barrier-honoring device?

This is most likely due to the read-modify-write cycle which is present
on lvm-on-raid[456] if the number of data drives is not a power of two.
LVM requires its block size to be a power of two, so if you can't fit a
whole number of LVM blocks into the raid stripe size, your write speed
is expected to be ~3 times worse...

Even creating partitions on such a raid array is difficult.

'Hwell.

Unfortunately very few people understand this.

As for write barriers, it looks like they either already work
(in 2.6.33) or will (in 2.6.34) for the whole raid5-lvm stack.

/mjt
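To make the alignment arithmetic above concrete, here is a small Python
sketch. The figures (64 KiB chunk, 4 MiB LVM physical extents, 2 to 4
data disks) are assumptions chosen for illustration, not numbers taken
from Bill's setup:

    # Sketch: does a write that starts on an LVM extent boundary also
    # start on a full RAID-5 stripe boundary?  Sizes in KiB; assumed values.
    chunk = 64                       # md chunk size
    extent = 4 * 1024                # LVM physical extent size (default 4 MiB)

    for data_disks in (2, 3, 4):     # i.e. 3-, 4- and 5-drive RAID-5
        stripe = chunk * data_disks  # full-stripe width
        aligned = (extent % stripe == 0)
        print(data_disks, "data disks: stripe =", stripe, "KiB ->",
              "full-stripe writes possible" if aligned
              else "partial stripes, read-modify-write likely")

With 2 or 4 data disks the 4 MiB extent is a whole number of stripes;
with 3 data disks (192 KiB stripe) it is not, so writes keep landing
mid-stripe and md has to read old data and parity before it can write,
which is the kind of penalty being described above.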
* Re: RAID5
From: MRK @ 2010-04-23 14:57 UTC
To: Michael Tokarev; +Cc: Michael Evans, Bill Davidsen, Kaushal Shriyan, linux-raid

On 04/23/2010 04:26 PM, Michael Tokarev wrote:
> This is most likely due to the read-modify-write cycle which is present
> on lvm-on-raid[456] if the number of data drives is not a power of two.
> LVM requires its block size to be a power of two, so if you can't fit a
> whole number of LVM blocks into the raid stripe size, your write speed
> is expected to be ~3 times worse...
>
> Even creating partitions on such a raid array is difficult.

Seriously? Requiring the number of data drives to be a power of 2 would
be an immense limitation. Why should that be?

I understand that LVM blocks would not be aligned to raid stripes, and
this can worsen the problem for random writes, but if the write is
sequential, the raid stripe will still be filled by the next block
output by LVM. Maybe the very first stripe you write will get an RMW,
but the next ones will be filled while they wait, and also consider
that you have the preread_bypass_threshold feature in MD which helps
with this.

Also, if you really need to put an integer number of LVM blocks in an
MD stripe (which I doubt, as I wrote above), this still does not mean
that the number of drives needs to be a power of 2: e.g. you could put
10 LVM blocks on 5 data disks, couldn't you?

I would think it is more of a barriers thing... I'd try to repeat the
test with nobarrier on the ext4 mount and see. But Bill says that he
"knows" what the problem is, so maybe he will tell us sooner or later :-)
* Re: RAID5
From: Michael Evans @ 2010-04-23 20:57 UTC
To: Michael Tokarev; +Cc: Bill Davidsen, Kaushal Shriyan, linux-raid

On Fri, Apr 23, 2010 at 7:26 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
[]
> This is most likely due to the read-modify-write cycle which is present
> on lvm-on-raid[456] if the number of data drives is not a power of two.
> LVM requires its block size to be a power of two, so if you can't fit a
> whole number of LVM blocks into the raid stripe size, your write speed
> is expected to be ~3 times worse...
>
> As for write barriers, it looks like they either already work
> (in 2.6.33) or will (in 2.6.34) for the whole raid5-lvm stack.

Even when write barriers are supported, what will a typical transaction
look like?

    Journal
    Flush
    Data
    Flush
    Journal
    Flush (maybe)

If the operations are small (which the journal ops should be) then
you're forced to wait for a read, and then issue a write barrier after
it:

    J.read  (2 drives)
    J.write (2 drives)
    -- Barrier
    D.read  (2 drives)
    D.write (2 drives)
    -- Barrier

Then maybe:

    J.read  (2 drives)  (hopefully cached, but could cross into a new stripe...)
    J.write (2 drives)
    -- Barrier

This is why an external journal on another device is a great idea.
Unfortunately, what I really want is something like 512 MB of
battery-backed RAM (at any vaguely modern speed) to split up as journal
devices, but now everyone is selling SSDs, which are broken for such
needs. Any RAM-drive units still being sold seem to be more along
data-center grade sizes.
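As a rough way to see why the external journal helps, a
back-of-the-envelope Python sketch counting array operations per
journalled write. The cost model (4 device operations per sub-stripe
write on RAID-5: read old data, read old parity, write data, write
parity) is an assumption for illustration, not a measurement:

    # Rough accounting of RAID-5 array operations for one small journalled
    # write, assuming every sub-stripe write costs a read-modify-write.
    rmw_ops = 4

    # Journal on the array itself: journal write, data write, journal
    # commit, each fenced by a barrier/flush on the array.
    internal_journal = 3 * rmw_ops, 3   # (array ops, barrier waits)

    # Journal on a separate battery-backed device: only the data write
    # touches the array; the journal ops become cheap appends elsewhere.
    external_journal = 1 * rmw_ops, 1

    print("internal journal:", internal_journal)   # (12, 3)
    print("external journal:", external_journal)   # (4, 1)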
* Re: RAID5
From: Mikael Abrahamsson @ 2010-04-24  1:47 UTC
To: Michael Evans; +Cc: Michael Tokarev, Bill Davidsen, Kaushal Shriyan, linux-raid

On Fri, 23 Apr 2010, Michael Evans wrote:
> devices, but now everyone is selling SSDs, which are broken for such
> needs. Any RAM-drive units still being sold seem to be more along
> data-center grade sizes.

http://benchmarkreviews.com/index.php?option=com_content&task=view&id=308&Itemid=60

Basically it's DRAM with a battery backup and a CF slot where the data
goes in case of power failure. It's a bit big and so on, but it should
be perfect for journals... Or is this the kind of device you were
referring to as "data center grade size"?

Some of the SSDs sold today have a capacitor for power failure as well,
so all writes will complete, but they're not so common.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: RAID5
From: Michael Evans @ 2010-04-24  3:34 UTC
To: Mikael Abrahamsson; Cc: Michael Tokarev, Bill Davidsen, Kaushal Shriyan, linux-raid

On Fri, Apr 23, 2010 at 6:47 PM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> http://benchmarkreviews.com/index.php?option=com_content&task=view&id=308&Itemid=60
>
> Basically it's DRAM with a battery backup and a CF slot where the data
> goes in case of power failure. It's a bit big and so on, but it should
> be perfect for journals... Or is this the kind of device you were
> referring to as "data center grade size"?

Yeah, that's in the range I call 'data center grade', since the least
expensive model I can find using search tools is about 236 USD. For
that price I could /buy/ two to three hard drives and get nearly the
same effect by reusing old drives (but wasting more power).

I should be able to find something with a cheap plastic shell for
mounting, a very simple PCB with slots for older RAM of my selection,
and a minimal onboard CPU for less than 50 USD; I seriously doubt the
components cost that much.
* Re: RAID5
From: Bill Davidsen @ 2010-05-02 22:51 UTC
To: Michael Tokarev; +Cc: Michael Evans, Kaushal Shriyan, linux-raid

Michael Tokarev wrote:
[]
> This is most likely due to the read-modify-write cycle which is present
> on lvm-on-raid[456] if the number of data drives is not a power of two.
> LVM requires its block size to be a power of two, so if you can't fit a
> whole number of LVM blocks into the raid stripe size, your write speed
> is expected to be ~3 times worse...

Since I tried 3- and 4-drive setups, with several chunk sizes, I would
hope that no matter how LVM counts data drives (why does it care?) it
would find a power of two there.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein
* Re: RAID5
From: Luca Berra @ 2010-05-03  5:51 UTC
To: linux-raid

On Fri, Apr 23, 2010 at 06:26:20PM +0400, Michael Tokarev wrote:
> This is most likely due to the read-modify-write cycle which is present
> on lvm-on-raid[456] if the number of data drives is not a power of two.
> LVM requires its block size to be a power of two, so if you can't fit a
> whole number of LVM blocks into the raid stripe size, your write speed
> is expected to be ~3 times worse...

Uh? PE size != block size. The PE size is not used for I/O, it is only
used for laying out data. It will influence data alignment, but I
believe the issue can be bypassed if we make PE size == chunk_size and
do all creation/extension of LVs in multiples of data_disks; the
resulting device-mapper tables should then be aligned.

L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
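A small Python sketch of the layout Luca is suggesting. The figures
(64 KiB chunk, a 5-drive RAID-5, so 4 data disks) and the helper
function are illustrative assumptions, not output from any LVM tool:

    # Sketch of the alignment scheme: PE size == chunk size, and LV sizes
    # allocated in multiples of data_disks extents so every LV ends on a
    # full-stripe boundary.  Assumed figures; adjust to the real array.
    chunk_kib = 64                    # md chunk size
    data_disks = 4                    # e.g. a 5-drive RAID-5
    stripe_kib = chunk_kib * data_disks

    pe_size_kib = chunk_kib           # PE size == chunk_size

    def lv_extents(wanted_gib):
        """Extents for an LV of roughly wanted_gib, rounded up to whole stripes."""
        extents = (wanted_gib * 1024 * 1024) // pe_size_kib
        return -(-extents // data_disks) * data_disks   # ceil to a multiple

    print("stripe =", stripe_kib, "KiB;",
          "100 GiB LV ->", lv_extents(100), "extents of", pe_size_kib, "KiB")

With those numbers every extent boundary is also a chunk boundary and
every LV boundary is a stripe boundary, which is what keeps the
resulting device-mapper table aligned.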
* Re: RAID5
From: Bill Davidsen @ 2010-05-02 22:45 UTC
To: Michael Evans; +Cc: Kaushal Shriyan, linux-raid

Michael Evans wrote:
> On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
[]
> This issue sounds very likely to be write-barrier related. Were you
> using an external journal on a write-barrier-honoring device?

Not at all; I was just taking 60 GB of free space off the drives,
creating two partitions (on 64-sector boundaries) and using them for
RAID-5. I tried various chunk sizes; better for some things, not so
much for others.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein