* [linux-lvm] pvmove painfully slow on parity RAID
From: Spelic
To: linux-lvm
Date: 2010-12-29  2:40 UTC

Hello list

pvmove is painfully slow if the destination is on a 6-disk MD raid5:
it performs at 200-500 KB/sec! (kernel 2.6.36.2)
Same for lvconvert add mirror.

Instead, if the destination is on a 4-device MD raid10 (near layout),
it performs at 60 MB/sec, which is much more reasonable (at least a
120-fold difference!).
Same for lvconvert add mirror.

How come such a difference? Are you issuing barriers for every tiny
block of data, maybe? (That could explain the slowness on parity raid.)
If so, could you issue barriers (and hence checkpoint) every, say,
100MB instead?

(Please note that with lvconvert add mirror I also tried various
--regionsize settings, but they don't improve the speed much, i.e.
+50% at most.)

Thanks for any information
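By "lvconvert add mirror" I mean something along these lines; the VG,
LV and device names here are only placeholders:

  lvconvert -m 1 --regionsize 1M vg0/testlv /dev/md0   # add a mirror leg allocated on the raid5 array

It is the sync of that new leg that crawls on the raid5 arrays.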
* Re: [linux-lvm] pvmove painfully slow on parity RAID
From: Spelic
To: linux-lvm
Date: 2010-12-29 14:02 UTC

On 12/29/2010 03:40 AM, Spelic wrote:
> Hello list
>
> pvmove is painfully slow if the destination is on a 6-disk MD raid5:
> it performs at 200-500 KB/sec! (kernel 2.6.36.2)
> Same for lvconvert add mirror.
>
> Instead, if the destination is on a 4-device MD raid10 (near layout),
> it performs at 60 MB/sec, which is much more reasonable (at least a
> 120-fold difference!).
> Same for lvconvert add mirror.

Sorry, yesterday I made a few mistakes computing the speeds. Here are
the times for moving a 200MB logical volume onto various types of MD
arrays (pvmove or lvconvert add mirror: it doesn't change much). It's
the destination array that matters, not the source array.

raid5,  8 devices, 1024k chunk:                 36 sec    (5.5 MB/sec)
raid5,  6 devices, 4096k chunk:                 2m18s ?!  (1.44 MB/sec!?)
raid5,  5 devices, 1024k chunk:                 25 sec    (8 MB/sec)
raid5,  4 devices, 16384k chunk:                41 sec    (4.9 MB/sec)
raid10, 4 devices, 1024k chunk, near copies:    5 sec!    (40 MB/sec)
raid1,  2 devices:                              3.4 sec!  (59 MB/sec)
raid1,  2 devices (another, identical array):   3.4 sec!  (59 MB/sec)

I tried multiple times for every array, with consistent results, so I'm
pretty sure these are the actual numbers.

What's happening? Apart from the striking difference between parity and
non-parity raid, on parity raid the times seem to vary randomly with
the number of devices and the chunk size..?

I tried various --regionsize settings for lvconvert add mirror, but the
times didn't change much. I even tried setting my SATA controller to
ignore-FUA mode (it fakes the FUA and returns immediately): no change.

Thanks for any info
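A reproduction along these lines should show the same effect; the VG,
LV and device names are placeholders:

  lvcreate -L 200M -n testlv vg0 /dev/md_src        # 200MB LV allocated on the source PV
  time pvmove -n testlv /dev/md_src /dev/md_dest    # move only that LV to the destination array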
* Re: [linux-lvm] pvmove painfully slow on parity RAID
From: Stuart D. Gathman
To: LVM general discussion and development
Date: 2010-12-30  2:42 UTC

On Wed, 29 Dec 2010, Spelic wrote:

> I tried multiple times for every array, with consistent results, so I'm
> pretty sure these are the actual numbers.
> What's happening? Apart from the striking difference between parity and
> non-parity raid, on parity raid the times seem to vary randomly with
> the number of devices and the chunk size..?

This is pretty much my experience with parity raid all around, which is
why I stick with raid1 and raid10.

That said, the sequential writes of pvmove should be fast for raid5 *if*
the chunks are aligned so that there is no read/modify/write cycle.

1) Perhaps your test targets are not properly aligned?

2) Perhaps the raid5 implementation (hardware? linux md? experimental
   lvm raid5?) does a read/modify/write even when it doesn't have to.

Your numbers sure look like read/modify/write is happening for some
reason.

--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc.  Phone: 703 591-0911  Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
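A quick way to check both halves of that from userspace; the device
name below is just an example:

  mdadm --detail /dev/md0 | grep -i chunk   # the array's chunk size
  pvs -o +pe_start /dev/md0                 # offset where LVM's first physical extent starts

If the PE start offset is not a multiple of the array's chunk size (let
alone the full stripe width), the writes from the mirror sync cannot
line up with the stripes and read/modify/write is guaranteed.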
* Re: [linux-lvm] pvmove painfully slow on parity RAID
From: Spelic
To: linux-lvm
Date: 2010-12-30  3:13 UTC

On 12/30/2010 03:42 AM, Stuart D. Gathman wrote:
> This is pretty much my experience with parity raid all around, which is
> why I stick with raid1 and raid10.

Parity raid is fast for me in normal filesystem operations; that's why
I suspect some strict sequentiality is being enforced here.

> That said, the sequential writes of pvmove should be fast for raid5 *if*
> the chunks are aligned so that there is no read/modify/write cycle.
>
> 1) Perhaps your test targets are not properly aligned?

Aligned to zero, yes (the arrays are empty right now), but the arrays
all have different chunk sizes and stripe sizes, as I reported, and all
of them are bigger than the LVM chunk size, which is 1M for the VG.

> 2) Perhaps the raid5 implementation (hardware? linux md? experimental
>    lvm raid5?) does a read/modify/write even when it doesn't have to.
>
> Your numbers sure look like read/modify/write is happening for some
> reason.

OK, but strict sequentiality is probably being enforced far too often.
There must be some barrier or flush-and-wait going on for every tiny
bit of data (for every LVM chunk, maybe?). Are you an LVM developer?

Consider that a sequential dd write reaches hundreds of megabytes per
second on my arrays, not hundreds of... kilobytes! Even random I/O goes
*much* faster than this, as long as one stripe doesn't have to wait for
another stripe to be fully updated (i.e. sequentiality is not enforced
from the application layer). If pvmove wrote 100MB before every sync or
flush, I'm pretty sure I would see speeds almost 100 times higher.

Also, there is still the mystery of why the times appear *randomly*
related to the number of devices, chunk sizes and stripe sizes! If the
RMW cycle were the culprit, how come I see:

raid5, 4 devices, 16384k chunk:  41 sec  (4.9 MB/sec)
raid5, 6 devices, 4096k chunk:   2m18s   (1.44 MB/sec!?)

The first has a much larger stripe size (49152K) than the second
(20480K)!

Thank you
* Re: [linux-lvm] pvmove painfully slow on parity RAID
From: Stuart D. Gathman
To: LVM general discussion and development
Date: 2010-12-30 19:12 UTC

On Thu, 30 Dec 2010, Spelic wrote:

> Also, there is still the mystery of why the times appear *randomly*
> related to the number of devices, chunk sizes and stripe sizes! If the
> RMW cycle were the culprit, how come I see:
>
> raid5, 4 devices, 16384k chunk:  41 sec  (4.9 MB/sec)
> raid5, 6 devices, 4096k chunk:   2m18s   (1.44 MB/sec!?)
>
> The first has a much larger stripe size (49152K) than the second
> (20480K)!

Ok, next theory. Pvmove works by allocating a mirror for each
contiguous segment of the source LV, updating the metadata (how many
metadata copies do you have?), syncing the mirror, updating the
metadata again, then allocating and syncing the next segment, and so on
until finished. Pvmove will be fastest when the source LV is a single
contiguous segment.

If you restored the metadata after every test, then the variation by
destination PV would blow this theory. But if not, then the slow
pvmoves would be the ones with fragmented source LVs. The metadata
updates between segments are rather expensive (but necessary).

--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc.  Phone: 703 591-0911  Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
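You can see how fragmented the source LV is (and therefore how many of
those mirror/metadata cycles pvmove will go through) with something
like this; the VG name is a placeholder:

  lvs --segments -o +devices vg0   # one line per segment; a single line means one contiguous segment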
* Re: [linux-lvm] pvmove painfully slow on parity RAID
From: Spelic
To: linux-lvm
Date: 2010-12-31  3:41 UTC

On 12/30/2010 08:12 PM, Stuart D. Gathman wrote:
> Ok, next theory. Pvmove works by allocating a mirror for each
> contiguous segment of the source LV, updating the metadata ...

Ok, never mind, I found the problem: LVM probably uses O_DIRECT, right?

Well, O_DIRECT is abysmally slow on MD parity raid (I just checked with
dd on the bare MD device) and I don't know why. It's not because of the
RMW cycle, because it's just as slow the second time I try, when
nothing is read from disk any more since all the reads are already in
cache.

I understand this probably has to be fixed on the MD side (I will
report the problem to linux-raid, though I see it has already been
discussed there without much result).

However... is there any chance you might fix it on the LVM side too, by
changing LVM to use non-direct I/O so as to "support" MD? On my raid5
array the difference between direct and non-direct I/O (dd with bs=1M
or smaller) is 2.1 MB/sec versus 250 MB/sec, and it would probably be
even larger on bigger arrays. On raid10, non-direct is also much faster
at small transfer sizes like bs=4K (28 MB/sec vs 160 MB/sec), though
not at 1M; LVM probably uses small transfer sizes, right?

Thank you
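The dd comparison I mean is roughly the following; the device name and
sizes are just examples, and careful, it overwrites the raw device:

  dd if=/dev/zero of=/dev/md0 bs=1M count=500 conv=fsync     # buffered: goes through the page cache
  dd if=/dev/zero of=/dev/md0 bs=1M count=500 oflag=direct   # O_DIRECT: bypasses the cache, crawls on parity raid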
* Re: [linux-lvm] pvmove painfully slow on parity RAID
From: Stuart D. Gathman
To: LVM general discussion and development
Date: 2010-12-31 15:36 UTC

On Fri, 31 Dec 2010, Spelic wrote:

> Ok, never mind, I found the problem: LVM probably uses O_DIRECT, right?
> Well, O_DIRECT is abysmally slow on MD parity raid (I just checked with
> dd on the bare MD device) and I don't know why. It's not because of the
> RMW cycle, because it's just as slow the second time I try, when
> nothing is read from disk any more since all the reads are already in
> cache.

The point of O_DIRECT is to *not* use the cache. A write-through cache
would seem to be OK, but you have to make sure that ALL writes go
through that cache, or the data on parity raid will be corrupted.

The R/M/W problem afflicts every level of parity raid in subtle ways.
That's why I don't like it.

--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc.  Phone: 703 591-0911  Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
* Re: [linux-lvm] pvmove painfully slow on parity RAID
From: Stuart D. Gathman
To: LVM general discussion and development
Date: 2010-12-31 17:23 UTC

On Fri, 31 Dec 2010, Stuart D. Gathman wrote:

> The point of O_DIRECT is to *not* use the cache. A write-through cache
> would seem to be OK, but you have to make sure that ALL writes go
> through that cache, or the data on parity raid will be corrupted.
>
> The R/M/W problem afflicts every level of parity raid in subtle ways.
> That's why I don't like it.

Plus, any write to *part* of a chunk, even with a write-through cache,
still has to write the *entire* chunk. So if the chunk size is 64K and
pvmove writes 32K blocks with O_DIRECT, that is 2 writes of the 64K
chunk even with the write-through cache (without the cache, it is
2 reads + 2 writes of the same chunk).

--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc.  Phone: 703 591-0911  Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
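If that is what is happening, the difference should show up directly
with dd; the 64K chunk size and the device name are assumptions here:

  dd if=/dev/zero of=/dev/md0 bs=64k count=1000 oflag=direct   # one direct write per chunk
  dd if=/dev/zero of=/dev/md0 bs=32k count=2000 oflag=direct   # two direct writes per chunk, same total data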