From: Zdenek Kabelac
Date: Thu, 14 Sep 2017 11:00:46 +0200
Subject: Re: [linux-lvm] Performance penalty for 4k requests on thin provisioned volume
To: LVM general discussion and development, Dale Stephenson
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
In-Reply-To: <42E7ED35-B32E-4C02-976A-7A9E5380EEA8@mac.com>
References: <42E7ED35-B32E-4C02-976A-7A9E5380EEA8@mac.com>

On 14.9.2017 at 00:39, Dale Stephenson wrote:
>
>> On Sep 13, 2017, at 4:19 PM, Zdenek Kabelac wrote:
>>
>> On 13.9.2017 at 17:33, Dale Stephenson wrote:
>>> Distribution: centos-release-7-3.1611.el7.centos.x86_64
>>> Kernel: Linux 3.10.0-514.26.2.el7.x86_64
>>> LVM: 2.02.166(2)-RHEL7 (2016-11-16)
>>> Volume group consisted of an 8-drive SSD (500G drives) array, plus an additional SSD of the same size. The array had 64k stripes.
>>> Thin pool had the -Zn option and a 512k chunksize (full stripe), size 3T with a 16G metadata volume. Data was entirely on the 8-drive RAID, metadata was entirely on the 9th drive.
>>> Virtual volume "thin" was 300 GB. I also filled it with dd so that it would be fully provisioned before the test.
>>> Volume "thick" was also 300 GB, just an ordinary volume also entirely on the 8-drive array.
>>> Four tests were run directly against each volume using fio-2.2.8: random read, random write, sequential read, sequential write. Single thread, 4k blocksize, 90 s run time.
>>
>> Hi
>>
>> Can you please provide the output of:
>>
>> lvs -a -o+stripes,stripesize,seg_pe_ranges
>>
>> so we can see how your stripe is placed on devices?
>
> Sure, thank you for your help:
>
> # lvs -a -o+stripes,stripesize,seg_pe_ranges
>   LV               VG     Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert #Str Stripe PE Ranges
>   [lvol0_pmspare]  volgr0 ewi-------  16.00g                                                             1      0 /dev/md127:867328-871423
>   thick            volgr0 -wi-a----- 300.00g                                                             1      0 /dev/md127:790528-867327
>   thin             volgr0 Vwi-a-t--- 300.00g thinpool        100.00                                      0      0
>   thinpool         volgr0 twi-aot---   3.00t                   9.77   0.13                               1      0 thinpool_tdata:0-786431
>   [thinpool_tdata] volgr0 Twi-ao----   3.00t                                                             1      0 /dev/md127:0-786431
>   [thinpool_tmeta] volgr0 ewi-ao----  16.00g                                                             1      0 /dev/sdb4:0-4095
>
> md127 is an 8-drive RAID 0.
>
> As you can see, there's no lvm striping; I rely on the software RAID underneath for that. Both thick and thin lvols are on the same PV.
>
>> SSDs typically need, ideally, 512K chunk writes.
>
> I could create the md to use 512k chunks for RAID 0, but I wouldn't expect that to have any impact on a single-threaded test using a 4k request size. Is there a hidden relationship that I'm unaware of?

Yep - it seems the setup in this case is not the best fit. If you can reevaluate different setups, you may possibly get much higher throughput.

My guess would be that the best target layout is probably striping no more than 2-3 disks with a bigger stripe block, and then just 'joining' such 'smaller' arrays together in lvm2 into 1 big LV.

>
>> (something like 'lvcreate -LXXX -i8 -I512k vgname')
>>
> Would making lvm stripe on top of an md that already stripes confer any performance benefit in general, or for small (4k) requests in particular?

Rule #1:
- try to avoid 'over-combining' things together
- measure performance from the 'bottom' upward in your device stack (e.g. with fio - see the sketch below)
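Something like this (only a sketch - writing to the raw device destroys its data, so do it only on a scratch array; the job parameters just mirror your 4k random-write test):

  # 4k random writes against the raw md device - the 'bottom' of the stack
  fio --name=raw4k --filename=/dev/md127 --rw=randwrite --bs=4k --direct=1 \
      --ioengine=libaio --iodepth=1 --numjobs=1 --runtime=90 --time_based

  # the very same job against the LV layered on top
  fio --name=lv4k --filename=/dev/volgr0/thin --rw=randwrite --bs=4k --direct=1 \
      --ioengine=libaio --iodepth=1 --numjobs=1 --runtime=90 --time_based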
If the underlying device gives poor speed, you can't make it better with any super-smart disk layout on top of it.

>
>> Wouldn't it be 'faster' to just concatenate 8 disks together instead of striping - or stripe only across 2 disks and then concatenate 4 such striped areas…
>>
> For sustained throughput I would expect striping of 8 disks to blow away concatenation - however, for small requests I wouldn't expect any advantage. On a non-redundant array, I would expect a single-threaded test using 4k requests to end up reading/writing data from exactly one disk regardless of whether the underlying drives are concatenated or striped.

It always depends on which kind of load you expect the most.

I suspect that spreading 4K blocks across 8 SSDs is very far from an ideal layout. Any SSD is typically very bad with 4K blocks - if you want to 'spread' the load across more SSDs, do not use stripe chunks smaller than 64K per SSD - this gives you a 512K stripe size (8*64).

As for the thin-pool chunksize - if you plan to use lots of snapshots, keep the value as low as possible - a 64K or 128K thin-pool chunksize.

But I'd still suggest reevaluating/benchmarking a setup where you use a much lower number of SSDs for load spreading and a bigger stripe chunk per device (a rough sketch follows below). This should nicely improve performance for 'bigger' writes without slowing things down that much for 4K loads....

> What is the best choice for handling 4k request sizes?

Possibly NVMe can do a better job here.

Regards

Zdenek
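A rough sketch of the suggested layout - the device names, chunk sizes and LV sizes below are only examples, adjust them to your hardware (with 8 SSDs you would create 4 such pairs):

  # small RAID0 pairs with a bigger chunk per SSD
  mdadm --create /dev/md10 --level=0 --raid-devices=2 --chunk=256 /dev/sdc /dev/sdd
  mdadm --create /dev/md11 --level=0 --raid-devices=2 --chunk=256 /dev/sde /dev/sdf

  # join the smaller arrays together in lvm2
  vgcreate vg_fast /dev/md10 /dev/md11

  # thin-pool with a small chunk size (better when you keep many snapshots);
  # the pool's data LV is allocated linearly, i.e. the striped PVs get concatenated
  lvcreate --type thin-pool -L 1.5T --chunksize 64k --poolmetadatasize 16G -n thinpool vg_fast
  lvcreate --type thin -V 300G --thinpool thinpool -n thin vg_fast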