* [linux-lvm] Testing the new LVM cache feature @ 2014-05-22 10:18 Richard W.M. Jones 2014-05-22 14:43 ` Zdenek Kabelac 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 10:18 UTC (permalink / raw) To: linux-lvm I've set up a computer in order to test the new LVM cache feature. It has a pair of 2 TB HDDs in RAID 1 configuration, and a 256 GB SSD. The setup will be used to store large VM disk images in an ext4 filesystem, to be served both locally and over NFS. Before I start I have some questions about this feature: (1) Is there a minimum recommended version of LVM or kernel to use? I currently have lvm2-2.02.106-1.fc20.x86_64, which mentions LVM cache in the lvm(8) man page. I have kernel 3.14.3-200.fc20.x86_64. (2) There is no lvmcache(7) man page in any released version of LVM2. Was this man page ever created or is lvm(8) the definitive documentation? (3) It looks as if cached LVs cannot be resized: https://www.redhat.com/archives/lvm-devel/2014-February/msg00119.html Will this be fixed in future? Is there any workaround -- perhaps removing the caching layer, resizing the original LV, then recreating the cache? I really need to be able to resize LVs :-) (4) To calculate the size of the cache metadata LV, do I really just divide by 1000, min 8 MB? It's that simple? Doesn't it depend on dm-cache block size? Or dm-cache algorithm? How can I choose block size and algorithm? (5) Is there an explicit command for flushing the cache layer back to the origin LV? (6) Is the on-disk format stable for future kernel/LVM upgrades? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org ^ permalink raw reply [flat|nested] 37+ messages in thread
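Taking the rule in question (4) at face value (an illustration only; the real requirement may also depend on the cache block size), the metadata LV for the 229 GB cache LV used later in this thread would need roughly:

  max(229 GB / 1000, 8 MB) ≈ 230 MB

The examples later in the thread simply round this up to a 1 GB metadata LV.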
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 10:18 [linux-lvm] Testing the new LVM cache feature Richard W.M. Jones @ 2014-05-22 14:43 ` Zdenek Kabelac 2014-05-22 15:22 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Zdenek Kabelac @ 2014-05-22 14:43 UTC (permalink / raw) To: LVM general discussion and development On 22.5.2014 12:18, Richard W.M. Jones wrote: > > I've set up a computer in order to test the new LVM cache feature. It > has a pair of 2 TB HDDs in RAID 1 configuration, and a 256 GB SSD. > The setup will be used to store large VM disk images in an ext4 > filesystem, to be served both locally and over NFS. > > Before I start I have some questions about this feature: > > (1) Is there a minimum recommended version of LVM or kernel to use? I > currently have lvm2-2.02.106-1.fc20.x86_64, which mentions LVM cache > in the lvm(8) man page. I have kernel 3.14.3-200.fc20.x86_64. With these new targets the usual rule applies: the newer the kernel and tools, the better. > > (2) There is no lvmcache(7) man page in any released version of LVM2. > Was this man page ever created or is lvm(8) the definitive > documentation? It's now in upstream git as a separate man page (moved from lvm(8)). > (3) It looks as if cached LVs cannot be resized: > https://www.redhat.com/archives/lvm-devel/2014-February/msg00119.html > Will this be fixed in future? Is there any workaround -- perhaps Yes - the cache is still missing a lot of features - it needs further integration with tools like cache_check, cache_repair and so on. For now it's really only a preview - I wouldn't consider using it for anything serious yet. > removing the caching layer, resizing the original LV, then recreating > the cache? I really need to be able to resize LVs :-) This feature will certainly be implemented. Meanwhile you have to drop the cache, resize the LV, and reattach the cache (where "drop the cache" means removing it). > (4) To calculate the size of the cache metadata LV, do I really just > divide by 1000, min 8 MB? It's that simple? Doesn't it depend on > dm-cache block size? Or dm-cache algorithm? How can I choose block > size and algorithm? Well, this is where your experimenting may begin. However, for now lvm2 doesn't let you play with the algorithms - the lvchange interface is not yet upstream. > (5) Is there an explicit command for flushing the cache layer back to > the origin LV? To be developed... > (6) Is the on-disk format stable for future kernel/LVM upgrades? Well, it's still experimental - so if some serious problem is found that requires changing the format, that may happen. Zdenek ^ permalink raw reply [flat|nested] 37+ messages in thread
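A sketch of the drop/resize/reattach workaround described above, assuming a later LVM2 release that provides 'lvconvert --uncache' (the 2.02.106 tools discussed in this thread do not); the LV names follow the examples used later in the thread and the size increase is arbitrary:

  # Flush dirty blocks and detach/remove the cache pool from the origin.
  lvconvert --uncache vg_guests/testoriginlv

  # Resize the now-uncached origin LV and its ext4 filesystem as usual.
  lvextend -L +50G vg_guests/testoriginlv
  resize2fs /dev/vg_guests/testoriginlv

  # Then recreate the cache pool and reattach it, exactly as in the
  # lvcreate/lvconvert sequence shown in the following messages.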
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 14:43 ` Zdenek Kabelac @ 2014-05-22 15:22 ` Richard W.M. Jones 2014-05-22 15:49 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 15:22 UTC (permalink / raw) To: LVM general discussion and development; +Cc: Zdenek Kabelac Well I'm happy to experiment for you. At the moment I'm stuck here: # vgcreate vg_cache /dev/sdc1 Volume group "vg_cache" successfully created # lvcreate -L 1G -n lv_cache_meta vg_cache Logical volume "lv_cache_meta" created # lvcreate -L 229G -n lv_cache vg_cache Logical volume "lv_cache" created # lvs LV VG Attr LSize [...] lv_cache vg_cache Cwi---C--- 229.00g lv_cache_meta vg_cache -wi-a----- 1.00g testoriginlv vg_guests -wi-a----- 100.00g # lvconvert --type cache-pool --poolmetadata /dev/vg_cache/lv_cache_meta /dev/vg_cache/lv_cache Logical volume "lvol0" created Converted vg_cache/lv_cache to cache pool. # lvs LV VG Attr LSize [...] lv_cache vg_cache Cwi---C--- 229.00g testoriginlv vg_guests -wi-a----- 100.00g # lvconvert --type cache --cachepool vg_cache/lv_cache vg_guests/testoriginlv Unable to find cache pool LV, vg_cache/lv_cache ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It seems as if vg_cache/lv_cache is a "cache pool" but for some reason lvconvert is unable to use it. The error seems to come from this code: if (!(cachepool = find_lv(origin->vg, lp->cachepool))) { log_error("Unable to find cache pool LV, %s", lp->cachepool); return 0; } Is it looking in the wrong VG? Or do I have to have a single VG for this to work? (That's not made clear in the documentation, and it seems like a strange restriction). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org ^ permalink raw reply [flat|nested] 37+ messages in thread
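The single-VG layout that the next message converges on looks like this (a sketch; /dev/fast stands for the SSD partition, and vgextend is only needed if the SSD is not already part of vg_guests):

  # Put the SSD into the same VG as the origin LV.
  vgextend vg_guests /dev/fast

  # Create the cache data and metadata LVs on the SSD only.
  lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast
  lvcreate -L 229G -n lv_cache vg_guests /dev/fast

  # Combine them into a cache pool and attach it to the origin LV.
  lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache
  lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv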
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 15:22 ` Richard W.M. Jones @ 2014-05-22 15:49 ` Richard W.M. Jones 2014-05-22 18:04 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 15:49 UTC (permalink / raw) To: LVM general discussion and development; +Cc: Zdenek Kabelac It works once I use a single VG. However the performance is exactly the same as the backing hard disk, not the SSD. It seems I'm getting no benefit ... # lvs [...] testoriginlv vg_guests Cwi-a-C--- 100.00g lv_cache [testoriginlv_corig] # mount /dev/vg_guests/testoriginlv /tmp/mnt # cd /tmp/mnt # dd if=/dev/zero of=test.file bs=64K count=100000 oflag=direct 100000+0 records in 100000+0 records out 6553600000 bytes (6.6 GB) copied, 57.6301 s, 114 MB/s # dd if=test.file of=/dev/zero bs=64K iflag=direct 100000+0 records in 100000+0 records out 6553600000 bytes (6.6 GB) copied, 47.6587 s, 138 MB/s (Exactly the same numbers as when I tested the underlying HDD, and about half the performance of the SSD.) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 15:49 ` Richard W.M. Jones @ 2014-05-22 18:04 ` Mike Snitzer 2014-05-22 18:13 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-22 18:04 UTC (permalink / raw) To: Richard W.M. Jones Cc: Zdenek Kabelac, thornber, LVM general discussion and development On Thu, May 22 2014 at 11:49am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > It works once I use a single VG. > > However the performance is exactly the same as the backing hard disk, > not the SDD. It seems I'm getting no benefit ... > > # lvs > [...] > testoriginlv vg_guests Cwi-a-C--- 100.00g lv_cache [testoriginlv_corig] > > # mount /dev/vg_guests/testoriginlv /tmp/mnt > # cd /tmp/mnt > > # dd if=/dev/zero of=test.file bs=64K count=100000 oflag=direct > 100000+0 records in > 100000+0 records out > 6553600000 bytes (6.6 GB) copied, 57.6301 s, 114 MB/s > > # dd if=test.file of=/dev/zero bs=64K iflag=direct > 100000+0 records in > 100000+0 records out > 6553600000 bytes (6.6 GB) copied, 47.6587 s, 138 MB/s > > (Exactly the same numbers as when I tested the underlying HDD, and > about half the performance of the SDD.) By default dm-cache (as is currently upstream) is _not_ going to cache sequential IO, and it also isn't going to cache IO that is first written. It waits for hit counts to elevate to the promote threshold. So dm-cache effectively acts as a hot-spot cache by default. If you want dm-cache to be more aggressive for initial writes, you can: 1) discard the entire dm-cache device before use (either with mkfs, blkdiscard, or fstrim) 2) set the dm-cache 'write_promote_adjustment' tunable to 0 with the DM message interface, e.g.: dmsetup message <mapped device> 0 write_promote_adjustment 0 Additional documentation is available in the kernel tree: Documentation/device-mapper/cache.txt Documentation/device-mapper/cache-policies.txt Joe Thornber is also working on significant bursty write performance improvements for dm-cache. Hopefully they'll be ready to go upstream for the Linux 3.16 merge window. Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
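The tunables above are set per activated device, so the DM name has to be looked up first; a sketch (note that, as far as I can tell, settings sent with 'dmsetup message' are runtime-only and have to be reapplied after the LV is deactivated and reactivated):

  # List the devices currently using the cache target to find the right name.
  dmsetup ls --target cache

  # Lower the write promotion threshold so first writes are cached.
  dmsetup message <mapped device> 0 write_promote_adjustment 0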
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 18:04 ` Mike Snitzer @ 2014-05-22 18:13 ` Richard W.M. Jones 2014-05-29 13:52 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 18:13 UTC (permalink / raw) To: Mike Snitzer Cc: Zdenek Kabelac, thornber, LVM general discussion and development On Thu, May 22, 2014 at 02:04:05PM -0400, Mike Snitzer wrote: > By default dm-cache (as is currently upstream) is _not_ going to cache > sequential IO, and it also isn't going to cache IO that is first > written. It waits for hit counts to elevate to the promote threshold. > So dm-cache effectively acts as a hot-spot cache by default. OK, that makes sense, thanks. I wrote about using the LVM cache feature here: https://rwmj.wordpress.com/2014/05/22/using-lvms-new-cache-feature/#content Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 18:13 ` Richard W.M. Jones @ 2014-05-29 13:52 ` Richard W.M. Jones 2014-05-29 20:34 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-29 13:52 UTC (permalink / raw) To: Mike Snitzer Cc: LVM general discussion and development, thornber, Zdenek Kabelac [-- Attachment #1: Type: text/plain, Size: 912 bytes --] I've done some more testing, comparing RAID 1 HDD with RAID 1 HDD + an SSD overlay (using lvm-cache). I'm now using 'fio', with the following job file: [virt] ioengine=libaio iodepth=4 rw=randrw bs=64k direct=1 size=1g numjobs=4 I'm still seeing almost no benefit from LVM cache. It's about 4% faster than the underlying, slow HDDs. See attached runs. The SSD LV is 200 GB and the underlying LV is 800 GB, so I would expect there is plenty of space to cache things in the SSD during the test. For comparison, the fio tests runs about 11 times faster on the SSD. Any ideas? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v [-- Attachment #2: virt-ham0-raid1.txt --] [-- Type: text/plain, Size: 9385 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: (groupid=0, jobs=1): err= 0: pid=2195: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2600.4KB/s, iops=40, runt=201329msec slat (usec): min=23, max=24586, avg=65.89, stdev=306.38 clat (usec): min=305, max=1765.7K, avg=84912.67, stdev=124153.30 lat (usec): min=367, max=1765.8K, avg=84979.16, stdev=124150.29 clat percentiles (usec): | 1.00th=[ 780], 5.00th=[ 6944], 10.00th=[ 9536], 20.00th=[14144], | 30.00th=[19840], 40.00th=[28032], 50.00th=[40704], 60.00th=[57600], | 70.00th=[82432], 80.00th=[125440], 90.00th=[209920], 95.00th=[309248], | 99.00th=[593920], 99.50th=[790528], 99.90th=[1204224], 99.95th=[1286144], | 99.99th=[1761280] bw (KB /s): min= 82, max=12416, per=25.85%, avg=2688.32, stdev=1545.40 write: io=525056KB, bw=2607.1KB/s, iops=40, runt=201329msec slat (usec): min=31, max=140675, avg=132.77, stdev=1945.34 clat (usec): min=346, max=1355.5K, avg=13280.27, stdev=57149.27 lat (usec): min=404, max=1355.6K, avg=13413.69, stdev=57202.63 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 374], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 644], 40.00th=[ 852], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1496], 80.00th=[ 5728], 90.00th=[18048], 95.00th=[63232], | 99.00th=[257024], 99.50th=[382976], 99.90th=[831488], 99.95th=[946176], | 99.99th=[1351680] bw (KB /s): min= 121, max=10709, per=25.96%, avg=2708.14, stdev=1769.64 lat (usec) : 500=12.91%, 750=6.04%, 1000=3.25% lat (msec) : 2=16.32%, 4=1.65%, 10=7.59%, 20=12.91%, 50=14.75% lat (msec) : 100=9.90%, 250=10.51%, 500=3.19%, 750=0.66%, 1000=0.20% lat (msec) : 2000=0.13% cpu : usr=0.11%, sys=0.54%, ctx=16504, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2196: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2947.6KB/s, iops=46, runt=177615msec slat (usec): min=24, max=59936, avg=81.73, stdev=987.31 clat (usec): min=149, max=1054.1K, avg=74995.11, stdev=93418.23 lat (usec): min=369, max=1054.2K, avg=75077.47, stdev=93411.06 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 16], | 30.00th=[ 22], 40.00th=[ 31], 50.00th=[ 42], 60.00th=[ 57], | 70.00th=[ 80], 80.00th=[ 116], 90.00th=[ 180], 95.00th=[ 260], | 99.00th=[ 437], 99.50th=[ 529], 99.90th=[ 840], 99.95th=[ 979], | 99.99th=[ 1057] bw (KB /s): min= 113, max= 6898, per=29.26%, avg=3043.36, stdev=1217.82 write: io=525056KB, bw=2956.2KB/s, iops=46, runt=177615msec slat (usec): min=33, max=140655, avg=128.77, stdev=2069.57 clat (usec): min=258, max=1000.6K, avg=11590.37, stdev=57029.08 lat (usec): min=403, max=1000.7K, avg=11719.76, stdev=57077.03 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 612], 40.00th=[ 748], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1528], 90.00th=[ 7776], 95.00th=[55040], | 99.00th=[244736], 99.50th=[362496], 99.90th=[913408], 99.95th=[929792], | 99.99th=[1003520] bw (KB /s): min= 140, max= 7409, per=29.16%, avg=3042.19, stdev=1466.35 lat (usec) : 250=0.01%, 500=13.49%, 750=6.57%, 1000=3.19% lat (msec) : 2=19.84%, 4=1.70%, 10=5.92%, 20=9.95%, 50=14.89% lat (msec) : 100=10.81%, 250=10.45%, 500=2.73%, 750=0.31%, 1000=0.13% lat (msec) : 2000=0.02% cpu : usr=0.14%, sys=0.59%, ctx=16858, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2197: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2923.2KB/s, iops=45, runt=179092msec slat (usec): min=20, max=99838, avg=91.84, stdev=1411.97 clat (usec): min=160, max=1512.4K, avg=75755.06, stdev=105522.71 lat (usec): min=382, max=1512.9K, avg=75847.54, stdev=105514.14 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 29], 50.00th=[ 40], 60.00th=[ 56], | 70.00th=[ 76], 80.00th=[ 112], 90.00th=[ 186], 95.00th=[ 269], | 99.00th=[ 469], 99.50th=[ 586], 99.90th=[ 1156], 99.95th=[ 1287], | 99.99th=[ 1516] bw (KB /s): min= 124, max= 6144, per=29.37%, avg=3055.29, stdev=1223.87 write: io=525056KB, bw=2931.8KB/s, iops=45, runt=179092msec slat (usec): min=35, max=140660, avg=114.41, stdev=1768.12 clat (usec): min=345, max=1441.6K, avg=11547.93, stdev=62451.29 lat (usec): min=415, max=1441.7K, avg=11663.01, stdev=62476.14 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 596], 40.00th=[ 756], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1544], 90.00th=[ 8896], 95.00th=[37632], | 99.00th=[232448], 99.50th=[350208], 99.90th=[995328], 99.95th=[1044480], | 99.99th=[1433600] bw (KB /s): min= 80, max= 9325, per=29.37%, avg=3063.24, stdev=1532.25 lat (usec) : 250=0.01%, 500=13.56%, 750=6.50%, 1000=3.08% lat (msec) : 2=19.73%, 4=1.62%, 10=6.32%, 20=10.52%, 50=14.89% lat (msec) : 100=10.77%, 250=9.75%, 500=2.72%, 750=0.27%, 1000=0.13% lat (msec) : 2000=0.14% cpu : usr=0.14%, sys=0.59%, 
ctx=16985, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2198: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2629.9KB/s, iops=41, runt=199069msec slat (usec): min=25, max=99063, avg=89.77, stdev=1365.24 clat (usec): min=112, max=1392.1K, avg=83373.43, stdev=118987.34 lat (usec): min=369, max=1392.1K, avg=83463.84, stdev=118977.09 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 28], 50.00th=[ 40], 60.00th=[ 57], | 70.00th=[ 81], 80.00th=[ 122], 90.00th=[ 206], 95.00th=[ 310], | 99.00th=[ 603], 99.50th=[ 734], 99.90th=[ 979], 99.95th=[ 1156], | 99.99th=[ 1401] bw (KB /s): min= 64, max= 9708, per=26.35%, avg=2740.70, stdev=1540.11 write: io=525056KB, bw=2637.6KB/s, iops=41, runt=199069msec slat (usec): min=38, max=140657, avg=121.47, stdev=1860.80 clat (usec): min=349, max=1002.9K, avg=13698.39, stdev=66153.66 lat (usec): min=405, max=1002.9K, avg=13820.49, stdev=66192.16 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 652], 40.00th=[ 876], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1448], 80.00th=[ 2992], 90.00th=[15552], 95.00th=[36096], | 99.00th=[321536], 99.50th=[489472], 99.90th=[962560], 99.95th=[995328], | 99.99th=[1003520] bw (KB /s): min= 71, max= 9836, per=26.41%, avg=2755.14, stdev=1757.17 lat (usec) : 250=0.02%, 500=12.84%, 750=5.83%, 1000=3.12% lat (msec) : 2=17.58%, 4=1.73%, 10=7.50%, 20=12.41%, 50=14.97% lat (msec) : 100=9.86%, 250=9.86%, 500=3.19%, 750=0.78%, 1000=0.25% lat (msec) : 2000=0.06% cpu : usr=0.12%, sys=0.53%, ctx=16540, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10401KB/s, minb=2600KB/s, maxb=2947KB/s, mint=177615msec, maxt=201329msec WRITE: io=2051.0MB, aggrb=10431KB/s, minb=2607KB/s, maxb=2956KB/s, mint=177615msec, maxt=201329msec Disk stats (read/write): dm-0: ios=32841/33299, merge=0/0, ticks=2623746/506809, in_queue=3130698, util=100.00%, aggrios=32855/33392, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32855/33392, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16426/33225, aggrmerge=1/168, aggrticks=1311820/306619, aggrin_queue=1618332, aggrutil=98.91% sda: ios=8494/33223, merge=0/171, ticks=464540/232964, in_queue=697442, util=96.18% sdb: ios=24359/33228, merge=2/166, ticks=2159100/380274, in_queue=2539222, util=98.91% [-- Attachment #3: virt-ham0-lvmcache.txt --] [-- Type: text/plain, Size: 9798 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... 
virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: (groupid=0, jobs=1): err= 0: pid=1643: Thu May 29 14:44:39 2014 read : io=523520KB, bw=2721.8KB/s, iops=42, runt=192348msec slat (usec): min=36, max=11159, avg=79.21, stdev=140.35 clat (usec): min=305, max=1346.1K, avg=82931.26, stdev=118358.55 lat (usec): min=383, max=1347.8K, avg=83011.13, stdev=118357.33 clat percentiles (usec): | 1.00th=[ 556], 5.00th=[ 6368], 10.00th=[ 8896], 20.00th=[13632], | 30.00th=[18816], 40.00th=[26496], 50.00th=[38144], 60.00th=[55552], | 70.00th=[81408], 80.00th=[125440], 90.00th=[211968], 95.00th=[313344], | 99.00th=[561152], 99.50th=[708608], 99.90th=[1056768], 99.95th=[1138688], | 99.99th=[1351680] bw (KB /s): min= 64, max=13714, per=25.99%, avg=2828.82, stdev=1614.75 write: io=525056KB, bw=2729.8KB/s, iops=42, runt=192348msec slat (usec): min=40, max=120228, avg=113.58, stdev=1327.58 clat (usec): min=345, max=1401.6K, avg=10882.68, stdev=66932.84 lat (usec): min=428, max=1401.7K, avg=10996.89, stdev=66947.30 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 386], 20.00th=[ 442], | 30.00th=[ 612], 40.00th=[ 812], 50.00th=[ 1240], 60.00th=[ 1304], | 70.00th=[ 1448], 80.00th=[ 2960], 90.00th=[12736], 95.00th=[23168], | 99.00th=[259072], 99.50th=[403456], 99.90th=[995328], 99.95th=[1286144], | 99.99th=[1400832] bw (KB /s): min= 105, max=13079, per=26.20%, avg=2860.76, stdev=1798.77 lat (usec) : 500=13.62%, 750=6.18%, 1000=2.75% lat (msec) : 2=17.00%, 4=1.83%, 10=8.55%, 20=12.82%, 50=14.40% lat (msec) : 100=9.28%, 250=9.26%, 500=3.45%, 750=0.50%, 1000=0.23% lat (msec) : 2000=0.13% cpu : usr=0.13%, sys=0.69%, ctx=16575, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=1644: Thu May 29 14:44:39 2014 read : io=523520KB, bw=3056.4KB/s, iops=47, runt=171287msec slat (usec): min=34, max=166, avg=77.72, stdev=12.41 clat (usec): min=314, max=1179.1K, avg=76362.24, stdev=95270.37 lat (usec): min=386, max=1180.8K, avg=76440.60, stdev=95269.97 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 16], | 30.00th=[ 23], 40.00th=[ 31], 50.00th=[ 42], 60.00th=[ 58], | 70.00th=[ 80], 80.00th=[ 118], 90.00th=[ 186], 95.00th=[ 262], | 99.00th=[ 457], 99.50th=[ 570], 99.90th=[ 791], 99.95th=[ 906], | 99.99th=[ 1188] bw (KB /s): min= 237, max= 6259, per=28.43%, avg=3094.94, stdev=1094.43 write: io=525056KB, bw=3065.4KB/s, iops=47, runt=171287msec slat (usec): min=47, max=120139, avg=115.02, stdev=1329.52 clat (usec): min=343, max=958790, avg=7162.31, stdev=39237.72 lat (usec): min=422, max=958895, avg=7277.98, stdev=39265.48 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 370], 10.00th=[ 386], 20.00th=[ 442], | 30.00th=[ 588], 40.00th=[ 740], 50.00th=[ 1192], 60.00th=[ 1288], | 70.00th=[ 1320], 80.00th=[ 1496], 90.00th=[ 3216], 95.00th=[15552], | 99.00th=[183296], 99.50th=[301056], 99.90th=[514048], 99.95th=[610304], | 99.99th=[962560] bw (KB /s): min= 100, max= 7418, per=28.37%, avg=3097.42, stdev=1395.25 lat (usec) : 500=13.75%, 750=6.58%, 1000=3.16% lat 
(msec) : 2=20.43%, 4=1.64%, 10=6.06%, 20=9.94%, 50=14.88% lat (msec) : 100=10.52%, 250=9.92%, 500=2.67%, 750=0.35%, 1000=0.06% lat (msec) : 2000=0.01% cpu : usr=0.14%, sys=0.79%, ctx=16933, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=1645: Thu May 29 14:44:39 2014 read : io=523520KB, bw=3097.5KB/s, iops=48, runt=169019msec slat (usec): min=36, max=141, avg=77.24, stdev=12.48 clat (usec): min=311, max=1577.6K, avg=74149.52, stdev=97313.65 lat (usec): min=376, max=1577.6K, avg=74227.41, stdev=97313.32 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 22], 40.00th=[ 30], 50.00th=[ 42], 60.00th=[ 56], | 70.00th=[ 77], 80.00th=[ 112], 90.00th=[ 176], 95.00th=[ 251], | 99.00th=[ 453], 99.50th=[ 578], 99.90th=[ 947], 99.95th=[ 1254], | 99.99th=[ 1582] bw (KB /s): min= 62, max= 7492, per=29.37%, avg=3197.36, stdev=1186.93 write: io=525056KB, bw=3106.6KB/s, iops=48, runt=169019msec slat (usec): min=47, max=120168, avg=112.54, stdev=1325.71 clat (usec): min=335, max=1474.2K, avg=8254.87, stdev=57083.94 lat (usec): min=416, max=1474.3K, avg=8368.05, stdev=57098.21 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 386], 20.00th=[ 442], | 30.00th=[ 564], 40.00th=[ 724], 50.00th=[ 1176], 60.00th=[ 1272], | 70.00th=[ 1320], 80.00th=[ 1464], 90.00th=[ 2224], 95.00th=[13504], | 99.00th=[185344], 99.50th=[321536], 99.90th=[1019904], 99.95th=[1073152], | 99.99th=[1466368] bw (KB /s): min= 109, max= 8172, per=29.43%, avg=3213.21, stdev=1535.62 lat (usec) : 500=14.15%, 750=6.58%, 1000=3.03% lat (msec) : 2=20.82%, 4=1.48%, 10=6.24%, 20=9.75%, 50=14.67% lat (msec) : 100=10.94%, 250=9.46%, 500=2.37%, 750=0.34%, 1000=0.07% lat (msec) : 2000=0.10% cpu : usr=0.13%, sys=0.81%, ctx=16936, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=1646: Thu May 29 14:44:39 2014 read : io=523520KB, bw=2773.6KB/s, iops=43, runt=188788msec slat (usec): min=23, max=2618, avg=77.53, stdev=31.29 clat (usec): min=309, max=1755.5K, avg=82247.61, stdev=119257.63 lat (usec): min=356, max=1755.6K, avg=82325.78, stdev=119257.35 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 14], | 30.00th=[ 20], 40.00th=[ 27], 50.00th=[ 38], 60.00th=[ 55], | 70.00th=[ 81], 80.00th=[ 122], 90.00th=[ 212], 95.00th=[ 306], | 99.00th=[ 578], 99.50th=[ 750], 99.90th=[ 988], 99.95th=[ 1106], | 99.99th=[ 1762] bw (KB /s): min= 122, max= 9325, per=26.30%, avg=2863.03, stdev=1442.98 write: io=525056KB, bw=2781.2KB/s, iops=43, runt=188788msec slat (usec): min=44, max=120232, avg=112.99, stdev=1326.51 clat (usec): min=346, max=1033.4K, avg=9830.80, stdev=52064.46 lat (usec): min=421, max=1033.5K, avg=9944.42, stdev=52084.32 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 370], 10.00th=[ 386], 20.00th=[ 446], | 30.00th=[ 588], 40.00th=[ 788], 50.00th=[ 1240], 60.00th=[ 1304], | 70.00th=[ 1416], 80.00th=[ 1976], 90.00th=[12736], 95.00th=[23424], | 99.00th=[244736], 99.50th=[333824], 99.90th=[937984], 99.95th=[954368], | 
99.99th=[1036288] bw (KB /s): min= 100, max= 8694, per=26.24%, avg=2865.37, stdev=1678.69 lat (usec) : 500=13.65%, 750=6.05%, 1000=2.78% lat (msec) : 2=18.05%, 4=1.64%, 10=7.68%, 20=12.79%, 50=14.66% lat (msec) : 100=9.39%, 250=9.17%, 500=3.31%, 750=0.51%, 1000=0.24% lat (msec) : 2000=0.07% cpu : usr=0.13%, sys=0.70%, ctx=16629, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10886KB/s, minb=2721KB/s, maxb=3097KB/s, mint=169019msec, maxt=192348msec WRITE: io=2051.0MB, aggrb=10918KB/s, minb=2729KB/s, maxb=3106KB/s, mint=169019msec, maxt=192348msec Disk stats (read/write): dm-3: ios=32687/32957, merge=0/0, ticks=2580936/388295, in_queue=2969480, util=100.00%, aggrios=10910/11025, aggrmerge=0/0, aggrticks=860321/160031, aggrin_queue=1020354, aggrutil=100.00% dm-0: ios=5/44, merge=0/0, ticks=10/85, in_queue=95, util=0.05%, aggrios=11/50, aggrmerge=0/2, aggrticks=11/94, aggrin_queue=105, aggrutil=0.05% sdc: ios=11/50, merge=0/2, ticks=11/94, in_queue=105, util=0.05% dm-1: ios=6/8, merge=0/0, ticks=1/9, in_queue=10, util=0.01% dm-2: ios=32721/33023, merge=0/0, ticks=2580952/479999, in_queue=3060959, util=100.00%, aggrios=32721/33023, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32721/33023, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16360/32937, aggrmerge=0/87, aggrticks=1290283/255548, aggrin_queue=1545711, aggrutil=99.71% sda: ios=8101/32937, merge=1/88, ticks=402681/125603, in_queue=528232, util=96.21% sdb: ios=24619/32938, merge=0/87, ticks=2177886/385493, in_queue=2563190, util=99.71% ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 13:52 ` Richard W.M. Jones @ 2014-05-29 20:34 ` Mike Snitzer 2014-05-29 20:47 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-29 20:34 UTC (permalink / raw) To: Richard W.M. Jones Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29 2014 at 9:52am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > I've done some more testing, comparing RAID 1 HDD with RAID 1 HDD + an > SSD overlay (using lvm-cache). > > I'm now using 'fio', with the following job file: > > [virt] > ioengine=libaio > iodepth=4 > rw=randrw > bs=64k > direct=1 > size=1g > numjobs=4 randrw isn't giving you increased hits to the same blocks. fio does have random_distribution controls (zipf and pareto) that are more favorable for testing cache replacement policies (jens said that testing caching algorithms is what motivated him to develop these in fio). > I'm still seeing almost no benefit from LVM cache. It's about 4% > faster than the underlying, slow HDDs. See attached runs. > > The SSD LV is 200 GB and the underlying LV is 800 GB, so I would > expect there is plenty of space to cache things in the SSD during the > test. > > For comparison, the fio tests runs about 11 times faster on the SSD. > > Any ideas? Try using : dmsetup message <cache device> 0 write_promote_adjustment 0 Also, if you discard the entire cache device (e.g. using blkdiscard) before use you could get a big win, especially if you use: dmsetup message <cache device> 0 discard_promote_adjustment 0 Documentation/device-mapper/cache-policies.txt says: Internally the mq policy maintains a promotion threshold variable. If the hit count of a block not in the cache goes above this threshold it gets promoted to the cache. The read, write and discard promote adjustment tunables allow you to tweak the promotion threshold by adding a small value based on the io type. They default to 4, 8 and 1 respectively. If you're trying to quickly warm a new cache device you may wish to reduce these to encourage promotion. Remember to switch them back to their defaults after the cache fills though. ^ permalink raw reply [flat|nested] 37+ messages in thread
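For example, the job file used earlier in this thread could be given a skewed access pattern so that some blocks are re-hit often enough to be promoted (a sketch; the zipf theta of 1.2 is an arbitrary choice and requires a fio build with random_distribution support):

  [virt]
  ioengine=libaio
  iodepth=4
  rw=randrw
  bs=64k
  direct=1
  size=1g
  numjobs=4
  # Skew the random offsets: a small set of "hot" blocks receives most of the IO.
  random_distribution=zipf:1.2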
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 20:34 ` Mike Snitzer @ 2014-05-29 20:47 ` Richard W.M. Jones 2014-05-29 21:06 ` Mike Snitzer 2014-05-30 11:38 ` Alasdair G Kergon 0 siblings, 2 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-29 20:47 UTC (permalink / raw) To: Mike Snitzer Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29, 2014 at 04:34:10PM -0400, Mike Snitzer wrote: > Try using : > dmsetup message <cache device> 0 write_promote_adjustment 0 > > Documentation/device-mapper/cache-policies.txt says: > > Internally the mq policy maintains a promotion threshold variable. If > the hit count of a block not in the cache goes above this threshold it > gets promoted to the cache. The read, write and discard promote adjustment > tunables allow you to tweak the promotion threshold by adding a small > value based on the io type. They default to 4, 8 and 1 respectively. > If you're trying to quickly warm a new cache device you may wish to > reduce these to encourage promotion. Remember to switch them back to > their defaults after the cache fills though. What would be bad about leaving write_promote_adjustment set at 0 or 1? Wouldn't that mean that I get a simple LRU policy? (That's probably what I want.) > Also, if you discard the entire cache device (e.g. using blkdiscard) > before use you could get a big win, especially if you use: > dmsetup message <cache device> 0 discard_promote_adjustment 0 To be clear, that means I should do: lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast lvcreate -L 229G -n lv_cache vg_guests /dev/fast lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache blkdiscard /dev/vg_guests/lv_cache lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv Or should I do the blkdiscard earlier? [On the separate subject of volume groups ...] Is there a reason why fast and slow devices need to be in the same VG? I've talked to two other people who found this very confusing. No one knew that you could manually place LVs into different PVs, and it's something of a pain to have to remember to place LVs every time you create or resize one. It seems it would be a lot simpler if you could have the slow PVs in one VG and the fast PVs in another VG. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 20:47 ` Richard W.M. Jones @ 2014-05-29 21:06 ` Mike Snitzer 2014-05-29 21:19 ` Richard W.M. Jones 2014-05-30 11:38 ` Alasdair G Kergon 1 sibling, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-29 21:06 UTC (permalink / raw) To: Richard W.M. Jones Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29 2014 at 4:47pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Thu, May 29, 2014 at 04:34:10PM -0400, Mike Snitzer wrote: > > Try using : > > dmsetup message <cache device> 0 write_promote_adjustment 0 > > > > Documentation/device-mapper/cache-policies.txt says: > > > > Internally the mq policy maintains a promotion threshold variable. If > > the hit count of a block not in the cache goes above this threshold it > > gets promoted to the cache. The read, write and discard promote adjustment > > tunables allow you to tweak the promotion threshold by adding a small > > value based on the io type. They default to 4, 8 and 1 respectively. > > If you're trying to quickly warm a new cache device you may wish to > > reduce these to encourage promotion. Remember to switch them back to > > their defaults after the cache fills though. > > What would be bad about leaving write_promote_adjustment set at 0 or 1? > > Wouldn't that mean that I get a simple LRU policy? (That's probably > what I want.) Leaving them at 0 could result in cache thrashing. But given how large your SSD is in relation to the origin you'd likely be OK for a while (at least until your cache gets quite full). > > Also, if you discard the entire cache device (e.g. using blkdiscard) > > before use you could get a big win, especially if you use: > > dmsetup message <cache device> 0 discard_promote_adjustment 0 > > To be clear, that means I should do: > > lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast > lvcreate -L 229G -n lv_cache vg_guests /dev/fast > lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache > blkdiscard /dev/vg_guests/lv_cache > lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv > > Or should I do the blkdiscard earlier? You want to discard the cached device before you run fio against it. I'm not completely sure what cache-pool vs cache is. But it looks like you'd want to run the discard against the /dev/vg_guests/testoriginlv (assuming it was converted to use the 'cache' DM target, 'dmsetup table vg_guests-testoriginlv' should confirm as much). > [On the separate subject of volume groups ...] > > Is there a reason why fast and slow devices need to be in the same VG? > > I've talked to two other people who found this very confusing. No one > knew that you could manually place LVs into different PVs, and it's > something of a pain to have to remember to place LVs every time you > create or resize one. It seems it would be a lot simpler if you could > have the slow PVs in one VG and the fast PVs in another VG. I cannot answer the lvm details. Best to ask Jon Brassow or Zdenek (hopefully they'll respond) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 21:06 ` Mike Snitzer @ 2014-05-29 21:19 ` Richard W.M. Jones 2014-05-29 21:58 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-29 21:19 UTC (permalink / raw) To: Mike Snitzer Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29, 2014 at 05:06:48PM -0400, Mike Snitzer wrote: > On Thu, May 29 2014 at 4:47pm -0400, > Richard W.M. Jones <rjones@redhat.com> wrote: > > To be clear, that means I should do: > > > > lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast > > lvcreate -L 229G -n lv_cache vg_guests /dev/fast > > lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache > > blkdiscard /dev/vg_guests/lv_cache > > lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv > > > > Or should I do the blkdiscard earlier? > > You want to discard the cached device before you run fio against it. > I'm not completely sure what cache-pool vs cache is. But it looks like > you'd want to run the discard against the /dev/vg_guests/testoriginlv > (assuming it was converted to use the 'cache' DM target, 'dmsetup table > vg_guests-testoriginlv' should confirm as much). I'm concerned that would delete all the data on the origin LV ... My origin LV now has a slightly different name. Here are the device-mapper tables: $ sudo dmsetup table vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 vg_guests-home: 0 209715200 linear 9:127 2048 vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 So it does look as if my origin LV (vg_guests/libvirt-images) does use the 'cache' target. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 37+ messages in thread
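For reference, the fields of that cache table line, in the order documented in the kernel's Documentation/device-mapper/cache.txt (the name-to-minor mapping is not shown by 'dmsetup table', so the device roles below are identified by position only):

  253:1      metadata device
  253:0      cache (SSD) data device
  253:2      origin (slow) device
  128        cache block size in 512-byte sectors, i.e. 64 KiB
  0          number of optional feature arguments (none, so the default writeback mode)
  default 0  cache policy ("default", which selects mq here) with no policy arguments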
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 21:19 ` Richard W.M. Jones @ 2014-05-29 21:58 ` Mike Snitzer 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 11:53 ` Mike Snitzer 0 siblings, 2 replies; 37+ messages in thread From: Mike Snitzer @ 2014-05-29 21:58 UTC (permalink / raw) To: Richard W.M. Jones Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Thu, May 29, 2014 at 05:06:48PM -0400, Mike Snitzer wrote: > > On Thu, May 29 2014 at 4:47pm -0400, > > Richard W.M. Jones <rjones@redhat.com> wrote: > > > To be clear, that means I should do: > > > > > > lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast > > > lvcreate -L 229G -n lv_cache vg_guests /dev/fast > > > lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache > > > blkdiscard /dev/vg_guests/lv_cache > > > lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv > > > > > > Or should I do the blkdiscard earlier? > > > > You want to discard the cached device before you run fio against it. > > I'm not completely sure what cache-pool vs cache is. But it looks like > > you'd want to run the discard against the /dev/vg_guests/testoriginlv > > (assuming it was converted to use the 'cache' DM target, 'dmsetup table > > vg_guests-testoriginlv' should confirm as much). > > I'm concerned that would delete all the data on the origin LV ... OK, but how are you testing with fio at this point? Doesn't that destroy data too? The cache target doesn't have passdown support. So none of your data would be discarded directly, but it could eat data as a side-effect of the cache bypassing promotion from the origin (because it thinks the origin's blocks were discarded). But on writeback you'd lose data. So you raise a valid point: if you're adding a cache in front of a volume with existing data you'll want to avoid discarding the logical address space that contains data you want to keep. Do you have a filesystem on the libvirt-images volume? If so, would be enough to run fstrim against /dev/vg_guests/libvirt-images BTW, this is all with a eye toward realizing the optimization that dm-cache provides for origin blocks that were discarded (like I said before dm-cache doesn't promote from the origin if the corresponding block was marked for discard). So you don't _need_ to do any of this.. purely about trying to optimize a bit more. > My origin LV now has a slightly different name. Here are the > device-mapper tables: > > $ sudo dmsetup table > vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 > vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 > vg_guests-home: 0 209715200 linear 9:127 2048 > vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 > vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 > > So it does look as if my origin LV (vg_guests/libvirt-images) does use > the 'cache' target. Yeap. ^ permalink raw reply [flat|nested] 37+ messages in thread
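A sketch of that fstrim suggestion, assuming the filesystem on libvirt-images is mounted (the mount point below is only an example); fstrim discards only space the filesystem considers free, so existing files are untouched:

  # Make discarded (free) regions cheap to promote, as suggested earlier.
  dmsetup message vg_guests-libvirt--images 0 discard_promote_adjustment 0

  # Discard the unallocated space of the mounted ext4 filesystem.
  fstrim -v /var/lib/libvirt/images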
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 21:58 ` Mike Snitzer @ 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 10:30 ` Richard W.M. Jones 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 11:53 ` Mike Snitzer 1 sibling, 2 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 9:04 UTC (permalink / raw) To: LVM general discussion and development; +Cc: thornber, Zdenek Kabelac On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: > On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > I'm concerned that would delete all the data on the origin LV ... > > OK, but how are you testing with fio at this point? Doesn't that > destroy data too? I'm testing with files. This matches my final configuration which is to use qcow2 files on an ext4 filesystem to store the VM disk images. I set read_promote_adjustment == write_promote_adjustment == 1 and ran fio 6 times, reusing the same test files. It is faster than HDD (slower layer), but still much slower than the SSD (fast layer). Across the fio runs it's about 5 times slower than the SSD, and the times don't improve at all over the runs. (It is more than twice as fast as the HDD though). Somehow something is not working as I expected. Back to an earlier point. I wrote and you replied: > > What would be bad about leaving write_promote_adjustment set at 0 or 1? > > Wouldn't that mean that I get a simple LRU policy? (That's probably > > what I want.) > > Leaving them at 0 could result in cache thrashing. But given how > large your SSD is in relation to the origin you'd likely be OK for a > while (at least until your cache gets quite full). My SSD is ~200 GB and the backing origin LV is ~800 GB. It is unlikely the working set will ever grow > 200 GB, not least because I cannot run that many VMs at the same time on the cluster. So should I be concerned about cache thrashing? Specifically: If the cache layer gets full, then it will send the least recently used blocks back to the slow layer, right? (It seems obvious, but I'd like to check that) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 9:04 ` Richard W.M. Jones @ 2014-05-30 10:30 ` Richard W.M. Jones 0 siblings, 0 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 10:30 UTC (permalink / raw) To: LVM general discussion and development; +Cc: thornber, Zdenek Kabelac On Fri, May 30, 2014 at 10:04:22AM +0100, Richard W.M. Jones wrote: > On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: > > On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > > I'm concerned that would delete all the data on the origin LV ... > > > > OK, but how are you testing with fio at this point? Doesn't that > > destroy data too? > > I'm testing with files. This matches my final configuration which is > to use qcow2 files on an ext4 filesystem to store the VM disk images. > > I set read_promote_adjustment == write_promote_adjustment == 1 and ran > fio 6 times, reusing the same test files. > > It is faster than HDD (slower layer), but still much slower than the > SSD (fast layer). Across the fio runs it's about 5 times slower than > the SSD, and the times don't improve at all over the runs. (It is > more than twice as fast as the HDD though). > > Somehow something is not working as I expected. Additionally, I ran this command 5 times: md5sum virt.* # the test files and then reran the fio test. Since I have read_promote_adjustment == 1, I would expect that these files should be promoted to the fast layer by reading them several times. However the results are still the same. It's about twice as fast as the HDDs, but 5 times slower than with the SSD. Are there additional diagnostic commands I can use? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 37+ messages in thread
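One readily available diagnostic is the status line of the cached device, whose counters show directly whether anything is being promoted between runs (a sketch; the field layout below follows Documentation/device-mapper/cache.txt for this kernel generation):

  dmsetup status vg_guests-libvirt--images
  # The cache status line reports, roughly in this order:
  #   <metadata block size> <used>/<total metadata blocks>
  #   <cache block size> <used>/<total cache blocks>
  #   <read hits> <read misses> <write hits> <write misses>
  #   <demotions> <promotions> <dirty> <features...> <policy> <policy args...>
  # Promotion and hit counters that rise between fio/md5sum runs mean blocks
  # really are moving to the SSD; static counters mean the IO never qualified.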
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 10:30 ` Richard W.M. Jones @ 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones ` (2 more replies) 1 sibling, 3 replies; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 13:38 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 5:04am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: > > On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > > I'm concerned that would delete all the data on the origin LV ... > > > > OK, but how are you testing with fio at this point? Doesn't that > > destroy data too? > > I'm testing with files. This matches my final configuration which is > to use qcow2 files on an ext4 filesystem to store the VM disk images. > > I set read_promote_adjustment == write_promote_adjustment == 1 and ran > fio 6 times, reusing the same test files. > > It is faster than HDD (slower layer), but still much slower than the > SSD (fast layer). Across the fio runs it's about 5 times slower than > the SSD, and the times don't improve at all over the runs. (It is > more than twice as fast as the HDD though). > > Somehow something is not working as I expected. Why are you setting {read,write}_promote_adjustment to 1? I asked you to set write_promote_adjustment to 0. Your random fio job won't hit the same blocks, and md5sum likely uses buffered IO so unless you set 0 for both the cache won't aggressively cache like you're expecting. I explained earlier in this thread that the dm-cache is currently a "hotspot cache". Not a pure writeback cache like you're hoping. We're working to make it fit your expectations (you aren't alone in expecting more performance!) > Back to an earlier point. I wrote and you replied: > > > > What would be bad about leaving write_promote_adjustment set at 0 or 1? > > > Wouldn't that mean that I get a simple LRU policy? (That's probably > > > what I want.) > > > > Leaving them at 0 could result in cache thrashing. But given how > > large your SSD is in relation to the origin you'd likely be OK for a > > while (at least until your cache gets quite full). > > My SSD is ~200 GB and the backing origin LV is ~800 GB. It is > unlikely the working set will ever grow > 200 GB, not least because I > cannot run that many VMs at the same time on the cluster. > > So should I be concerned about cache thrashing? Specifically: If the > cache layer gets full, then it will send the least recently used > blocks back to the slow layer, right? (It seems obvious, but I'd like > to check that) Right, you should be fine. But I'll defer to Heinz on more particulars about the cache replacement strategy that is provided in this case for the "mq" (aka multi-queue policy). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:38 ` Mike Snitzer @ 2014-05-30 13:40 ` Richard W.M. Jones 2014-05-30 13:42 ` Heinz Mauelshagen 2014-05-30 13:46 ` Richard W.M. Jones 2 siblings, 0 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 13:40 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30, 2014 at 09:38:14AM -0400, Mike Snitzer wrote: > Why are you setting {read,write}_promote_adjustment to 1? I asked you > to set write_promote_adjustment to 0. I didn't realize there would be (much) difference. However I will certainly try it with write_promote_adjustment == 0. > Your random fio job won't hit the same blocks, and md5sum likely uses > buffered IO so unless you set 0 for both the cache won't aggressively > cache like you're expecting. Right, that was definitely a mistake! I will drop_caches between each md5sum operation. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 37+ messages in thread
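For the record, a minimal way to do that between runs (a standard procfs knob, nothing LVM-specific), so each md5sum pass actually reaches the block layer and the dm-cache promotion logic:

  sync
  echo 3 > /proc/sys/vm/drop_caches    # drop clean page, dentry and inode caches
  md5sum virt.*                        # re-read the test files from disk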
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones @ 2014-05-30 13:42 ` Heinz Mauelshagen 2014-05-30 13:54 ` Richard W.M. Jones 2014-05-30 13:46 ` Richard W.M. Jones 2 siblings, 1 reply; 37+ messages in thread From: Heinz Mauelshagen @ 2014-05-30 13:42 UTC (permalink / raw) To: Mike Snitzer, Richard W.M. Jones Cc: Zdenek Kabelac, thornber, LVM general discussion and development On 05/30/2014 03:38 PM, Mike Snitzer wrote: > On Fri, May 30 2014 at 5:04am -0400, > Richard W.M. Jones <rjones@redhat.com> wrote: > >> On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: >>> On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: >>>> I'm concerned that would delete all the data on the origin LV ... >>> OK, but how are you testing with fio at this point? Doesn't that >>> destroy data too? >> I'm testing with files. This matches my final configuration which is >> to use qcow2 files on an ext4 filesystem to store the VM disk images. >> >> I set read_promote_adjustment == write_promote_adjustment == 1 and ran >> fio 6 times, reusing the same test files. >> >> It is faster than HDD (slower layer), but still much slower than the >> SSD (fast layer). Across the fio runs it's about 5 times slower than >> the SSD, and the times don't improve at all over the runs. (It is >> more than twice as fast as the HDD though). >> >> Somehow something is not working as I expected. > Why are you setting {read,write}_promote_adjustment to 1? I asked you > to set write_promote_adjustment to 0. > > Your random fio job won't hit the same blocks, and md5sum likely uses > buffered IO so unless you set 0 for both the cache won't aggressively > cache like you're expecting. > > I explained earlier in this thread that the dm-cache is currently a > "hotspot cache". Not a pure writeback cache like you're hoping. We're > working to make it fit your expectations (you aren't alone in expecting > more performance!) > >> Back to an earlier point. I wrote and you replied: >> >>>> What would be bad about leaving write_promote_adjustment set at 0 or 1? >>>> Wouldn't that mean that I get a simple LRU policy? (That's probably >>>> what I want.) >>> Leaving them at 0 could result in cache thrashing. But given how >>> large your SSD is in relation to the origin you'd likely be OK for a >>> while (at least until your cache gets quite full). >> My SSD is ~200 GB and the backing origin LV is ~800 GB. It is >> unlikely the working set will ever grow > 200 GB, not least because I >> cannot run that many VMs at the same time on the cluster. >> >> So should I be concerned about cache thrashing? Specifically: If the >> cache layer gets full, then it will send the least recently used >> blocks back to the slow layer, right? (It seems obvious, but I'd like >> to check that) > Right, you should be fine. But I'll defer to Heinz on more particulars > about the cache replacement strategy that is provided in this case for > the "mq" (aka multi-queue policy). If you ask for immediate promotion, you get immediate promotion if the cache gets overcommited. Of course you can tweak the promotion adjustments after warming the cache in order to reduce any thrashing Heinz ^ permalink raw reply [flat|nested] 37+ messages in thread
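A sketch of restoring the documented defaults once the cache is warm, using the values 4, 8 and 1 quoted from cache-policies.txt earlier in this thread (device name as in the dmsetup table above):

  dmsetup message vg_guests-libvirt--images 0 read_promote_adjustment 4
  dmsetup message vg_guests-libvirt--images 0 write_promote_adjustment 8
  dmsetup message vg_guests-libvirt--images 0 discard_promote_adjustment 1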
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:42 ` Heinz Mauelshagen @ 2014-05-30 13:54 ` Richard W.M. Jones 2014-05-30 13:58 ` Zdenek Kabelac 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 13:54 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Zdenek Kabelac, thornber, Mike Snitzer, LVM general discussion and development [-- Attachment #1: Type: text/plain, Size: 843 bytes --] I'm attaching 3 tests that I have run so (hopefully) you can see what I'm observing, or point out if I'm making a mistake. - virt-ham0-raid1.txt Test with an ext4 filesystem located in an LV on the RAID 1 (md) array of 2 x WD NAS hard disks. - virt-ham0-ssd.txt Test with an ext4 filesystem located in an LV on the Samsung EVO SSD. - virt-ham0-lvmcache.txt Test with LVM-cache. For all tests, the same virt.job file is used: [virt] ioengine=libaio iodepth=4 rw=randrw bs=64k direct=1 size=1g numjobs=4 All tests are run on the same hardware. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org [-- Attachment #2: virt-ham0-raid1.txt --] [-- Type: text/plain, Size: 9385 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: (groupid=0, jobs=1): err= 0: pid=2195: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2600.4KB/s, iops=40, runt=201329msec slat (usec): min=23, max=24586, avg=65.89, stdev=306.38 clat (usec): min=305, max=1765.7K, avg=84912.67, stdev=124153.30 lat (usec): min=367, max=1765.8K, avg=84979.16, stdev=124150.29 clat percentiles (usec): | 1.00th=[ 780], 5.00th=[ 6944], 10.00th=[ 9536], 20.00th=[14144], | 30.00th=[19840], 40.00th=[28032], 50.00th=[40704], 60.00th=[57600], | 70.00th=[82432], 80.00th=[125440], 90.00th=[209920], 95.00th=[309248], | 99.00th=[593920], 99.50th=[790528], 99.90th=[1204224], 99.95th=[1286144], | 99.99th=[1761280] bw (KB /s): min= 82, max=12416, per=25.85%, avg=2688.32, stdev=1545.40 write: io=525056KB, bw=2607.1KB/s, iops=40, runt=201329msec slat (usec): min=31, max=140675, avg=132.77, stdev=1945.34 clat (usec): min=346, max=1355.5K, avg=13280.27, stdev=57149.27 lat (usec): min=404, max=1355.6K, avg=13413.69, stdev=57202.63 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 374], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 644], 40.00th=[ 852], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1496], 80.00th=[ 5728], 90.00th=[18048], 95.00th=[63232], | 99.00th=[257024], 99.50th=[382976], 99.90th=[831488], 99.95th=[946176], | 99.99th=[1351680] bw (KB /s): min= 121, max=10709, per=25.96%, avg=2708.14, stdev=1769.64 lat (usec) : 500=12.91%, 750=6.04%, 1000=3.25% lat (msec) : 2=16.32%, 4=1.65%, 10=7.59%, 20=12.91%, 50=14.75% lat (msec) : 100=9.90%, 250=10.51%, 500=3.19%, 750=0.66%, 1000=0.20% lat (msec) : 2000=0.13% cpu : usr=0.11%, sys=0.54%, ctx=16504, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2196: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2947.6KB/s, iops=46, runt=177615msec slat (usec): min=24, max=59936, avg=81.73, stdev=987.31 clat (usec): min=149, max=1054.1K, avg=74995.11, stdev=93418.23 lat (usec): min=369, max=1054.2K, avg=75077.47, stdev=93411.06 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 16], | 30.00th=[ 22], 40.00th=[ 31], 50.00th=[ 42], 60.00th=[ 57], | 70.00th=[ 80], 80.00th=[ 116], 90.00th=[ 180], 95.00th=[ 260], | 99.00th=[ 437], 99.50th=[ 529], 99.90th=[ 840], 99.95th=[ 979], | 99.99th=[ 1057] bw (KB /s): min= 113, max= 6898, per=29.26%, avg=3043.36, stdev=1217.82 write: io=525056KB, bw=2956.2KB/s, iops=46, runt=177615msec slat (usec): min=33, max=140655, avg=128.77, stdev=2069.57 clat (usec): min=258, max=1000.6K, avg=11590.37, stdev=57029.08 lat (usec): min=403, max=1000.7K, avg=11719.76, stdev=57077.03 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 612], 40.00th=[ 748], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1528], 90.00th=[ 7776], 95.00th=[55040], | 99.00th=[244736], 99.50th=[362496], 99.90th=[913408], 99.95th=[929792], | 99.99th=[1003520] bw (KB /s): min= 140, max= 7409, per=29.16%, avg=3042.19, stdev=1466.35 lat (usec) : 250=0.01%, 500=13.49%, 750=6.57%, 1000=3.19% lat (msec) : 2=19.84%, 4=1.70%, 10=5.92%, 20=9.95%, 50=14.89% lat (msec) : 100=10.81%, 250=10.45%, 500=2.73%, 750=0.31%, 1000=0.13% lat (msec) : 2000=0.02% cpu : usr=0.14%, sys=0.59%, ctx=16858, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2197: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2923.2KB/s, iops=45, runt=179092msec slat (usec): min=20, max=99838, avg=91.84, stdev=1411.97 clat (usec): min=160, max=1512.4K, avg=75755.06, stdev=105522.71 lat (usec): min=382, max=1512.9K, avg=75847.54, stdev=105514.14 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 29], 50.00th=[ 40], 60.00th=[ 56], | 70.00th=[ 76], 80.00th=[ 112], 90.00th=[ 186], 95.00th=[ 269], | 99.00th=[ 469], 99.50th=[ 586], 99.90th=[ 1156], 99.95th=[ 1287], | 99.99th=[ 1516] bw (KB /s): min= 124, max= 6144, per=29.37%, avg=3055.29, stdev=1223.87 write: io=525056KB, bw=2931.8KB/s, iops=45, runt=179092msec slat (usec): min=35, max=140660, avg=114.41, stdev=1768.12 clat (usec): min=345, max=1441.6K, avg=11547.93, stdev=62451.29 lat (usec): min=415, max=1441.7K, avg=11663.01, stdev=62476.14 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 596], 40.00th=[ 756], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1544], 90.00th=[ 8896], 95.00th=[37632], | 99.00th=[232448], 99.50th=[350208], 99.90th=[995328], 99.95th=[1044480], | 99.99th=[1433600] bw (KB /s): min= 80, max= 9325, per=29.37%, avg=3063.24, stdev=1532.25 lat (usec) : 250=0.01%, 500=13.56%, 750=6.50%, 1000=3.08% lat (msec) : 2=19.73%, 4=1.62%, 10=6.32%, 20=10.52%, 50=14.89% lat (msec) : 100=10.77%, 250=9.75%, 500=2.72%, 750=0.27%, 1000=0.13% lat (msec) : 2000=0.14% cpu : usr=0.14%, sys=0.59%, ctx=16985, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 
8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2198: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2629.9KB/s, iops=41, runt=199069msec slat (usec): min=25, max=99063, avg=89.77, stdev=1365.24 clat (usec): min=112, max=1392.1K, avg=83373.43, stdev=118987.34 lat (usec): min=369, max=1392.1K, avg=83463.84, stdev=118977.09 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 28], 50.00th=[ 40], 60.00th=[ 57], | 70.00th=[ 81], 80.00th=[ 122], 90.00th=[ 206], 95.00th=[ 310], | 99.00th=[ 603], 99.50th=[ 734], 99.90th=[ 979], 99.95th=[ 1156], | 99.99th=[ 1401] bw (KB /s): min= 64, max= 9708, per=26.35%, avg=2740.70, stdev=1540.11 write: io=525056KB, bw=2637.6KB/s, iops=41, runt=199069msec slat (usec): min=38, max=140657, avg=121.47, stdev=1860.80 clat (usec): min=349, max=1002.9K, avg=13698.39, stdev=66153.66 lat (usec): min=405, max=1002.9K, avg=13820.49, stdev=66192.16 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 652], 40.00th=[ 876], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1448], 80.00th=[ 2992], 90.00th=[15552], 95.00th=[36096], | 99.00th=[321536], 99.50th=[489472], 99.90th=[962560], 99.95th=[995328], | 99.99th=[1003520] bw (KB /s): min= 71, max= 9836, per=26.41%, avg=2755.14, stdev=1757.17 lat (usec) : 250=0.02%, 500=12.84%, 750=5.83%, 1000=3.12% lat (msec) : 2=17.58%, 4=1.73%, 10=7.50%, 20=12.41%, 50=14.97% lat (msec) : 100=9.86%, 250=9.86%, 500=3.19%, 750=0.78%, 1000=0.25% lat (msec) : 2000=0.06% cpu : usr=0.12%, sys=0.53%, ctx=16540, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10401KB/s, minb=2600KB/s, maxb=2947KB/s, mint=177615msec, maxt=201329msec WRITE: io=2051.0MB, aggrb=10431KB/s, minb=2607KB/s, maxb=2956KB/s, mint=177615msec, maxt=201329msec Disk stats (read/write): dm-0: ios=32841/33299, merge=0/0, ticks=2623746/506809, in_queue=3130698, util=100.00%, aggrios=32855/33392, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32855/33392, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16426/33225, aggrmerge=1/168, aggrticks=1311820/306619, aggrin_queue=1618332, aggrutil=98.91% sda: ios=8494/33223, merge=0/171, ticks=464540/232964, in_queue=697442, util=96.18% sdb: ios=24359/33228, merge=2/166, ticks=2159100/380274, in_queue=2539222, util=98.91% [-- Attachment #3: virt-ham0-ssd.txt --] [-- Type: text/plain, Size: 8181 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... 
virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=2177: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28983KB/s, iops=452, runt= 18063msec slat (usec): min=23, max=4451, avg=42.52, stdev=61.50 clat (usec): min=136, max=26872, avg=4360.12, stdev=1103.80 lat (msec): min=2, max=26, avg= 4.40, stdev= 1.10 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3952], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4960], | 99.00th=[ 9792], 99.50th=[10304], 99.90th=[17024], 99.95th=[21888], | 99.99th=[26752] bw (KB /s): min=25600, max=33280, per=25.02%, avg=29007.28, stdev=1840.52 write: io=525056KB, bw=29068KB/s, iops=454, runt= 18063msec slat (usec): min=26, max=5046, avg=48.33, stdev=57.35 clat (msec): min=3, max=29, avg= 4.36, stdev= 1.10 lat (msec): min=3, max=29, avg= 4.41, stdev= 1.11 clat percentiles (usec): | 1.00th=[ 3856], 5.00th=[ 3920], 10.00th=[ 3952], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9920], 99.50th=[10560], 99.90th=[16320], 99.95th=[21376], | 99.99th=[29056] bw (KB /s): min=25447, max=31744, per=25.02%, avg=29091.58, stdev=1503.68 lat (usec) : 250=0.01% lat (msec) : 4=24.84%, 10=74.26%, 20=0.82%, 50=0.08% cpu : usr=1.10%, sys=4.57%, ctx=16802, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2178: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28986KB/s, iops=452, runt= 18061msec slat (usec): min=20, max=4734, avg=44.14, stdev=65.97 clat (usec): min=134, max=34582, avg=4367.22, stdev=1102.36 lat (msec): min=2, max=34, avg= 4.41, stdev= 1.10 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4704], 95.00th=[ 4960], | 99.00th=[ 9920], 99.50th=[10304], 99.90th=[16512], 99.95th=[17024], | 99.99th=[34560] bw (KB /s): min=25804, max=33280, per=25.03%, avg=29016.61, stdev=1835.93 write: io=525056KB, bw=29071KB/s, iops=454, runt= 18061msec slat (usec): min=26, max=2297, avg=49.25, stdev=29.79 clat (msec): min=3, max=28, avg= 4.35, stdev= 1.07 lat (msec): min=3, max=28, avg= 4.40, stdev= 1.07 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4192], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9920], 99.50th=[10304], 99.90th=[16192], 99.95th=[18816], | 99.99th=[28288] bw (KB /s): min=25447, max=31936, per=25.03%, avg=29099.78, stdev=1497.17 lat (usec) : 250=0.01% lat (msec) : 4=25.34%, 10=73.77%, 20=0.83%, 50=0.04% cpu : usr=1.23%, sys=4.71%, ctx=16888, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: 
pid=2179: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28983KB/s, iops=452, runt= 18063msec slat (usec): min=15, max=4262, avg=41.83, stdev=65.79 clat (usec): min=128, max=35128, avg=4352.56, stdev=1194.36 lat (msec): min=1, max=35, avg= 4.39, stdev= 1.19 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3952], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4192], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9792], 99.50th=[10432], 99.90th=[17280], 99.95th=[20864], | 99.99th=[35072] bw (KB /s): min=25676, max=33402, per=25.02%, avg=29002.72, stdev=1797.83 write: io=525056KB, bw=29068KB/s, iops=454, runt= 18063msec slat (usec): min=22, max=1784, avg=47.23, stdev=24.88 clat (usec): min=296, max=35165, avg=4367.18, stdev=1113.83 lat (msec): min=1, max=35, avg= 4.41, stdev= 1.11 clat percentiles (usec): | 1.00th=[ 3856], 5.00th=[ 3920], 10.00th=[ 3952], 20.00th=[ 3984], | 30.00th=[ 4048], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4960], | 99.00th=[ 9792], 99.50th=[10176], 99.90th=[16320], 99.95th=[20608], | 99.99th=[35072] bw (KB /s): min=25223, max=32127, per=25.02%, avg=29093.50, stdev=1608.39 lat (usec) : 250=0.01%, 500=0.01% lat (msec) : 2=0.01%, 4=23.93%, 10=75.24%, 20=0.73%, 50=0.07% cpu : usr=1.07%, sys=4.55%, ctx=16766, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2180: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28985KB/s, iops=452, runt= 18062msec slat (usec): min=24, max=5553, avg=44.79, stdev=80.00 clat (usec): min=138, max=34979, avg=4358.30, stdev=1106.07 lat (msec): min=2, max=35, avg= 4.40, stdev= 1.10 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4704], 95.00th=[ 4960], | 99.00th=[ 9792], 99.50th=[10304], 99.90th=[16192], 99.95th=[19584], | 99.99th=[35072] bw (KB /s): min=25243, max=33280, per=25.02%, avg=29005.64, stdev=1815.27 write: io=525056KB, bw=29070KB/s, iops=454, runt= 18062msec slat (usec): min=27, max=4550, avg=50.17, stdev=52.18 clat (usec): min=372, max=34869, avg=4354.62, stdev=1175.13 lat (msec): min=3, max=34, avg= 4.41, stdev= 1.17 clat percentiles (usec): | 1.00th=[ 3856], 5.00th=[ 3888], 10.00th=[ 3952], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4192], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9920], 99.50th=[10432], 99.90th=[18304], 99.95th=[22144], | 99.99th=[35072] bw (KB /s): min=25377, max=32000, per=25.02%, avg=29094.31, stdev=1546.49 lat (usec) : 250=0.01%, 500=0.01% lat (msec) : 4=25.07%, 10=74.04%, 20=0.82%, 50=0.05% cpu : usr=1.02%, sys=4.93%, ctx=16748, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=115932KB/s, minb=28983KB/s, maxb=28986KB/s, mint=18061msec, maxt=18063msec 
WRITE: io=2051.0MB, aggrb=116272KB/s, minb=29068KB/s, maxb=29071KB/s, mint=18061msec, maxt=18063msec Disk stats (read/write): dm-1: ios=32531/32589, merge=0/0, ticks=141395/170612, in_queue=312036, util=99.54%, aggrios=32720/32831, aggrmerge=0/12, aggrticks=142412/171944, aggrin_queue=314244, aggrutil=99.45% sdc: ios=32720/32831, merge=0/12, ticks=142412/171944, in_queue=314244, util=99.45% [-- Attachment #4: virt-ham0-lvmcache.txt --] [-- Type: text/plain, Size: 9695 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=4678: Fri May 30 14:49:36 2014 read : io=523520KB, bw=6385.7KB/s, iops=99, runt= 81984msec slat (usec): min=15, max=51287, avg=92.23, stdev=1109.90 clat (usec): min=3, max=17110, avg=741.35, stdev=1099.40 lat (usec): min=374, max=51293, avg=834.14, stdev=1547.02 clat percentiles (usec): | 1.00th=[ 346], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 374], 40.00th=[ 378], 50.00th=[ 398], 60.00th=[ 430], | 70.00th=[ 450], 80.00th=[ 564], 90.00th=[ 1448], 95.00th=[ 2960], | 99.00th=[ 5664], 99.50th=[ 6880], 99.90th=[12096], 99.95th=[12608], | 99.99th=[17024] bw (KB /s): min= 106, max=25344, per=28.73%, avg=6890.66, stdev=3382.41 write: io=525056KB, bw=6404.4KB/s, iops=100, runt= 81984msec slat (usec): min=23, max=79877, avg=113.74, stdev=1656.45 clat (usec): min=267, max=939139, avg=38930.63, stdev=72364.78 lat (usec): min=343, max=939175, avg=39045.06, stdev=72510.87 clat percentiles (usec): | 1.00th=[ 298], 5.00th=[ 302], 10.00th=[ 326], 20.00th=[ 844], | 30.00th=[ 6688], 40.00th=[38144], 50.00th=[41728], 60.00th=[43776], | 70.00th=[46848], 80.00th=[50944], 90.00th=[56064], 95.00th=[61184], | 99.00th=[181248], 99.50th=[790528], 99.90th=[872448], 99.95th=[905216], | 99.99th=[937984] bw (KB /s): min= 71, max=22528, per=28.70%, avg=6904.82, stdev=3083.01 lat (usec) : 4=0.01%, 10=0.02%, 100=0.01%, 250=0.01%, 500=47.35% lat (usec) : 750=4.99%, 1000=1.57% lat (msec) : 2=3.75%, 4=5.01%, 10=2.75%, 20=0.51%, 50=23.00% lat (msec) : 100=10.40%, 250=0.19%, 500=0.04%, 750=0.07%, 1000=0.32% cpu : usr=0.25%, sys=1.29%, ctx=16566, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=4679: Fri May 30 14:49:36 2014 read : io=523520KB, bw=6537.6KB/s, iops=102, runt= 80079msec slat (usec): min=16, max=61433, avg=102.37, stdev=1288.95 clat (usec): min=2, max=16641, avg=737.39, stdev=1134.94 lat (usec): min=376, max=61436, avg=840.31, stdev=1699.15 clat percentiles (usec): | 1.00th=[ 342], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 374], 40.00th=[ 378], 50.00th=[ 398], 60.00th=[ 430], | 70.00th=[ 450], 80.00th=[ 580], 90.00th=[ 1288], 95.00th=[ 2896], | 99.00th=[ 5664], 99.50th=[ 7648], 99.90th=[12864], 99.95th=[14656], | 99.99th=[16768] bw (KB /s): min= 298, max=27181, per=29.47%, avg=7067.77, stdev=3871.99 write: io=525056KB, bw=6556.8KB/s, iops=102, runt= 80079msec slat (usec): min=26, max=48770, avg=83.15, stdev=890.23 clat (usec): min=266, max=5409.6K, avg=38023.69, stdev=102346.26 lat (usec): min=337, max=5409.7K, avg=38107.52, stdev=102438.81 clat percentiles (usec): | 1.00th=[ 294], 5.00th=[ 302], 10.00th=[ 
318], 20.00th=[ 382], | 30.00th=[ 3248], 40.00th=[37120], 50.00th=[41216], 60.00th=[43776], | 70.00th=[46336], 80.00th=[50432], 90.00th=[56064], 95.00th=[61184], | 99.00th=[173056], 99.50th=[790528], 99.90th=[897024], 99.95th=[921600], | 99.99th=[5406720] bw (KB /s): min= 298, max=24710, per=29.43%, avg=7078.80, stdev=3321.80 lat (usec) : 4=0.02%, 10=0.03%, 50=0.01%, 250=0.01%, 500=48.87% lat (usec) : 750=5.78%, 1000=1.87% lat (msec) : 2=3.39%, 4=4.72%, 10=2.66%, 20=0.51%, 50=21.49% lat (msec) : 100=10.09%, 250=0.15%, 500=0.04%, 750=0.05%, 1000=0.31% lat (msec) : >=2000=0.01% cpu : usr=0.25%, sys=1.35%, ctx=16791, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=4680: Fri May 30 14:49:36 2014 read : io=523520KB, bw=5996.4KB/s, iops=93, runt= 87307msec slat (usec): min=15, max=50215, avg=79.95, stdev=812.30 clat (usec): min=4, max=23674, avg=754.82, stdev=1161.37 lat (usec): min=380, max=50222, avg=835.35, stdev=1406.50 clat percentiles (usec): | 1.00th=[ 346], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 370], 40.00th=[ 378], 50.00th=[ 394], 60.00th=[ 430], | 70.00th=[ 446], 80.00th=[ 572], 90.00th=[ 1496], 95.00th=[ 3024], | 99.00th=[ 6112], 99.50th=[ 7712], 99.90th=[12352], 99.95th=[13888], | 99.99th=[23680] bw (KB /s): min= 372, max=26368, per=26.72%, avg=6409.15, stdev=3611.01 write: io=525056KB, bw=6013.1KB/s, iops=93, runt= 87307msec slat (usec): min=25, max=69281, avg=119.08, stdev=1629.76 clat (usec): min=288, max=4229.2K, avg=41517.28, stdev=86297.67 lat (usec): min=345, max=4229.3K, avg=41637.09, stdev=86496.27 clat percentiles (usec): | 1.00th=[ 298], 5.00th=[ 326], 10.00th=[ 540], 20.00th=[ 5280], | 30.00th=[22144], 40.00th=[38656], 50.00th=[41728], 60.00th=[44288], | 70.00th=[46848], 80.00th=[50944], 90.00th=[56064], 95.00th=[62208], | 99.00th=[183296], 99.50th=[790528], 99.90th=[888832], 99.95th=[913408], | 99.99th=[4227072] bw (KB /s): min= 91, max=23808, per=26.53%, avg=6381.50, stdev=3178.76 lat (usec) : 10=0.02%, 100=0.01%, 500=43.29%, 750=4.35%, 1000=1.67% lat (msec) : 2=3.61%, 4=5.15%, 10=3.24%, 20=2.69%, 50=25.01% lat (msec) : 100=10.25%, 250=0.26%, 500=0.04%, 750=0.10%, 1000=0.32% lat (msec) : >=2000=0.01% cpu : usr=0.22%, sys=1.24%, ctx=16506, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=4681: Fri May 30 14:49:36 2014 read : io=523520KB, bw=6017.6KB/s, iops=94, runt= 86999msec slat (usec): min=15, max=50273, avg=88.62, stdev=1003.90 clat (usec): min=2, max=16356, avg=742.71, stdev=1140.56 lat (usec): min=368, max=50278, avg=831.90, stdev=1505.02 clat percentiles (usec): | 1.00th=[ 346], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 370], 40.00th=[ 378], 50.00th=[ 398], 60.00th=[ 430], | 70.00th=[ 446], 80.00th=[ 548], 90.00th=[ 1416], 95.00th=[ 2960], | 99.00th=[ 6048], 99.50th=[ 8032], 99.90th=[12608], 99.95th=[13120], | 99.99th=[16320] bw (KB /s): min= 212, max=23936, per=26.82%, avg=6433.62, stdev=3648.50 write: io=525056KB, bw=6035.2KB/s, iops=94, runt= 86999msec slat (usec): min=21, 
max=83882, avg=116.67, stdev=1719.48 clat (usec): min=279, max=2542.4K, avg=41373.74, stdev=77980.27 lat (usec): min=352, max=2542.4K, avg=41491.13, stdev=78185.74 clat percentiles (usec): | 1.00th=[ 298], 5.00th=[ 322], 10.00th=[ 394], 20.00th=[ 4448], | 30.00th=[22656], 40.00th=[38656], 50.00th=[41728], 60.00th=[44288], | 70.00th=[46848], 80.00th=[50944], 90.00th=[56064], 95.00th=[62208], | 99.00th=[183296], 99.50th=[782336], 99.90th=[897024], 99.95th=[913408], | 99.99th=[2539520] bw (KB /s): min= 268, max=21760, per=26.76%, avg=6437.61, stdev=3158.84 lat (usec) : 4=0.01%, 10=0.02%, 100=0.01%, 500=44.11%, 750=4.68% lat (usec) : 1000=1.61% lat (msec) : 2=3.28%, 4=4.80%, 10=2.88%, 20=2.45%, 50=25.24% lat (msec) : 100=10.17%, 250=0.27%, 500=0.04%, 750=0.10%, 1000=0.31% lat (msec) : >=2000=0.01% cpu : usr=0.26%, sys=1.19%, ctx=16414, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=23985KB/s, minb=5996KB/s, maxb=6537KB/s, mint=80079msec, maxt=87307msec WRITE: io=2051.0MB, aggrb=24055KB/s, minb=6013KB/s, maxb=6556KB/s, mint=80079msec, maxt=87307msec Disk stats (read/write): dm-3: ios=32666/32817, merge=0/0, ticks=24343/1321747, in_queue=1346205, util=99.98%, aggrios=11107/11174, aggrmerge=0/0, aggrticks=8553/834112, aggrin_queue=843695, aggrutil=99.96% dm-0: ios=33323/6886, merge=0/0, ticks=25660/6683, in_queue=32346, util=16.79%, aggrios=33299/6884, aggrmerge=24/2, aggrticks=25549/6673, aggrin_queue=32121, aggrutil=16.75% sdc: ios=33299/6884, merge=24/2, ticks=25549/6673, in_queue=32121, util=16.75% dm-1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% dm-2: ios=0/26636, merge=0/0, ticks=0/2495655, in_queue=2498741, util=99.96%, aggrios=0/26654, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=0/26654, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/26616, aggrmerge=0/27, aggrticks=0/1625073, aggrin_queue=1626606, aggrutil=99.64% sda: ios=0/26610, merge=0/26, ticks=0/2380053, in_queue=2383117, util=99.64% sdb: ios=0/26622, merge=0/28, ticks=0/870094, in_queue=870095, util=89.86% ^ permalink raw reply [flat|nested] 37+ messages in thread
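For context on how these attachments were presumably produced: the fio job shown in the message can be saved as virt.job inside the mounted filesystem under test and run directly with fio. The mount point below is an assumption, not something stated in the thread.
$ cd /mnt/test        # assumed mount point of the filesystem being tested
$ fio virt.job        # emits per-job output like the attachments above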
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:54 ` Richard W.M. Jones @ 2014-05-30 13:58 ` Zdenek Kabelac 0 siblings, 0 replies; 37+ messages in thread From: Zdenek Kabelac @ 2014-05-30 13:58 UTC (permalink / raw) To: Richard W.M. Jones, Heinz Mauelshagen Cc: thornber, Mike Snitzer, LVM general discussion and development On 30.5.2014 15:54, Richard W.M. Jones wrote: > I'm attaching 3 tests that I have run so (hopefully) you can see > what I'm observing, or point out if I'm making a mistake. > I'd like to ask: is there any difference in the test performance if you use a ramdisk device for your cache metadata device? (So _cdata stays on the 'ssd', and just _cmeta is located on e.g. loop0 with a backing file on your tmpfs ramdisk device?) Zdenek ^ permalink raw reply [flat|nested] 37+ messages in thread
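A minimal sketch of the experiment Zdenek is suggesting, assuming the VG is vg_guests as elsewhere in the thread; the tmpfs size, image size, loop device number and mount point are arbitrary choices not taken from the thread, and a metadata device sitting on volatile RAM is of course only acceptable for a throwaway benchmark.
$ sudo mount -t tmpfs -o size=2G tmpfs /mnt/ramdisk
$ sudo dd if=/dev/zero of=/mnt/ramdisk/cmeta.img bs=1M count=1024
$ sudo losetup /dev/loop0 /mnt/ramdisk/cmeta.img
$ sudo pvcreate /dev/loop0
$ sudo vgextend vg_guests /dev/loop0
# Recreate the cache pool so that lv_cache_meta is allocated on the loop
# device (i.e. in RAM) while lv_cache itself stays on the SSD:
$ sudo lvcreate -L 1G -n lv_cache_meta vg_guests /dev/loop0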
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones 2014-05-30 13:42 ` Heinz Mauelshagen @ 2014-05-30 13:46 ` Richard W.M. Jones 2014-05-30 13:54 ` Heinz Mauelshagen 2014-05-30 13:55 ` Mike Snitzer 2 siblings, 2 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 13:46 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development I have now set both read_promote_adjustment == write_promote_adjustment == 0 and used drop_caches between runs. I also read Documentation/device-mapper/cache-policies.txt at Heinz's suggestion. I'm afraid the performance of the fio test is still not the same as the SSD (4.8 times slower than the SSD-only test now). Would repeated runs of (md5sum virt.* ; echo 3 > /proc/sys/vm/drop_caches) not eventually cause the whole file to be placed on the SSD? It does seem very counter-intuitive if not. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 37+ messages in thread
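The tuning and warm-up procedure being described presumably amounts to something like the following; the cached device name matches the dmsetup output quoted later in the thread, and the mount point is an assumption.
$ sudo dmsetup message vg_guests-libvirt--images 0 read_promote_adjustment 0
$ sudo dmsetup message vg_guests-libvirt--images 0 write_promote_adjustment 0
$ cd /mnt/test                                 # assumed mount point
$ md5sum virt.*                                # touch every block of the test files
$ echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop the page cache between runs
As far as cache-policies.txt describes it, promote adjustments of 0 make the mq policy willing to promote a block on its first access, so repeated full reads of the files would indeed be expected to pull them onto the SSD, which is what makes the observed result surprising.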
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:46 ` Richard W.M. Jones @ 2014-05-30 13:54 ` Heinz Mauelshagen 2014-05-30 14:26 ` Richard W.M. Jones 2014-05-30 13:55 ` Mike Snitzer 1 sibling, 1 reply; 37+ messages in thread From: Heinz Mauelshagen @ 2014-05-30 13:54 UTC (permalink / raw) To: Richard W.M. Jones, Mike Snitzer Cc: Zdenek Kabelac, thornber, LVM general discussion and development On 05/30/2014 03:46 PM, Richard W.M. Jones wrote: > I have now set both read_promote_adjustment == > write_promote_adjustment == 0 and used drop_caches between runs. Did you adjust "sequential_threshold 0" as well? dm-cache tries to avoid promoting large sequential files to the cache, because spindles have good bandwidth. This is again because of the hot spot caching nature of dm-cache. > > I also read Documentation/device-mapper/cache-policies.txt at Heinz's > suggestion. > > I'm afraid the performance of the fio test is still not the same as > the SSD (4.8 times slower than the SSD-only test now). > > Would repeated runs of (md5sum virt.* ; echo 3 > /proc/sys/vm/drop_caches) > not eventually cause the whole file to be placed on the SSD? > It does seem very counter-intuitive if not. Please retry with "sequential_threshold 0" Heinz > > Rich. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:54 ` Heinz Mauelshagen @ 2014-05-30 14:26 ` Richard W.M. Jones 2014-05-30 14:29 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 14:26 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Zdenek Kabelac, thornber, Mike Snitzer, LVM general discussion and development On Fri, May 30, 2014 at 03:54:49PM +0200, Heinz Mauelshagen wrote: > On 05/30/2014 03:46 PM, Richard W.M. Jones wrote: > >I have now set both read_promote_adjustment == > >write_promote_adjustment == 0 and used drop_caches between runs. > > Did you adjust "sequential_threshold 0" as well? > > dm-cache tries to avoid promoting large sequential files to the cache, > because spindles have good bandwidth. > > This is again because of the hot spot caching nature of dm-cache. Setting this had no effect. I'm starting to wonder if my settings are having any effect at all. Here are the device-mapper tables: $ sudo dmsetup table vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 vg_guests-home: 0 209715200 linear 9:127 2048 vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 And here is the command I used to set sequential_threshold to 0 (there was no error and no other output): $ sudo dmsetup message vg_guests-libvirt--images 0 sequential_threshold 0 Is there a way to print the current settings? Could writethrough be enabled? (I'm supposed to be using writeback). How do I find out? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:26 ` Richard W.M. Jones @ 2014-05-30 14:29 ` Mike Snitzer 2014-05-30 14:36 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 14:29 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 10:26am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Fri, May 30, 2014 at 03:54:49PM +0200, Heinz Mauelshagen wrote: > > On 05/30/2014 03:46 PM, Richard W.M. Jones wrote: > > >I have now set both read_promote_adjustment == > > >write_promote_adjustment == 0 and used drop_caches between runs. > > > > Did you adjust "sequential_threshold 0" as well? > > > > dm-cache tries to avoid promoting large sequential files to the cache, > > because spindles have good bandwidth. > > > > This is again because of the hot spot caching nature of dm-cache. > > Setting this had no effect. > > I'm starting to wonder if my settings are having any effect at all. > > Here are the device-mapper tables: > > $ sudo dmsetup table > vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 > vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 > vg_guests-home: 0 209715200 linear 9:127 2048 > vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 > vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 > > And here is the command I used to set sequential_threshold to 0 > (there was no error and no other output): > > $ sudo dmsetup message vg_guests-libvirt--images 0 sequential_threshold 0 sequential_threshold is only going to help the md5sum's IO get promoted (assuming you're having it read a large file). > Is there a way to print the current settings? > > Could writethrough be enabled? (I'm supposed to be using writeback). > How do I find out? dmsetup status vg_guests-libvirt--images But I'm really wondering if your IO is misaligned (like my earlier email brought up). It _could_ be promoting 2 64K blocks from the origin for every 64K IO. ^ permalink raw reply [flat|nested] 37+ messages in thread
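For readers following along, the single status line that command prints contains all of the current settings; the field order sketched below follows the Documentation/device-mapper/cache.txt of this era and is offered only as a reading aid, not an authoritative spec. Block sizes are given in 512-byte sectors, so a cache block size of 128 means 64K.
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty>
<#feature args> <feature args, e.g. writeback or writethrough>
<#core args> <core args, e.g. migration_threshold 2048>
<policy name> <#policy args> <policy args, e.g. sequential_threshold 0>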
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:29 ` Mike Snitzer @ 2014-05-30 14:36 ` Richard W.M. Jones 2014-05-30 14:44 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 14:36 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30, 2014 at 10:29:26AM -0400, Mike Snitzer wrote: > sequential_threshold is only going to help the md5sum's IO get promoted > (assuming you're having it read a large file). Note the fio test runs on the virt.* files. I'm using md5sum in an attempt to pull those same files into the SSD. > > Is there a way to print the current settings? > > > > Could writethrough be enabled? (I'm supposed to be using writeback). > > How do I find out? > > dmsetup status vg_guests-libvirt--images Here's dmsetup status on various objects: $ sudo dmsetup table vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 vg_guests-home: 0 209715200 linear 9:127 2048 vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 $ sudo dmsetup status vg_guests-libvirt--images 0 1677721600 cache 8 10162/262144 128 39839/3276800 1087840 821795 116320 2057235 0 39835 0 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 0 discard_promote_adjustment 1 read_promote_adjustment 0 write_promote_adjustment 0 $ sudo dmsetup status vg_guests-lv_cache_cdata 0 419430400 linear $ sudo dmsetup status vg_guests-lv_cache_cmeta 0 2097152 linear $ sudo dmsetup status vg_guests-libvirt--images_corig 0 1677721600 linear > But I'm really wondering if your IO is misaligned (like my earlier email > brought up). It _could_ be promoting 2 64K blocks from the origin for > every 64K IO. There's nothing obviously wrong ... ** For the SSD ** $ sudo fdisk -l /dev/sdc Disk /dev/sdc: 232.9 GiB, 250059350016 bytes, 488397168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disklabel type: dos Disk identifier: 0x3e302f2a Device Boot Start End Blocks Id System /dev/sdc1 2048 488397167 244197560 8e Linux LVM The PV is placed directly on /dev/sdc1. ** For the HDD array ** $ sudo fdisk -l /dev/sd{a,b} Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: B9545B67-681D-4729-A8A0-C75CB2EFFCB1 Device Start End Size Type /dev/sda1 2048 3907029134 1.8T Linux filesystem Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: EFA66BD1-E813-4826-88A2-F2BB3C2E093E Device Start End Size Type /dev/sdb1 2048 3907029134 1.8T Linux filesystem $ cat /proc/mdstat Personalities : [raid1] md127 : active raid1 sdb1[2] sda1[1] 1953382272 blocks super 1.2 [2/2] [UU] unused devices: <none> The PV is placed on /dev/md127. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. 
Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:36 ` Richard W.M. Jones @ 2014-05-30 14:44 ` Mike Snitzer 2014-05-30 14:51 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 14:44 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 10:36am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Fri, May 30, 2014 at 10:29:26AM -0400, Mike Snitzer wrote: > > sequential_threshold is only going to help the md5sum's IO get promoted > > (assuming you're having it read a large file). > > Note the fio test runs on the virt.* files. I'm using md5sum in an > attempt to pull those same files into the SSD. > > > > Is there a way to print the current settings? > > > > > > Could writethrough be enabled? (I'm supposed to be using writeback). > > > How do I find out? > > > > dmsetup status vg_guests-libvirt--images > > Here's dmsetup status on various objects: > > $ sudo dmsetup table > vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 > vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 > vg_guests-home: 0 209715200 linear 9:127 2048 > vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 > vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 > $ sudo dmsetup status vg_guests-libvirt--images > 0 1677721600 cache 8 10162/262144 128 39839/3276800 1087840 821795 116320 2057235 0 39835 0 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 0 discard_promote_adjustment 1 read_promote_adjustment 0 write_promote_adjustment 0 > $ sudo dmsetup status vg_guests-lv_cache_cdata > 0 419430400 linear > $ sudo dmsetup status vg_guests-lv_cache_cmeta > 0 2097152 linear > $ sudo dmsetup status vg_guests-libvirt--images_corig > 0 1677721600 linear > > > But I'm really wondering if your IO is misaligned (like my earlier email > > brought up). It _could_ be promoting 2 64K blocks from the origin for > > every 64K IO. > > There's nothing obviously wrong ... I'm not talking about alignment relative to the physical device's limits. I'm talking about alignment of ext4's data areas relative to the 64K block boundaries. Also a point of concern would be: how fragmented is the ext4 space? It could be that it cannot get contiguous 64K regions from the namespace. If that is the case then a lot more IO would get pulled in. Can you try reducing the cache blocksize to 32K (lowest we support at the moment, it'll require you to remove the cache and recreate) to see if performance for this 64K random IO workload improves? If so it does start to add weight to my alignment concerns. Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
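Two commands that could help check the fragmentation and alignment concern raised here; the path is an assumption. filefrag -v prints each extent's logical and physical offsets in filesystem blocks, so with a 4K ext4 block size an extent whose physical start is not a multiple of 16 blocks does not begin on a 64K boundary, and e4defrag -c only reports a fragmentation score without changing anything.
$ sudo filefrag -v /mnt/test/virt.*
$ sudo e4defrag -c /mnt/test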
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:44 ` Mike Snitzer @ 2014-05-30 14:51 ` Richard W.M. Jones 2014-05-30 14:58 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 14:51 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30, 2014 at 10:44:54AM -0400, Mike Snitzer wrote: > I'm not talking about alignment relative to the physical device's > limits. I'm talking about alignment of ext4's data areas relative to > the 64K block boundaries. > > Also a point of concern would be: how fragmented is the ext4 space? It > could be that it cannot get contiguous 64K regions from the namespace. > If that is the case then a lot more IO would get pulled in. I would be surprised if it was fragmented, since it's a recently created filesystem which has only been used to store a few huge disk images ... > Can you try reducing the cache blocksize to 32K (lowest we support at > the moment, it'll require you to remove the cache and recreate) to see > if performance for this 64K random IO workload improves? If so it does > start to add weight to my alignment concerns. ... nevertheless what I will do is recreate the origin LV, ext4 filesystem, and change the block size. What is the command to set the cache blocksize? It doesn't seem to be covered in the documentation anywhere. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:51 ` Richard W.M. Jones @ 2014-05-30 14:58 ` Mike Snitzer 2014-05-30 15:28 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 14:58 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 10:51am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Fri, May 30, 2014 at 10:44:54AM -0400, Mike Snitzer wrote: > > I'm not talking about alignment relative to the physical device's > > limits. I'm talking about alignment of ext4's data areas relative to > > the 64K block boundaries. > > > > Also a point of concern would be: how fragmented is the ext4 space? It > > could be that it cannot get contiguous 64K regions from the namespace. > > If that is the case then a lot more IO would get pulled in. > > I would be surprised if it was fragmented, since it's a recently > created filesystem which has only been used to store a few huge disk > images ... > > > Can you try reducing the cache blocksize to 32K (lowest we support at > > the moment, it'll require you to remove the cache and recreate) to see > > if performance for this 64K random IO workload improves? If so it does > > start to add weight to my alignment concerns. > > ... nevertheless what I will do is recreate the origin LV, ext4 > filesystem, and change the block size. You don't need to recreate the origin LV or FS. If anything that'd reduce our ability to answer what may be currently wrong with the setup. I was just suggesting removing the cache and recreating the cache layer. Not sure how easy it is to do that with the lvm2 interface. Jon and/or Kabi? > What is the command to set the cache blocksize? It doesn't seem to be > covered in the documentation anywhere. I would think it is lvconvert's --chunksize... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:58 ` Mike Snitzer @ 2014-05-30 15:28 ` Richard W.M. Jones 2014-05-30 18:16 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 15:28 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development [-- Attachment #1: Type: text/plain, Size: 1729 bytes --] I did in fact recreate the ext4 filesystem, because I didn't read your email in time. Here are the commands I used to create the whole lot: ---------------------------------------------------------------------- lvcreate -L 800G -n testorigin vg_guests @slow mkfs -t ext4 /dev/vg_guests/testorigin # at this point, I tested the speed of the uncached LV, see below lvcreate -L 1G -n lv_cache_meta vg_guests @ssd lvcreate -L 200G -n lv_cache vg_guests @ssd lvconvert --type cache-pool --chunksize 32k --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testorigin dmsetup message vg_guests-testorigin 0 sequential_threshold 0 dmsetup message vg_guests-testorigin 0 read_promote_adjustment 0 dmsetup message vg_guests-testorigin 0 write_promote_adjustment 0 # at this point, I tested the speed of the cached LV, see below ---------------------------------------------------------------------- To test the uncached LV, I ran the same fio test twice on the mounted ext4 filesystem. The results of the second run are in the first attachment. To test the cached LV, I ran these commands 3 times in a row: md5sum virt.* echo 3 > /proc/sys/vm/drop_caches then I ran the fio test twice. The results of the second run are attached. This time the LVM cache test is about 10% slower than the HDD test. I'm not sure what to make of that at all. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW [-- Attachment #2: virt-ham0-testorigin-hdd.txt --] [-- Type: text/plain, Size: 9289 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... 
virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=5346: Fri May 30 16:06:21 2014 read : io=523520KB, bw=2910.4KB/s, iops=45, runt=179881msec slat (usec): min=21, max=307271, avg=162.28, stdev=4500.17 clat (usec): min=4, max=1491.2K, avg=78284.57, stdev=119672.87 lat (usec): min=362, max=1491.2K, avg=78447.46, stdev=119690.26 clat percentiles (usec): | 1.00th=[ 410], 5.00th=[ 5536], 10.00th=[ 7968], 20.00th=[12352], | 30.00th=[17280], 40.00th=[24448], 50.00th=[35072], 60.00th=[48896], | 70.00th=[74240], 80.00th=[116224], 90.00th=[195584], 95.00th=[288768], | 99.00th=[577536], 99.50th=[782336], 99.90th=[1187840], 99.95th=[1335296], | 99.99th=[1499136] bw (KB /s): min= 228, max=14924, per=25.99%, avg=3025.00, stdev=1860.17 write: io=525056KB, bw=2918.1KB/s, iops=45, runt=179881msec slat (usec): min=32, max=327239, avg=577.55, stdev=9388.69 clat (usec): min=330, max=1294.2K, avg=8890.55, stdev=46138.42 lat (usec): min=402, max=1294.3K, avg=9468.75, stdev=47132.58 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 370], 10.00th=[ 430], 20.00th=[ 450], | 30.00th=[ 668], 40.00th=[ 940], 50.00th=[ 1272], 60.00th=[ 1336], | 70.00th=[ 1560], 80.00th=[ 5600], 90.00th=[15424], 95.00th=[24192], | 99.00th=[144384], 99.50th=[228352], 99.90th=[741376], 99.95th=[872448], | 99.99th=[1286144] bw (KB /s): min= 105, max=13660, per=25.98%, avg=3033.09, stdev=2056.10 lat (usec) : 10=0.01%, 100=0.01%, 250=0.02%, 500=12.04%, 750=6.01% lat (usec) : 1000=3.24% lat (msec) : 2=16.93%, 4=2.02%, 10=9.87%, 20=13.46%, 50=15.23% lat (msec) : 100=8.54%, 250=9.04%, 500=2.72%, 750=0.54%, 1000=0.24% lat (msec) : 2000=0.09% cpu : usr=0.14%, sys=0.59%, ctx=16625, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5347: Fri May 30 16:06:21 2014 read : io=523520KB, bw=3348.3KB/s, iops=52, runt=156355msec slat (usec): min=25, max=242606, avg=177.64, stdev=4926.94 clat (usec): min=5, max=1128.2K, avg=69030.33, stdev=92416.74 lat (usec): min=357, max=1128.2K, avg=69208.61, stdev=92459.50 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 14], | 30.00th=[ 19], 40.00th=[ 26], 50.00th=[ 37], 60.00th=[ 51], | 70.00th=[ 71], 80.00th=[ 105], 90.00th=[ 172], 95.00th=[ 241], | 99.00th=[ 416], 99.50th=[ 545], 99.90th=[ 922], 99.95th=[ 1004], | 99.99th=[ 1123] bw (KB /s): min= 63, max= 6876, per=29.66%, avg=3452.75, stdev=1274.21 write: io=525056KB, bw=3358.2KB/s, iops=52, runt=156355msec slat (usec): min=37, max=335316, avg=588.63, stdev=10049.71 clat (usec): min=326, max=1003.8K, avg=6620.20, stdev=43120.65 lat (usec): min=413, max=1003.9K, avg=7209.47, stdev=44354.58 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 406], 20.00th=[ 442], | 30.00th=[ 620], 40.00th=[ 756], 50.00th=[ 1176], 60.00th=[ 1272], | 70.00th=[ 1320], 80.00th=[ 1480], 90.00th=[ 2640], 95.00th=[15808], | 99.00th=[140288], 99.50th=[193536], 99.90th=[864256], 99.95th=[897024], | 99.99th=[1003520] bw (KB /s): min= 72, max= 8762, per=29.71%, avg=3468.34, stdev=1603.79 lat (usec) : 10=0.01%, 250=0.02%, 500=13.07%, 750=7.15%, 1000=3.34% lat (msec) : 2=21.17%, 4=1.43%, 10=7.20%, 20=10.63%, 50=14.55% lat (msec) : 100=10.03%, 250=9.04%, 500=1.93%, 750=0.23%, 1000=0.15% 
lat (msec) : 2000=0.04% cpu : usr=0.16%, sys=0.68%, ctx=16915, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5348: Fri May 30 16:06:21 2014 read : io=523520KB, bw=3268.2KB/s, iops=51, runt=160195msec slat (usec): min=26, max=338078, avg=259.14, stdev=6769.07 clat (usec): min=4, max=903517, avg=70958.19, stdev=87737.74 lat (usec): min=375, max=903571, avg=71217.97, stdev=87838.17 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 29], 50.00th=[ 40], 60.00th=[ 55], | 70.00th=[ 76], 80.00th=[ 110], 90.00th=[ 174], 95.00th=[ 243], | 99.00th=[ 429], 99.50th=[ 506], 99.90th=[ 725], 99.95th=[ 816], | 99.99th=[ 906] bw (KB /s): min= 173, max= 7153, per=28.77%, avg=3349.59, stdev=1188.82 write: io=525056KB, bw=3277.7KB/s, iops=51, runt=160195msec slat (usec): min=42, max=303703, avg=517.35, stdev=9112.62 clat (usec): min=340, max=1461.5K, avg=6556.03, stdev=47381.54 lat (usec): min=411, max=1461.6K, avg=7074.03, stdev=48289.76 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 370], 10.00th=[ 398], 20.00th=[ 446], | 30.00th=[ 636], 40.00th=[ 780], 50.00th=[ 1192], 60.00th=[ 1288], | 70.00th=[ 1336], 80.00th=[ 1496], 90.00th=[ 3600], 95.00th=[15680], | 99.00th=[138240], 99.50th=[201728], 99.90th=[733184], 99.95th=[856064], | 99.99th=[1466368] bw (KB /s): min= 173, max= 9708, per=28.86%, avg=3369.21, stdev=1468.01 lat (usec) : 10=0.01%, 100=0.01%, 250=0.03%, 500=12.98%, 750=6.61% lat (usec) : 1000=3.67% lat (msec) : 2=20.85%, 4=1.40%, 10=6.91%, 20=10.44%, 50=14.89% lat (msec) : 100=10.28%, 250=9.41%, 500=2.17%, 750=0.25%, 1000=0.07% lat (msec) : 2000=0.02% cpu : usr=0.15%, sys=0.68%, ctx=16814, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5349: Fri May 30 16:06:21 2014 read : io=523520KB, bw=2992.8KB/s, iops=46, runt=174928msec slat (usec): min=23, max=311675, avg=171.27, stdev=4974.27 clat (usec): min=75, max=1114.8K, avg=76585.62, stdev=105112.42 lat (usec): min=364, max=1114.9K, avg=76757.53, stdev=105149.80 clat percentiles (usec): | 1.00th=[ 1704], 5.00th=[ 6240], 10.00th=[ 8768], 20.00th=[13376], | 30.00th=[19328], 40.00th=[27520], 50.00th=[39168], 60.00th=[54528], | 70.00th=[77312], 80.00th=[114176], 90.00th=[187392], 95.00th=[272384], | 99.00th=[518144], 99.50th=[675840], 99.90th=[913408], 99.95th=[995328], | 99.99th=[1122304] bw (KB /s): min= 145, max= 9984, per=26.41%, avg=3074.43, stdev=1438.71 write: io=525056KB, bw=3001.6KB/s, iops=46, runt=174928msec slat (usec): min=38, max=265589, avg=587.60, stdev=9637.20 clat (usec): min=342, max=906290, avg=8148.18, stdev=44226.22 lat (usec): min=406, max=906371, avg=8736.42, stdev=45334.93 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 422], 20.00th=[ 446], | 30.00th=[ 644], 40.00th=[ 852], 50.00th=[ 1240], 60.00th=[ 1304], | 70.00th=[ 1416], 80.00th=[ 1960], 90.00th=[11200], 95.00th=[21888], | 99.00th=[146432], 99.50th=[216064], 99.90th=[815104], 99.95th=[856064], | 99.99th=[905216] bw (KB /s): min= 74, max= 9472, 
per=26.38%, avg=3079.66, stdev=1727.46 lat (usec) : 100=0.01%, 250=0.01%, 500=12.76%, 750=6.02%, 1000=3.50% lat (msec) : 2=18.48%, 4=1.65%, 10=8.47%, 20=11.68%, 50=14.81% lat (msec) : 100=10.03%, 250=9.42%, 500=2.51%, 750=0.41%, 1000=0.23% lat (msec) : 2000=0.02% cpu : usr=0.12%, sys=0.63%, ctx=16535, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=11641KB/s, minb=2910KB/s, maxb=3348KB/s, mint=156355msec, maxt=179881msec WRITE: io=2051.0MB, aggrb=11675KB/s, minb=2918KB/s, maxb=3358KB/s, mint=156355msec, maxt=179881msec Disk stats (read/write): dm-0: ios=32704/33128, merge=0/0, ticks=2371685/821383, in_queue=3193165, util=100.00%, aggrios=32720/33200, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32720/33200, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16359/33122, aggrmerge=0/78, aggrticks=1185825/498678, aggrin_queue=1684398, aggrutil=99.75% sda: ios=8461/33122, merge=1/78, ticks=393799/195017, in_queue=588764, util=95.85% sdb: ios=24258/33122, merge=0/78, ticks=1977851/802340, in_queue=2780032, util=99.75% [-- Attachment #3: virt-ham0-testorigin-lvmcache.txt --] [-- Type: text/plain, Size: 9749 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=5531: Fri May 30 16:25:54 2014 read : io=523520KB, bw=2629.4KB/s, iops=41, runt=199130msec slat (usec): min=24, max=197634, avg=143.59, stdev=2991.82 clat (usec): min=174, max=26698K, avg=67465.64, stdev=365815.35 lat (usec): min=388, max=26698K, avg=67609.87, stdev=365817.53 clat percentiles (usec): | 1.00th=[ 398], 5.00th=[ 5600], 10.00th=[ 7968], 20.00th=[11584], | 30.00th=[15552], 40.00th=[21376], 50.00th=[29568], 60.00th=[41216], | 70.00th=[60160], 80.00th=[90624], 90.00th=[152576], 95.00th=[226304], | 99.00th=[440320], 99.50th=[544768], 99.90th=[880640], 99.95th=[1515520], | 99.99th=[16711680] bw (KB /s): min= 320, max=13128, per=25.53%, avg=2685.01, stdev=1713.14 write: io=525056KB, bw=2636.8KB/s, iops=41, runt=199130msec slat (usec): min=33, max=240957, avg=320.30, stdev=5617.33 clat (usec): min=353, max=65433K, avg=29338.06, stdev=1020379.84 lat (usec): min=437, max=65433K, avg=29659.01, stdev=1020395.97 clat percentiles (usec): | 1.00th=[ 370], 5.00th=[ 378], 10.00th=[ 398], 20.00th=[ 454], | 30.00th=[ 620], 40.00th=[ 900], 50.00th=[ 1240], 60.00th=[ 1320], | 70.00th=[ 1608], 80.00th=[ 7584], 90.00th=[19584], 95.00th=[30848], | 99.00th=[103936], 99.50th=[175104], 99.90th=[815104], 99.95th=[3063808], | 99.99th=[16711680] bw (KB /s): min= 114, max=13254, per=25.43%, avg=2681.93, stdev=1885.06 lat (usec) : 250=0.01%, 500=13.29%, 750=5.71%, 1000=2.83% lat (msec) : 2=15.81%, 4=1.70%, 10=9.86%, 20=14.95%, 50=17.03% lat (msec) : 100=9.34%, 250=7.23%, 500=1.78%, 750=0.32%, 1000=0.05% lat (msec) : 2000=0.02%, >=2000=0.05% cpu : usr=0.10%, sys=0.77%, ctx=16731, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, 
short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5532: Fri May 30 16:25:54 2014 read : io=523520KB, bw=3213.1KB/s, iops=50, runt=162892msec slat (usec): min=47, max=185115, avg=128.16, stdev=2376.71 clat (usec): min=4, max=25770K, avg=59285.18, stdev=394418.17 lat (usec): min=401, max=25770K, avg=59413.99, stdev=394421.00 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 13], | 30.00th=[ 17], 40.00th=[ 22], 50.00th=[ 30], 60.00th=[ 41], | 70.00th=[ 56], 80.00th=[ 78], 90.00th=[ 126], 95.00th=[ 182], | 99.00th=[ 347], 99.50th=[ 420], 99.90th=[ 578], 99.95th=[ 635], | 99.99th=[16712] bw (KB /s): min= 242, max= 6629, per=31.56%, avg=3318.74, stdev=1376.74 write: io=525056KB, bw=3223.4KB/s, iops=50, runt=162892msec slat (usec): min=61, max=269043, avg=299.24, stdev=5817.49 clat (usec): min=326, max=69958K, avg=19848.94, stdev=886022.82 lat (usec): min=444, max=69959K, avg=20148.82, stdev=886040.88 clat percentiles (usec): | 1.00th=[ 366], 5.00th=[ 374], 10.00th=[ 390], 20.00th=[ 446], | 30.00th=[ 524], 40.00th=[ 700], 50.00th=[ 1112], 60.00th=[ 1256], | 70.00th=[ 1304], 80.00th=[ 1448], 90.00th=[ 2024], 95.00th=[17024], | 99.00th=[64768], 99.50th=[115200], 99.90th=[346112], 99.95th=[3260416], | 99.99th=[16711680] bw (KB /s): min= 121, max= 8796, per=31.72%, avg=3345.35, stdev=1578.99 lat (usec) : 10=0.01%, 100=0.01%, 500=14.81%, 750=6.41%, 1000=3.04% lat (msec) : 2=21.03%, 4=1.18%, 10=7.43%, 20=12.62%, 50=16.28% lat (msec) : 100=9.69%, 250=6.25%, 500=1.08%, 750=0.12%, 1000=0.01% lat (msec) : >=2000=0.04% cpu : usr=0.13%, sys=0.93%, ctx=17115, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5533: Fri May 30 16:25:54 2014 read : io=523520KB, bw=2713.7KB/s, iops=42, runt=192920msec slat (usec): min=35, max=30712, avg=92.16, stdev=338.87 clat (usec): min=315, max=63712K, avg=73273.62, stdev=983181.94 lat (usec): min=401, max=63713K, avg=73366.42, stdev=983181.85 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 13], | 30.00th=[ 18], 40.00th=[ 24], 50.00th=[ 32], 60.00th=[ 42], | 70.00th=[ 58], 80.00th=[ 83], 90.00th=[ 135], 95.00th=[ 202], | 99.00th=[ 383], 99.50th=[ 482], 99.90th=[ 1401], 99.95th=[ 1483], | 99.99th=[16712] bw (KB /s): min= 121, max= 9216, per=31.94%, avg=3358.56, stdev=1911.18 write: io=525056KB, bw=2721.7KB/s, iops=42, runt=192920msec slat (usec): min=56, max=258524, avg=400.46, stdev=7012.38 clat (usec): min=348, max=70388K, avg=20488.66, stdev=849098.66 lat (usec): min=435, max=70389K, avg=20889.77, stdev=849131.29 clat percentiles (usec): | 1.00th=[ 366], 5.00th=[ 374], 10.00th=[ 394], 20.00th=[ 450], | 30.00th=[ 620], 40.00th=[ 876], 50.00th=[ 1224], 60.00th=[ 1288], | 70.00th=[ 1464], 80.00th=[ 2960], 90.00th=[15808], 95.00th=[24960], | 99.00th=[87552], 99.50th=[218112], 99.90th=[888832], 99.95th=[1400832], | 99.99th=[16711680] bw (KB /s): min= 2, max=10112, per=31.98%, avg=3372.91, stdev=2142.52 lat (usec) : 500=12.87%, 750=5.95%, 1000=2.91% lat (msec) : 2=18.21%, 4=1.31%, 10=8.83%, 20=13.82%, 50=17.58% lat (msec) : 100=10.40%, 250=6.34%, 500=1.43%, 750=0.20%, 1000=0.04% lat (msec) : 2000=0.07%, >=2000=0.04% cpu : usr=0.12%, sys=0.77%, ctx=16939, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5534: Fri May 30 16:25:54 2014 read : io=523520KB, bw=2857.1KB/s, iops=44, runt=183181msec slat (usec): min=44, max=195215, avg=112.54, stdev=2157.49 clat (usec): min=312, max=43560K, avg=74513.46, stdev=550534.41 lat (usec): min=375, max=43560K, avg=74626.63, stdev=550535.28 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 13], | 30.00th=[ 18], 40.00th=[ 25], 50.00th=[ 35], 60.00th=[ 48], | 70.00th=[ 67], 80.00th=[ 96], 90.00th=[ 159], 95.00th=[ 233], | 99.00th=[ 441], 99.50th=[ 545], 99.90th=[ 979], 99.95th=[ 1205], | 99.99th=[16712] bw (KB /s): min= 348, max= 7734, per=27.54%, avg=2895.85, stdev=1213.49 write: io=525056KB, bw=2866.4KB/s, iops=44, runt=183181msec slat (usec): min=54, max=275450, avg=324.36, stdev=5740.29 clat (usec): min=348, max=47249K, avg=14560.10, stdev=569838.75 lat (usec): min=441, max=47249K, avg=14885.10, stdev=569878.87 clat percentiles (usec): | 1.00th=[ 366], 5.00th=[ 374], 10.00th=[ 394], 20.00th=[ 450], | 30.00th=[ 564], 40.00th=[ 740], 50.00th=[ 1192], 60.00th=[ 1256], | 70.00th=[ 1336], 80.00th=[ 1576], 90.00th=[10432], 95.00th=[21888], | 99.00th=[99840], 99.50th=[166912], 99.90th=[528384], 99.95th=[692224], | 99.99th=[16711680] bw (KB /s): min= 213, max= 7303, per=27.52%, avg=2901.91, stdev=1403.33 lat (usec) : 500=14.29%, 750=6.23%, 1000=2.95% lat (msec) : 2=19.19%, 4=1.25%, 10=8.00%, 20=12.42%, 50=15.55% lat (msec) : 100=10.17%, 250=7.70%, 500=1.84%, 750=0.31%, 1000=0.04% lat (msec) : 2000=0.03%, >=2000=0.04% cpu : usr=0.13%, sys=0.81%, ctx=16805, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10516KB/s, minb=2629KB/s, maxb=3213KB/s, mint=162892msec, maxt=199130msec WRITE: io=2051.0MB, aggrb=10546KB/s, minb=2636KB/s, maxb=3223KB/s, mint=162892msec, maxt=199130msec Disk stats (read/write): dm-0: ios=65408/67846, merge=0/0, ticks=4160319/11781438, in_queue=15944596, util=100.00%, aggrios=21839/22771, aggrmerge=0/0, aggrticks=1288260/3918688, aggrin_queue=5206991, aggrutil=100.00% dm-1: ios=84/117, merge=0/0, ticks=120/182, in_queue=302, util=0.11%, aggrios=96/383, aggrmerge=0/62, aggrticks=124/431, aggrin_queue=555, aggrutil=0.22% sdc: ios=96/383, merge=0/62, ticks=124/431, in_queue=555, util=0.22% dm-2: ios=12/328, merge=0/0, ticks=3/268, in_queue=271, util=0.12% dm-5: ios=65421/67870, merge=0/0, ticks=3864659/11755615, in_queue=15620402, util=100.00%, aggrios=65421/67888, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=65421/67888, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16422/33219, aggrmerge=16286/34662, aggrticks=970890/852897, aggrin_queue=1823577, aggrutil=99.59% sda: ios=9379/33225, merge=9264/34659, ticks=406471/203345, in_queue=609659, util=96.99% sdb: ios=23466/33213, merge=23309/34666, ticks=1535310/1502449, in_queue=3037496, util=99.59% ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 15:28             ` Richard W.M. Jones
@ 2014-05-30 18:16               ` Mike Snitzer
  2014-05-30 20:53                 ` Mike Snitzer
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 18:16 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, LVM general discussion and development, thornber, Zdenek Kabelac

On Fri, May 30 2014 at 11:28am -0400,
Richard W.M. Jones <rjones@redhat.com> wrote:

> I did in fact recreate the ext4 filesystem, because I didn't read your
> email in time.
>
> Here are the commands I used to create the whole lot:
>
> ----------------------------------------------------------------------
> lvcreate -L 800G -n testorigin vg_guests @slow
> mkfs -t ext4 /dev/vg_guests/testorigin
> # at this point, I tested the speed of the uncached LV, see below
> lvcreate -L 1G -n lv_cache_meta vg_guests @ssd
> lvcreate -L 200G -n lv_cache vg_guests @ssd
> lvconvert --type cache-pool --chunksize 32k --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache
> lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testorigin
> dmsetup message vg_guests-testorigin 0 sequential_threshold 0
> dmsetup message vg_guests-testorigin 0 read_promote_adjustment 0
> dmsetup message vg_guests-testorigin 0 write_promote_adjustment 0
> # at this point, I tested the speed of the cached LV, see below
> ----------------------------------------------------------------------
>
> To test the uncached LV, I ran the same fio test twice on the mounted
> ext4 filesystem. The results of the second run are in the first
> attachment.
>
> To test the cached LV, I ran these commands 3 times in a row:
>
> md5sum virt.*
> echo 3 > /proc/sys/vm/drop_caches
>
> then I ran the fio test twice. The results of the second run are
> attached.
>
> This time the LVM cache test is about 10% slower than the HDD test.
> I'm not sure what to make of that at all.

It could be that the 32k cache blocksize increased the metadata
overhead enough to reduce the performance to that degree.

And even though you recreated the filesystem, it still could be the
case that the IO issued from ext4 is slightly misaligned. I'd welcome
you going back to a blocksize of 64K (you don't _need_ to go to 64K but
it seems you're giving up quite a bit of performance now). And then
collecting blktraces of the origin volume for the fio run -- to see if
64K * 2 IOs are being issued for each 64K fio IO. I would think it
would be fairly clear from the blktrace but maybe not. It could be that
a targeted debug line in dm-cache would serve as a better canary for
whether misalignment is a concern. I'll see if I can come up with a
patch that helps us assess misalignment.

Joe Thornber will be back from holiday on Monday, so we may get some
additional insight from him soon enough.

Sorry for your troubles, but this is good feedback.

^ permalink raw reply [flat|nested] 37+ messages in thread
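For anyone who wants to run the same check, a capture along these lines should work (the device name is taken from the disk stats above and the run length is only illustrative; adjust both to match the actual setup):

    # trace the origin array for the duration of the fio run
    blktrace -d /dev/md127 -w 300 -o origin-trace
    # afterwards, turn the per-CPU trace files into readable events
    blkparse -i origin-trace | less

In the blkparse output the request size appears in sectors after the '+', so a 64K IO shows up as '+ 128'; a pattern of extra reads, or of 128K-sized IO on the origin during the random-write phase, would point at the promotion/misalignment behaviour discussed above.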
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 18:16               ` Mike Snitzer
@ 2014-05-30 20:53                 ` Mike Snitzer
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 20:53 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, LVM general discussion and development, thornber, Zdenek Kabelac

On Fri, May 30 2014 at 2:16pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Fri, May 30 2014 at 11:28am -0400,
> Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > This time the LVM cache test is about 10% slower than the HDD test.
> > I'm not sure what to make of that at all.
>
> It could be that the 32k cache blocksize increased the metadata overhead
> enough to reduce the performance to that degree.
>
> And even though you recreated the filesystem, it still could be the case
> that the IO issued from ext4 is slightly misaligned. I'd welcome you
> going back to a blocksize of 64K (you don't _need_ to go to 64K but it
> seems you're giving up quite a bit of performance now). And then
> collecting blktraces of the origin volume for the fio run -- to see if
> 64K * 2 IOs are being issued for each 64K fio IO. I would think it
> would be fairly clear from the blktrace but maybe not.

Thinking about this a little more: if the IO that ext4 is issuing to
the cache is aligned on a blocksize boundary (e.g. 64K), we really
shouldn't see _any_ IO from the origin device when you are running fio.
The reason is we avoid promoting (aka copying) from the origin if an
entire cache block is being overwritten.

Looking at the fio output from the cache run you did using the 32K
blocksize, it is very clear that the MD array (on sda and sdb) is
involved quite a lot. And your even older fio run output using the
original 64K blocksize shows a bunch of IO to md127...

So it seems fairly clear that dm-cache isn't using the cache block
overwrite optimization it has to avoid promotions from the origin.
This would _seem_ to validate my concern about alignment... or
something else needs to explain why we're not able to avoid promotions.

If you have time to reconfigure with a 64K blocksize and rerun the fio
test, please look at the amount of write IO performed by md127 (and sda
and sdb), and also look at the number of promotions, via 'dmsetup
status' for the cache device, before and after the fio run.

We can try to reproduce using a pristine ext4 filesystem on top of MD
with the fio job you provided... and I'm now wondering if we're getting
bitten by DM stacked on MD (due to bvec merge being limited to 1 page,
see linux.git commit 8cbeb67a for some additional context). So it may
be worth trying _without_ MD raid1, just as a test. Use either sda or
sdb directly as the origin volume.

^ permalink raw reply [flat|nested] 37+ messages in thread
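For reference, the promotion counters Mike mentions can be read straight from the cache target's status line, e.g. (device name as used earlier in the thread):

    dmsetup status vg_guests-testorigin

The cache status line includes read hits/misses, write hits/misses, and the demotion and promotion counts; the exact field order is documented in Documentation/device-mapper/cache.txt in the kernel source. Taking a snapshot of that line before and after the fio run shows how many blocks were copied up from the origin during the test.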
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 13:46         ` Richard W.M. Jones
  2014-05-30 13:54           ` Heinz Mauelshagen
@ 2014-05-30 13:55           ` Mike Snitzer
  2014-05-30 14:29             ` Richard W.M. Jones
  1 sibling, 1 reply; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 13:55 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development

On Fri, May 30 2014 at 9:46am -0400,
Richard W.M. Jones <rjones@redhat.com> wrote:

> I have now set both read_promote_adjustment ==
> write_promote_adjustment == 0 and used drop_caches between runs.
>
> I also read Documentation/device-mapper/cache-policies.txt at Heinz's
> suggestion.
>
> I'm afraid the performance of the fio test is still not the same as
> the SSD (4.8 times slower than the SSD-only test now).

Obviously not what we want. But you're not doing any repeated IO to
those blocks... it is purely random, right? So really, the cache is
waiting for blocks to get promoted from the origin if the IOs from fio
don't completely cover the cache block size you've specified.

Can you go back over those settings? From your dmsetup table output you
shared earlier in the thread, you're using a blocksize of 128 sectors
(or 64K). And your fio random write workload is using 64K.

So unless you have misaligned IO, you _should_ be able to avoid reading
from the origin. But XFS is in play here... I'm wondering if it is
issuing IO differently than we'd otherwise see if you were testing
against the block devices directly...

> Would repeated runs of (md5sum virt.* ; echo 3 > /proc/sys/vm/drop_caches)
> not eventually cause the whole file to be placed on the SSD?
> It does seem very counter-intuitive if not.

If you set read_promote_adjustment to 0, it should pull the associated
blocks into the cache. What makes you think it isn't? How are you
judging the performance of the md5sum IO? Do you see IO being issued to
the origin via blktrace or something?

^ permalink raw reply [flat|nested] 37+ messages in thread
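The block size being referred to can be double-checked from the table line of the cache device, e.g.:

    dmsetup table vg_guests-testorigin

For a cache target the data block size is printed in 512-byte sectors, so a value of 128 corresponds to the 64K cache blocks discussed here; the full table format is described in Documentation/device-mapper/cache.txt.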
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 13:55           ` Mike Snitzer
@ 2014-05-30 14:29             ` Richard W.M. Jones
  2014-05-30 14:36               ` Mike Snitzer
  0 siblings, 1 reply; 37+ messages in thread
From: Richard W.M. Jones @ 2014-05-30 14:29 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development

On Fri, May 30, 2014 at 09:55:29AM -0400, Mike Snitzer wrote:
> So unless you have misaligned IO, you _should_ be able to avoid reading
> from the origin. But XFS is in play here... I'm wondering if it is

The filesystem is ext4.

> If you set read_promote_adjustment to 0, it should pull the associated
> blocks into the cache. What makes you think it isn't?

The fio test is about twice as fast as when I ran it directly on the
hard disk array, but about 5 times slower than when I ran it directly
on the SSD. I'm not measuring the speed of the md5sum operation.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 14:29             ` Richard W.M. Jones
@ 2014-05-30 14:36               ` Mike Snitzer
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 14:36 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, LVM general discussion and development, thornber, Zdenek Kabelac

On Fri, May 30 2014 at 10:29am -0400,
Richard W.M. Jones <rjones@redhat.com> wrote:

> On Fri, May 30, 2014 at 09:55:29AM -0400, Mike Snitzer wrote:
> > So unless you have misaligned IO, you _should_ be able to avoid reading
> > from the origin. But XFS is in play here... I'm wondering if it is
>
> The filesystem is ext4.

OK, so I have even more concern about misalignment then. At least XFS
goes to great lengths to build large IOs if Direct IO is used (via
bio_add_page, the optimal io size is used to build the IO up). I'm not
aware of ext4 taking similar steps, but it could be that it does now (I
vaguely remember ext4 borrowing heavily from XFS at one point; it
could've been for direct IO).

We need better tools for assessing whether the IO is misaligned, but
for now we'd have to start by looking at blktrace data for the
underlying origin device. If we keep seeing >64K sequential IOs to the
origin, that would suggest dm-cache is pulling in 2 64K blocks for each
64K IO.

^ permalink raw reply [flat|nested] 37+ messages in thread
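A rough way to see the distribution of request sizes reaching the origin is to post-process the blkparse output from a trace of the origin device (e.g. the origin-trace capture sketched earlier); field positions assume blkparse's default output format and may need adjusting:

    blkparse -i origin-trace | awk '$6 == "D" { sizes[$10]++ } END { for (s in sizes) print s, sizes[s] }' | sort -n

This counts dispatched requests ('D' events) by size in sectors; a noticeable number of entries above 128 sectors (64K) would support the theory that whole cache blocks are being pulled in from the origin.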
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-29 21:58     ` Mike Snitzer
  2014-05-30  9:04       ` Richard W.M. Jones
@ 2014-05-30 11:53       ` Mike Snitzer
  1 sibling, 0 replies; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 11:53 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Zdenek Kabelac, thornber, LVM general discussion and development

On Thu, May 29 2014 at 5:58pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> BTW, this is all with an eye toward realizing the optimization that
> dm-cache provides for origin blocks that were discarded (like I said
> before, dm-cache doesn't promote from the origin if the corresponding
> block was marked for discard). So you don't _need_ to do any of
> this... it's purely about trying to optimize a bit more.

And if you do make use of discards, you should have this stable fix
applied to your kernel:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-linus&id=f1daa838e861ae1a0fb7cd9721a21258430fcc8c

^ permalink raw reply [flat|nested] 37+ messages in thread
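To actually benefit from that discard optimisation with an ext4 filesystem on the cached LV, the discards have to reach the device; for example (the mount point is only illustrative):

    # one-off trim of the free space on the mounted filesystem
    fstrim -v /mnt/vmimages

    # or mount with online discard, at some runtime cost
    mount -o discard /dev/vg_guests/testorigin /mnt/vmimages

The discards only affect blocks the filesystem no longer uses; they simply let dm-cache skip promoting those regions from the origin.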
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-29 20:47   ` Richard W.M. Jones
  2014-05-29 21:06     ` Mike Snitzer
@ 2014-05-30 11:38     ` Alasdair G Kergon
  2014-05-30 11:45       ` Alasdair G Kergon
  1 sibling, 1 reply; 37+ messages in thread
From: Alasdair G Kergon @ 2014-05-30 11:38 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: LVM general discussion and development

On Thu, May 29, 2014 at 09:47:20PM +0100, Richard W.M. Jones wrote:
> Is there a reason why fast and slow devices need to be in the same VG?
> I've talked to two other people who found this very confusing. No one
> knew that you could manually place LVs into different PVs, and it's
> something of a pain to have to remember to place LVs every time you
> create or resize one. It seems it would be a lot simpler if you could
> have the slow PVs in one VG and the fast PVs in another VG.

We recommend you use tags: they are a much more flexible and dynamic
solution than forcing the use of separate VGs.

  pvchange --addtag ssd
  pvs -o+tags
  lvcreate ... $vg @ssd

to restrict the allocation the command performs to the PVs with the
'ssd' tag.

Alasdair

^ permalink raw reply [flat|nested] 37+ messages in thread
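Applied to the setup being tested in this thread, the tagging might look something like this (the PV names are illustrative, based on the devices mentioned earlier):

    pvchange --addtag slow /dev/md127
    pvchange --addtag ssd /dev/sdc
    pvs -o+tags
    lvcreate -L 800G -n testorigin vg_guests @slow
    lvcreate -L 200G -n lv_cache vg_guests @ssd

Each lvcreate is then restricted to PVs carrying the named tag, while both sets of PVs stay in the single VG that the cache pairing requires.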
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 11:38     ` Alasdair G Kergon
@ 2014-05-30 11:45       ` Alasdair G Kergon
  2014-05-30 12:45         ` Werner Gold
  0 siblings, 1 reply; 37+ messages in thread
From: Alasdair G Kergon @ 2014-05-30 11:45 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: LVM general discussion and development

And for lvextend, you should add any tags you are using in this way to
lvm.conf:

# When searching for free space to extend an LV, the "cling"
# allocation policy will choose space on the same PVs as the last
# segment of the existing LV. If there is insufficient space and a
# list of tags is defined here, it will check whether any of them are
# attached to the PVs concerned and then seek to match those PV tags
# between existing extents and new extents.
# Use the special tag "@*" as a wildcard to match any PV tag.
# Example: LVs are mirrored between two sites within a single VG.
# PVs are tagged with either @site1 or @site2 to indicate where
# they are situated.
# cling_tag_list = [ "@site1", "@site2" ]
# cling_tag_list = [ "@*" ]

(The "cling" allocation policy is enabled by default.)

Alasdair

^ permalink raw reply [flat|nested] 37+ messages in thread
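As a concrete sketch for this setup (tag names follow the @slow/@ssd convention used earlier in the thread; cling_tag_list lives in the allocation section of lvm.conf):

    allocation {
        cling_tag_list = [ "@slow", "@ssd" ]
    }

With that in place, a plain lvextend on an uncached LV, for example (the LV name is a placeholder)

    lvextend -L +100G vg_guests/somelv

should keep the new extents on PVs tagged the same way as the LV's existing extents.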
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 11:45       ` Alasdair G Kergon
@ 2014-05-30 12:45         ` Werner Gold
  0 siblings, 0 replies; 37+ messages in thread
From: Werner Gold @ 2014-05-30 12:45 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

Many thanks to Alasdair and Heinz for the hint about the tagging
feature. It is more convenient than dealing with UUIDs.

I also stumbled across the "same VG" issue when I tried to set up the
test environment. Thanks to Richard for that hint. :-)

I ran bonnie++ on my X230 (RHEL7) here, using an external USB3 SSD.
The results are attached. With the cache, there is a significant
difference in random create, which is what I would expect from an SSD
cache.

Werner

--
Werner Gold                               wgold@redhat.com
Partner Enablement / EMEA                 phone: 49.9331.803 855
Steinbachweg 23                           fax: +49.9331.4407
97252 Frickenhausen/Main, Germany         cell: +49.172.764 4633
Key fingerprint = FF91B07C 6F3D340E A71791AC 5E3A6CB4 D44CBC37

Reg. Adresse: Red Hat GmbH, Werner-von-Siemens-Ring 14, D-85630 Grasbrunn
Handelsregister: Amtsgericht Muenchen HRB 153243
Geschaeftsfuehrer: Mark Hegarty, Charlie Peters, Michael Cunningham,
Charles Cachera

[-- Attachment #2: bonnie-dm-cache.html --]
[-- Type: text/html, Size: 12622 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2014-05-30 20:53 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-05-22 10:18 [linux-lvm] Testing the new LVM cache feature Richard W.M. Jones 2014-05-22 14:43 ` Zdenek Kabelac 2014-05-22 15:22 ` Richard W.M. Jones 2014-05-22 15:49 ` Richard W.M. Jones 2014-05-22 18:04 ` Mike Snitzer 2014-05-22 18:13 ` Richard W.M. Jones 2014-05-29 13:52 ` Richard W.M. Jones 2014-05-29 20:34 ` Mike Snitzer 2014-05-29 20:47 ` Richard W.M. Jones 2014-05-29 21:06 ` Mike Snitzer 2014-05-29 21:19 ` Richard W.M. Jones 2014-05-29 21:58 ` Mike Snitzer 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 10:30 ` Richard W.M. Jones 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones 2014-05-30 13:42 ` Heinz Mauelshagen 2014-05-30 13:54 ` Richard W.M. Jones 2014-05-30 13:58 ` Zdenek Kabelac 2014-05-30 13:46 ` Richard W.M. Jones 2014-05-30 13:54 ` Heinz Mauelshagen 2014-05-30 14:26 ` Richard W.M. Jones 2014-05-30 14:29 ` Mike Snitzer 2014-05-30 14:36 ` Richard W.M. Jones 2014-05-30 14:44 ` Mike Snitzer 2014-05-30 14:51 ` Richard W.M. Jones 2014-05-30 14:58 ` Mike Snitzer 2014-05-30 15:28 ` Richard W.M. Jones 2014-05-30 18:16 ` Mike Snitzer 2014-05-30 20:53 ` Mike Snitzer 2014-05-30 13:55 ` Mike Snitzer 2014-05-30 14:29 ` Richard W.M. Jones 2014-05-30 14:36 ` Mike Snitzer 2014-05-30 11:53 ` Mike Snitzer 2014-05-30 11:38 ` Alasdair G Kergon 2014-05-30 11:45 ` Alasdair G Kergon 2014-05-30 12:45 ` Werner Gold