* [linux-lvm] Testing the new LVM cache feature @ 2014-05-22 10:18 Richard W.M. Jones 2014-05-22 14:43 ` Zdenek Kabelac 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 10:18 UTC (permalink / raw) To: linux-lvm I've set up a computer in order to test the new LVM cache feature. It has a pair of 2 TB HDDs in RAID 1 configuration, and a 256 GB SSD. The setup will be used to store large VM disk images in an ext4 filesystem, to be served both locally and over NFS. Before I start I have some questions about this feature: (1) Is there a minimum recommended version of LVM or kernel to use? I currently have lvm2-2.02.106-1.fc20.x86_64, which mentions LVM cache in the lvm(8) man page. I have kernel 3.14.3-200.fc20.x86_64. (2) There is no lvmcache(7) man page in any released version of LVM2. Was this man page ever created or is lvm(8) the definitive documentation? (3) It looks as if cached LVs cannot be resized: https://www.redhat.com/archives/lvm-devel/2014-February/msg00119.html Will this be fixed in future? Is there any workaround -- perhaps removing the caching layer, resizing the original LV, then recreating the cache? I really need to be able to resize LVs :-) (4) To calculate the size of the cache metadata LV, do I really just divide by 1000, min 8 MB? It's that simple? Doesn't it depend on dm-cache block size? Or dm-cache algorithm? How can I choose block size and algorithm? (5) Is there an explicit command for flushing the cache layer back to the origin LV? (6) Is the on-disk format stable for future kernel/LVM upgrades? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org ^ permalink raw reply [flat|nested] 37+ messages in thread
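Taking the rule in question (4) at face value (an illustration only; the real requirement may also depend on the cache block size), the metadata LV for the 229 GB cache LV used later in this thread would need roughly:

  max(229 GB / 1000, 8 MB) ≈ 230 MB

The examples later in the thread simply round this up to a 1 GB metadata LV.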
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 10:18 [linux-lvm] Testing the new LVM cache feature Richard W.M. Jones @ 2014-05-22 14:43 ` Zdenek Kabelac 2014-05-22 15:22 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Zdenek Kabelac @ 2014-05-22 14:43 UTC (permalink / raw) To: LVM general discussion and development On 22.5.2014 12:18, Richard W.M. Jones wrote: > > I've set up a computer in order to test the new LVM cache feature. It > has a pair of 2 TB HDDs in RAID 1 configuration, and a 256 GB SSD. > The setup will be used to store large VM disk images in an ext4 > filesystem, to be served both locally and over NFS. > > Before I start I have some questions about this feature: > > (1) Is there a minimum recommended version of LVM or kernel to use? I > currently have lvm2-2.02.106-1.fc20.x86_64, which mentions LVM cache > in the lvm(8) man page. I have kernel 3.14.3-200.fc20.x86_64. With these new targets the usual rule applies: the newer the kernel and tools, the better. > > (2) There is no lvmcache(7) man page in any released version of LVM2. > Was this man page ever created or is lvm(8) the definitive > documentation? It's now in upstream git as a separate man page (moved from lvm(8)). > (3) It looks as if cached LVs cannot be resized: > https://www.redhat.com/archives/lvm-devel/2014-February/msg00119.html > Will this be fixed in future? Is there any workaround -- perhaps Yes - the cache is still missing a lot of features - it needs further integration with tools like cache_check, cache_repair and so on. For now it's really only a preview - I wouldn't consider using it for anything serious yet. > removing the caching layer, resizing the original LV, then recreating > the cache? I really need to be able to resize LVs :-) This feature will certainly be implemented. Meanwhile you have to drop the cache, resize the LV, and reattach the cache (where "drop the cache" means removing it). > (4) To calculate the size of the cache metadata LV, do I really just > divide by 1000, min 8 MB? It's that simple? Doesn't it depend on > dm-cache block size? Or dm-cache algorithm? How can I choose block > size and algorithm? Well, this is where your experimenting may begin. However, for now lvm2 doesn't let you play with the algorithms - the lvchange interface is not yet upstream. > (5) Is there an explicit command for flushing the cache layer back to > the origin LV? To be developed... > (6) Is the on-disk format stable for future kernel/LVM upgrades? Well, it's still experimental - so if some serious problem is found that requires changing the format, that may happen. Zdenek ^ permalink raw reply [flat|nested] 37+ messages in thread
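A sketch of the drop/resize/reattach workaround described above, assuming a later LVM2 release that provides 'lvconvert --uncache' (the 2.02.106 tools discussed in this thread do not); the LV names follow the examples used later in the thread and the size increase is arbitrary:

  # Flush dirty blocks and detach/remove the cache pool from the origin.
  lvconvert --uncache vg_guests/testoriginlv

  # Resize the now-uncached origin LV and its ext4 filesystem as usual.
  lvextend -L +50G vg_guests/testoriginlv
  resize2fs /dev/vg_guests/testoriginlv

  # Then recreate the cache pool and reattach it, exactly as in the
  # lvcreate/lvconvert sequence shown in the following messages.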
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 14:43 ` Zdenek Kabelac @ 2014-05-22 15:22 ` Richard W.M. Jones 2014-05-22 15:49 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 15:22 UTC (permalink / raw) To: LVM general discussion and development; +Cc: Zdenek Kabelac Well I'm happy to experiment for you. At the moment I'm stuck here: # vgcreate vg_cache /dev/sdc1 Volume group "vg_cache" successfully created # lvcreate -L 1G -n lv_cache_meta vg_cache Logical volume "lv_cache_meta" created # lvcreate -L 229G -n lv_cache vg_cache Logical volume "lv_cache" created # lvs LV VG Attr LSize [...] lv_cache vg_cache Cwi---C--- 229.00g lv_cache_meta vg_cache -wi-a----- 1.00g testoriginlv vg_guests -wi-a----- 100.00g # lvconvert --type cache-pool --poolmetadata /dev/vg_cache/lv_cache_meta /dev/vg_cache/lv_cache Logical volume "lvol0" created Converted vg_cache/lv_cache to cache pool. # lvs LV VG Attr LSize [...] lv_cache vg_cache Cwi---C--- 229.00g testoriginlv vg_guests -wi-a----- 100.00g # lvconvert --type cache --cachepool vg_cache/lv_cache vg_guests/testoriginlv Unable to find cache pool LV, vg_cache/lv_cache ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It seems as if vg_cache/lv_cache is a "cache pool" but for some reason lvconvert is unable to use it. The error seems to come from this code: if (!(cachepool = find_lv(origin->vg, lp->cachepool))) { log_error("Unable to find cache pool LV, %s", lp->cachepool); return 0; } Is it looking in the wrong VG? Or do I have to have a single VG for this to work? (That's not made clear in the documentation, and it seems like a strange restriction). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org ^ permalink raw reply [flat|nested] 37+ messages in thread
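The single-VG layout that the next message converges on looks like this (a sketch; /dev/fast stands for the SSD partition, and vgextend is only needed if the SSD is not already part of vg_guests):

  # Put the SSD into the same VG as the origin LV.
  vgextend vg_guests /dev/fast

  # Create the cache data and metadata LVs on the SSD only.
  lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast
  lvcreate -L 229G -n lv_cache vg_guests /dev/fast

  # Combine them into a cache pool and attach it to the origin LV.
  lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache
  lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv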
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 15:22 ` Richard W.M. Jones @ 2014-05-22 15:49 ` Richard W.M. Jones 2014-05-22 18:04 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 15:49 UTC (permalink / raw) To: LVM general discussion and development; +Cc: Zdenek Kabelac It works once I use a single VG. However the performance is exactly the same as the backing hard disk, not the SSD. It seems I'm getting no benefit ... # lvs [...] testoriginlv vg_guests Cwi-a-C--- 100.00g lv_cache [testoriginlv_corig] # mount /dev/vg_guests/testoriginlv /tmp/mnt # cd /tmp/mnt # dd if=/dev/zero of=test.file bs=64K count=100000 oflag=direct 100000+0 records in 100000+0 records out 6553600000 bytes (6.6 GB) copied, 57.6301 s, 114 MB/s # dd if=test.file of=/dev/zero bs=64K iflag=direct 100000+0 records in 100000+0 records out 6553600000 bytes (6.6 GB) copied, 47.6587 s, 138 MB/s (Exactly the same numbers as when I tested the underlying HDD, and about half the performance of the SSD.) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 15:49 ` Richard W.M. Jones @ 2014-05-22 18:04 ` Mike Snitzer 2014-05-22 18:13 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-22 18:04 UTC (permalink / raw) To: Richard W.M. Jones Cc: Zdenek Kabelac, thornber, LVM general discussion and development On Thu, May 22 2014 at 11:49am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > It works once I use a single VG. > > However the performance is exactly the same as the backing hard disk, > not the SDD. It seems I'm getting no benefit ... > > # lvs > [...] > testoriginlv vg_guests Cwi-a-C--- 100.00g lv_cache [testoriginlv_corig] > > # mount /dev/vg_guests/testoriginlv /tmp/mnt > # cd /tmp/mnt > > # dd if=/dev/zero of=test.file bs=64K count=100000 oflag=direct > 100000+0 records in > 100000+0 records out > 6553600000 bytes (6.6 GB) copied, 57.6301 s, 114 MB/s > > # dd if=test.file of=/dev/zero bs=64K iflag=direct > 100000+0 records in > 100000+0 records out > 6553600000 bytes (6.6 GB) copied, 47.6587 s, 138 MB/s > > (Exactly the same numbers as when I tested the underlying HDD, and > about half the performance of the SDD.) By default dm-cache (as is currently upstream) is _not_ going to cache sequential IO, and it also isn't going to cache IO that is first written. It waits for hit counts to elevate to the promote threshold. So dm-cache effectively acts as a hot-spot cache by default. If you want dm-cache to be more aggressive for initial writes, you can: 1) discard the entire dm-cache device before use (either with mkfs, blkdiscard, or fstrim) 2) set the dm-cache 'write_promote_adjustment' tunable to 0 with the DM message interface, e.g.: dmsetup message <mapped device> 0 write_promote_adjustment 0 Additional documentation is available in the kernel tree: Documentation/device-mapper/cache.txt Documentation/device-mapper/cache-policies.txt Joe Thornber is also working on significant bursty write performance improvements for dm-cache. Hopefully they'll be ready to go upstream for the Linux 3.16 merge window. Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
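The tunables above are set per activated device, so the DM name has to be looked up first; a sketch (note that, as far as I can tell, settings sent with 'dmsetup message' are runtime-only and have to be reapplied after the LV is deactivated and reactivated):

  # List the devices currently using the cache target to find the right name.
  dmsetup ls --target cache

  # Lower the write promotion threshold so first writes are cached.
  dmsetup message <mapped device> 0 write_promote_adjustment 0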
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 18:04 ` Mike Snitzer @ 2014-05-22 18:13 ` Richard W.M. Jones 2014-05-29 13:52 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-22 18:13 UTC (permalink / raw) To: Mike Snitzer Cc: Zdenek Kabelac, thornber, LVM general discussion and development On Thu, May 22, 2014 at 02:04:05PM -0400, Mike Snitzer wrote: > By default dm-cache (as is currently upstream) is _not_ going to cache > sequential IO, and it also isn't going to cache IO that is first > written. It waits for hit counts to elevate to the promote threshold. > So dm-cache effectively acts as a hot-spot cache by default. OK, that makes sense, thanks. I wrote about using the LVM cache feature here: https://rwmj.wordpress.com/2014/05/22/using-lvms-new-cache-feature/#content Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-22 18:13 ` Richard W.M. Jones @ 2014-05-29 13:52 ` Richard W.M. Jones 2014-05-29 20:34 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-29 13:52 UTC (permalink / raw) To: Mike Snitzer Cc: LVM general discussion and development, thornber, Zdenek Kabelac [-- Attachment #1: Type: text/plain, Size: 912 bytes --] I've done some more testing, comparing RAID 1 HDD with RAID 1 HDD + an SSD overlay (using lvm-cache). I'm now using 'fio', with the following job file: [virt] ioengine=libaio iodepth=4 rw=randrw bs=64k direct=1 size=1g numjobs=4 I'm still seeing almost no benefit from LVM cache. It's about 4% faster than the underlying, slow HDDs. See attached runs. The SSD LV is 200 GB and the underlying LV is 800 GB, so I would expect there is plenty of space to cache things in the SSD during the test. For comparison, the fio tests runs about 11 times faster on the SSD. Any ideas? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v [-- Attachment #2: virt-ham0-raid1.txt --] [-- Type: text/plain, Size: 9385 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: (groupid=0, jobs=1): err= 0: pid=2195: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2600.4KB/s, iops=40, runt=201329msec slat (usec): min=23, max=24586, avg=65.89, stdev=306.38 clat (usec): min=305, max=1765.7K, avg=84912.67, stdev=124153.30 lat (usec): min=367, max=1765.8K, avg=84979.16, stdev=124150.29 clat percentiles (usec): | 1.00th=[ 780], 5.00th=[ 6944], 10.00th=[ 9536], 20.00th=[14144], | 30.00th=[19840], 40.00th=[28032], 50.00th=[40704], 60.00th=[57600], | 70.00th=[82432], 80.00th=[125440], 90.00th=[209920], 95.00th=[309248], | 99.00th=[593920], 99.50th=[790528], 99.90th=[1204224], 99.95th=[1286144], | 99.99th=[1761280] bw (KB /s): min= 82, max=12416, per=25.85%, avg=2688.32, stdev=1545.40 write: io=525056KB, bw=2607.1KB/s, iops=40, runt=201329msec slat (usec): min=31, max=140675, avg=132.77, stdev=1945.34 clat (usec): min=346, max=1355.5K, avg=13280.27, stdev=57149.27 lat (usec): min=404, max=1355.6K, avg=13413.69, stdev=57202.63 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 374], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 644], 40.00th=[ 852], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1496], 80.00th=[ 5728], 90.00th=[18048], 95.00th=[63232], | 99.00th=[257024], 99.50th=[382976], 99.90th=[831488], 99.95th=[946176], | 99.99th=[1351680] bw (KB /s): min= 121, max=10709, per=25.96%, avg=2708.14, stdev=1769.64 lat (usec) : 500=12.91%, 750=6.04%, 1000=3.25% lat (msec) : 2=16.32%, 4=1.65%, 10=7.59%, 20=12.91%, 50=14.75% lat (msec) : 100=9.90%, 250=10.51%, 500=3.19%, 750=0.66%, 1000=0.20% lat (msec) : 2000=0.13% cpu : usr=0.11%, sys=0.54%, ctx=16504, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2196: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2947.6KB/s, iops=46, runt=177615msec slat (usec): min=24, max=59936, avg=81.73, stdev=987.31 clat (usec): min=149, max=1054.1K, avg=74995.11, stdev=93418.23 lat (usec): min=369, max=1054.2K, avg=75077.47, stdev=93411.06 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 16], | 30.00th=[ 22], 40.00th=[ 31], 50.00th=[ 42], 60.00th=[ 57], | 70.00th=[ 80], 80.00th=[ 116], 90.00th=[ 180], 95.00th=[ 260], | 99.00th=[ 437], 99.50th=[ 529], 99.90th=[ 840], 99.95th=[ 979], | 99.99th=[ 1057] bw (KB /s): min= 113, max= 6898, per=29.26%, avg=3043.36, stdev=1217.82 write: io=525056KB, bw=2956.2KB/s, iops=46, runt=177615msec slat (usec): min=33, max=140655, avg=128.77, stdev=2069.57 clat (usec): min=258, max=1000.6K, avg=11590.37, stdev=57029.08 lat (usec): min=403, max=1000.7K, avg=11719.76, stdev=57077.03 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 612], 40.00th=[ 748], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1528], 90.00th=[ 7776], 95.00th=[55040], | 99.00th=[244736], 99.50th=[362496], 99.90th=[913408], 99.95th=[929792], | 99.99th=[1003520] bw (KB /s): min= 140, max= 7409, per=29.16%, avg=3042.19, stdev=1466.35 lat (usec) : 250=0.01%, 500=13.49%, 750=6.57%, 1000=3.19% lat (msec) : 2=19.84%, 4=1.70%, 10=5.92%, 20=9.95%, 50=14.89% lat (msec) : 100=10.81%, 250=10.45%, 500=2.73%, 750=0.31%, 1000=0.13% lat (msec) : 2000=0.02% cpu : usr=0.14%, sys=0.59%, ctx=16858, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2197: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2923.2KB/s, iops=45, runt=179092msec slat (usec): min=20, max=99838, avg=91.84, stdev=1411.97 clat (usec): min=160, max=1512.4K, avg=75755.06, stdev=105522.71 lat (usec): min=382, max=1512.9K, avg=75847.54, stdev=105514.14 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 29], 50.00th=[ 40], 60.00th=[ 56], | 70.00th=[ 76], 80.00th=[ 112], 90.00th=[ 186], 95.00th=[ 269], | 99.00th=[ 469], 99.50th=[ 586], 99.90th=[ 1156], 99.95th=[ 1287], | 99.99th=[ 1516] bw (KB /s): min= 124, max= 6144, per=29.37%, avg=3055.29, stdev=1223.87 write: io=525056KB, bw=2931.8KB/s, iops=45, runt=179092msec slat (usec): min=35, max=140660, avg=114.41, stdev=1768.12 clat (usec): min=345, max=1441.6K, avg=11547.93, stdev=62451.29 lat (usec): min=415, max=1441.7K, avg=11663.01, stdev=62476.14 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 596], 40.00th=[ 756], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1544], 90.00th=[ 8896], 95.00th=[37632], | 99.00th=[232448], 99.50th=[350208], 99.90th=[995328], 99.95th=[1044480], | 99.99th=[1433600] bw (KB /s): min= 80, max= 9325, per=29.37%, avg=3063.24, stdev=1532.25 lat (usec) : 250=0.01%, 500=13.56%, 750=6.50%, 1000=3.08% lat (msec) : 2=19.73%, 4=1.62%, 10=6.32%, 20=10.52%, 50=14.89% lat (msec) : 100=10.77%, 250=9.75%, 500=2.72%, 750=0.27%, 1000=0.13% lat (msec) : 2000=0.14% cpu : usr=0.14%, sys=0.59%, 
ctx=16985, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2198: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2629.9KB/s, iops=41, runt=199069msec slat (usec): min=25, max=99063, avg=89.77, stdev=1365.24 clat (usec): min=112, max=1392.1K, avg=83373.43, stdev=118987.34 lat (usec): min=369, max=1392.1K, avg=83463.84, stdev=118977.09 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 28], 50.00th=[ 40], 60.00th=[ 57], | 70.00th=[ 81], 80.00th=[ 122], 90.00th=[ 206], 95.00th=[ 310], | 99.00th=[ 603], 99.50th=[ 734], 99.90th=[ 979], 99.95th=[ 1156], | 99.99th=[ 1401] bw (KB /s): min= 64, max= 9708, per=26.35%, avg=2740.70, stdev=1540.11 write: io=525056KB, bw=2637.6KB/s, iops=41, runt=199069msec slat (usec): min=38, max=140657, avg=121.47, stdev=1860.80 clat (usec): min=349, max=1002.9K, avg=13698.39, stdev=66153.66 lat (usec): min=405, max=1002.9K, avg=13820.49, stdev=66192.16 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 652], 40.00th=[ 876], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1448], 80.00th=[ 2992], 90.00th=[15552], 95.00th=[36096], | 99.00th=[321536], 99.50th=[489472], 99.90th=[962560], 99.95th=[995328], | 99.99th=[1003520] bw (KB /s): min= 71, max= 9836, per=26.41%, avg=2755.14, stdev=1757.17 lat (usec) : 250=0.02%, 500=12.84%, 750=5.83%, 1000=3.12% lat (msec) : 2=17.58%, 4=1.73%, 10=7.50%, 20=12.41%, 50=14.97% lat (msec) : 100=9.86%, 250=9.86%, 500=3.19%, 750=0.78%, 1000=0.25% lat (msec) : 2000=0.06% cpu : usr=0.12%, sys=0.53%, ctx=16540, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10401KB/s, minb=2600KB/s, maxb=2947KB/s, mint=177615msec, maxt=201329msec WRITE: io=2051.0MB, aggrb=10431KB/s, minb=2607KB/s, maxb=2956KB/s, mint=177615msec, maxt=201329msec Disk stats (read/write): dm-0: ios=32841/33299, merge=0/0, ticks=2623746/506809, in_queue=3130698, util=100.00%, aggrios=32855/33392, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32855/33392, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16426/33225, aggrmerge=1/168, aggrticks=1311820/306619, aggrin_queue=1618332, aggrutil=98.91% sda: ios=8494/33223, merge=0/171, ticks=464540/232964, in_queue=697442, util=96.18% sdb: ios=24359/33228, merge=2/166, ticks=2159100/380274, in_queue=2539222, util=98.91% [-- Attachment #3: virt-ham0-lvmcache.txt --] [-- Type: text/plain, Size: 9798 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... 
virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: (groupid=0, jobs=1): err= 0: pid=1643: Thu May 29 14:44:39 2014 read : io=523520KB, bw=2721.8KB/s, iops=42, runt=192348msec slat (usec): min=36, max=11159, avg=79.21, stdev=140.35 clat (usec): min=305, max=1346.1K, avg=82931.26, stdev=118358.55 lat (usec): min=383, max=1347.8K, avg=83011.13, stdev=118357.33 clat percentiles (usec): | 1.00th=[ 556], 5.00th=[ 6368], 10.00th=[ 8896], 20.00th=[13632], | 30.00th=[18816], 40.00th=[26496], 50.00th=[38144], 60.00th=[55552], | 70.00th=[81408], 80.00th=[125440], 90.00th=[211968], 95.00th=[313344], | 99.00th=[561152], 99.50th=[708608], 99.90th=[1056768], 99.95th=[1138688], | 99.99th=[1351680] bw (KB /s): min= 64, max=13714, per=25.99%, avg=2828.82, stdev=1614.75 write: io=525056KB, bw=2729.8KB/s, iops=42, runt=192348msec slat (usec): min=40, max=120228, avg=113.58, stdev=1327.58 clat (usec): min=345, max=1401.6K, avg=10882.68, stdev=66932.84 lat (usec): min=428, max=1401.7K, avg=10996.89, stdev=66947.30 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 386], 20.00th=[ 442], | 30.00th=[ 612], 40.00th=[ 812], 50.00th=[ 1240], 60.00th=[ 1304], | 70.00th=[ 1448], 80.00th=[ 2960], 90.00th=[12736], 95.00th=[23168], | 99.00th=[259072], 99.50th=[403456], 99.90th=[995328], 99.95th=[1286144], | 99.99th=[1400832] bw (KB /s): min= 105, max=13079, per=26.20%, avg=2860.76, stdev=1798.77 lat (usec) : 500=13.62%, 750=6.18%, 1000=2.75% lat (msec) : 2=17.00%, 4=1.83%, 10=8.55%, 20=12.82%, 50=14.40% lat (msec) : 100=9.28%, 250=9.26%, 500=3.45%, 750=0.50%, 1000=0.23% lat (msec) : 2000=0.13% cpu : usr=0.13%, sys=0.69%, ctx=16575, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=1644: Thu May 29 14:44:39 2014 read : io=523520KB, bw=3056.4KB/s, iops=47, runt=171287msec slat (usec): min=34, max=166, avg=77.72, stdev=12.41 clat (usec): min=314, max=1179.1K, avg=76362.24, stdev=95270.37 lat (usec): min=386, max=1180.8K, avg=76440.60, stdev=95269.97 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 16], | 30.00th=[ 23], 40.00th=[ 31], 50.00th=[ 42], 60.00th=[ 58], | 70.00th=[ 80], 80.00th=[ 118], 90.00th=[ 186], 95.00th=[ 262], | 99.00th=[ 457], 99.50th=[ 570], 99.90th=[ 791], 99.95th=[ 906], | 99.99th=[ 1188] bw (KB /s): min= 237, max= 6259, per=28.43%, avg=3094.94, stdev=1094.43 write: io=525056KB, bw=3065.4KB/s, iops=47, runt=171287msec slat (usec): min=47, max=120139, avg=115.02, stdev=1329.52 clat (usec): min=343, max=958790, avg=7162.31, stdev=39237.72 lat (usec): min=422, max=958895, avg=7277.98, stdev=39265.48 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 370], 10.00th=[ 386], 20.00th=[ 442], | 30.00th=[ 588], 40.00th=[ 740], 50.00th=[ 1192], 60.00th=[ 1288], | 70.00th=[ 1320], 80.00th=[ 1496], 90.00th=[ 3216], 95.00th=[15552], | 99.00th=[183296], 99.50th=[301056], 99.90th=[514048], 99.95th=[610304], | 99.99th=[962560] bw (KB /s): min= 100, max= 7418, per=28.37%, avg=3097.42, stdev=1395.25 lat (usec) : 500=13.75%, 750=6.58%, 1000=3.16% lat 
(msec) : 2=20.43%, 4=1.64%, 10=6.06%, 20=9.94%, 50=14.88% lat (msec) : 100=10.52%, 250=9.92%, 500=2.67%, 750=0.35%, 1000=0.06% lat (msec) : 2000=0.01% cpu : usr=0.14%, sys=0.79%, ctx=16933, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=1645: Thu May 29 14:44:39 2014 read : io=523520KB, bw=3097.5KB/s, iops=48, runt=169019msec slat (usec): min=36, max=141, avg=77.24, stdev=12.48 clat (usec): min=311, max=1577.6K, avg=74149.52, stdev=97313.65 lat (usec): min=376, max=1577.6K, avg=74227.41, stdev=97313.32 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 22], 40.00th=[ 30], 50.00th=[ 42], 60.00th=[ 56], | 70.00th=[ 77], 80.00th=[ 112], 90.00th=[ 176], 95.00th=[ 251], | 99.00th=[ 453], 99.50th=[ 578], 99.90th=[ 947], 99.95th=[ 1254], | 99.99th=[ 1582] bw (KB /s): min= 62, max= 7492, per=29.37%, avg=3197.36, stdev=1186.93 write: io=525056KB, bw=3106.6KB/s, iops=48, runt=169019msec slat (usec): min=47, max=120168, avg=112.54, stdev=1325.71 clat (usec): min=335, max=1474.2K, avg=8254.87, stdev=57083.94 lat (usec): min=416, max=1474.3K, avg=8368.05, stdev=57098.21 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 386], 20.00th=[ 442], | 30.00th=[ 564], 40.00th=[ 724], 50.00th=[ 1176], 60.00th=[ 1272], | 70.00th=[ 1320], 80.00th=[ 1464], 90.00th=[ 2224], 95.00th=[13504], | 99.00th=[185344], 99.50th=[321536], 99.90th=[1019904], 99.95th=[1073152], | 99.99th=[1466368] bw (KB /s): min= 109, max= 8172, per=29.43%, avg=3213.21, stdev=1535.62 lat (usec) : 500=14.15%, 750=6.58%, 1000=3.03% lat (msec) : 2=20.82%, 4=1.48%, 10=6.24%, 20=9.75%, 50=14.67% lat (msec) : 100=10.94%, 250=9.46%, 500=2.37%, 750=0.34%, 1000=0.07% lat (msec) : 2000=0.10% cpu : usr=0.13%, sys=0.81%, ctx=16936, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=1646: Thu May 29 14:44:39 2014 read : io=523520KB, bw=2773.6KB/s, iops=43, runt=188788msec slat (usec): min=23, max=2618, avg=77.53, stdev=31.29 clat (usec): min=309, max=1755.5K, avg=82247.61, stdev=119257.63 lat (usec): min=356, max=1755.6K, avg=82325.78, stdev=119257.35 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 14], | 30.00th=[ 20], 40.00th=[ 27], 50.00th=[ 38], 60.00th=[ 55], | 70.00th=[ 81], 80.00th=[ 122], 90.00th=[ 212], 95.00th=[ 306], | 99.00th=[ 578], 99.50th=[ 750], 99.90th=[ 988], 99.95th=[ 1106], | 99.99th=[ 1762] bw (KB /s): min= 122, max= 9325, per=26.30%, avg=2863.03, stdev=1442.98 write: io=525056KB, bw=2781.2KB/s, iops=43, runt=188788msec slat (usec): min=44, max=120232, avg=112.99, stdev=1326.51 clat (usec): min=346, max=1033.4K, avg=9830.80, stdev=52064.46 lat (usec): min=421, max=1033.5K, avg=9944.42, stdev=52084.32 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 370], 10.00th=[ 386], 20.00th=[ 446], | 30.00th=[ 588], 40.00th=[ 788], 50.00th=[ 1240], 60.00th=[ 1304], | 70.00th=[ 1416], 80.00th=[ 1976], 90.00th=[12736], 95.00th=[23424], | 99.00th=[244736], 99.50th=[333824], 99.90th=[937984], 99.95th=[954368], | 
99.99th=[1036288] bw (KB /s): min= 100, max= 8694, per=26.24%, avg=2865.37, stdev=1678.69 lat (usec) : 500=13.65%, 750=6.05%, 1000=2.78% lat (msec) : 2=18.05%, 4=1.64%, 10=7.68%, 20=12.79%, 50=14.66% lat (msec) : 100=9.39%, 250=9.17%, 500=3.31%, 750=0.51%, 1000=0.24% lat (msec) : 2000=0.07% cpu : usr=0.13%, sys=0.70%, ctx=16629, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10886KB/s, minb=2721KB/s, maxb=3097KB/s, mint=169019msec, maxt=192348msec WRITE: io=2051.0MB, aggrb=10918KB/s, minb=2729KB/s, maxb=3106KB/s, mint=169019msec, maxt=192348msec Disk stats (read/write): dm-3: ios=32687/32957, merge=0/0, ticks=2580936/388295, in_queue=2969480, util=100.00%, aggrios=10910/11025, aggrmerge=0/0, aggrticks=860321/160031, aggrin_queue=1020354, aggrutil=100.00% dm-0: ios=5/44, merge=0/0, ticks=10/85, in_queue=95, util=0.05%, aggrios=11/50, aggrmerge=0/2, aggrticks=11/94, aggrin_queue=105, aggrutil=0.05% sdc: ios=11/50, merge=0/2, ticks=11/94, in_queue=105, util=0.05% dm-1: ios=6/8, merge=0/0, ticks=1/9, in_queue=10, util=0.01% dm-2: ios=32721/33023, merge=0/0, ticks=2580952/479999, in_queue=3060959, util=100.00%, aggrios=32721/33023, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32721/33023, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16360/32937, aggrmerge=0/87, aggrticks=1290283/255548, aggrin_queue=1545711, aggrutil=99.71% sda: ios=8101/32937, merge=1/88, ticks=402681/125603, in_queue=528232, util=96.21% sdb: ios=24619/32938, merge=0/87, ticks=2177886/385493, in_queue=2563190, util=99.71% ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 13:52 ` Richard W.M. Jones @ 2014-05-29 20:34 ` Mike Snitzer 2014-05-29 20:47 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-29 20:34 UTC (permalink / raw) To: Richard W.M. Jones Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29 2014 at 9:52am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > I've done some more testing, comparing RAID 1 HDD with RAID 1 HDD + an > SSD overlay (using lvm-cache). > > I'm now using 'fio', with the following job file: > > [virt] > ioengine=libaio > iodepth=4 > rw=randrw > bs=64k > direct=1 > size=1g > numjobs=4 randrw isn't giving you increased hits to the same blocks. fio does have random_distribution controls (zipf and pareto) that are more favorable for testing cache replacement policies (jens said that testing caching algorithms is what motivated him to develop these in fio). > I'm still seeing almost no benefit from LVM cache. It's about 4% > faster than the underlying, slow HDDs. See attached runs. > > The SSD LV is 200 GB and the underlying LV is 800 GB, so I would > expect there is plenty of space to cache things in the SSD during the > test. > > For comparison, the fio tests runs about 11 times faster on the SSD. > > Any ideas? Try using : dmsetup message <cache device> 0 write_promote_adjustment 0 Also, if you discard the entire cache device (e.g. using blkdiscard) before use you could get a big win, especially if you use: dmsetup message <cache device> 0 discard_promote_adjustment 0 Documentation/device-mapper/cache-policies.txt says: Internally the mq policy maintains a promotion threshold variable. If the hit count of a block not in the cache goes above this threshold it gets promoted to the cache. The read, write and discard promote adjustment tunables allow you to tweak the promotion threshold by adding a small value based on the io type. They default to 4, 8 and 1 respectively. If you're trying to quickly warm a new cache device you may wish to reduce these to encourage promotion. Remember to switch them back to their defaults after the cache fills though. ^ permalink raw reply [flat|nested] 37+ messages in thread
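For example, the job file used earlier in this thread could be given a skewed access pattern so that some blocks are re-hit often enough to be promoted (a sketch; the zipf theta of 1.2 is an arbitrary choice and requires a fio build with random_distribution support):

  [virt]
  ioengine=libaio
  iodepth=4
  rw=randrw
  bs=64k
  direct=1
  size=1g
  numjobs=4
  # Skew the random offsets: a small set of "hot" blocks receives most of the IO.
  random_distribution=zipf:1.2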
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 20:34 ` Mike Snitzer @ 2014-05-29 20:47 ` Richard W.M. Jones 2014-05-29 21:06 ` Mike Snitzer 2014-05-30 11:38 ` Alasdair G Kergon 0 siblings, 2 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-29 20:47 UTC (permalink / raw) To: Mike Snitzer Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29, 2014 at 04:34:10PM -0400, Mike Snitzer wrote: > Try using : > dmsetup message <cache device> 0 write_promote_adjustment 0 > > Documentation/device-mapper/cache-policies.txt says: > > Internally the mq policy maintains a promotion threshold variable. If > the hit count of a block not in the cache goes above this threshold it > gets promoted to the cache. The read, write and discard promote adjustment > tunables allow you to tweak the promotion threshold by adding a small > value based on the io type. They default to 4, 8 and 1 respectively. > If you're trying to quickly warm a new cache device you may wish to > reduce these to encourage promotion. Remember to switch them back to > their defaults after the cache fills though. What would be bad about leaving write_promote_adjustment set at 0 or 1? Wouldn't that mean that I get a simple LRU policy? (That's probably what I want.) > Also, if you discard the entire cache device (e.g. using blkdiscard) > before use you could get a big win, especially if you use: > dmsetup message <cache device> 0 discard_promote_adjustment 0 To be clear, that means I should do: lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast lvcreate -L 229G -n lv_cache vg_guests /dev/fast lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache blkdiscard /dev/vg_guests/lv_cache lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv Or should I do the blkdiscard earlier? [On the separate subject of volume groups ...] Is there a reason why fast and slow devices need to be in the same VG? I've talked to two other people who found this very confusing. No one knew that you could manually place LVs into different PVs, and it's something of a pain to have to remember to place LVs every time you create or resize one. It seems it would be a lot simpler if you could have the slow PVs in one VG and the fast PVs in another VG. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 20:47 ` Richard W.M. Jones @ 2014-05-29 21:06 ` Mike Snitzer 2014-05-29 21:19 ` Richard W.M. Jones 2014-05-30 11:38 ` Alasdair G Kergon 1 sibling, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-29 21:06 UTC (permalink / raw) To: Richard W.M. Jones Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29 2014 at 4:47pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Thu, May 29, 2014 at 04:34:10PM -0400, Mike Snitzer wrote: > > Try using : > > dmsetup message <cache device> 0 write_promote_adjustment 0 > > > > Documentation/device-mapper/cache-policies.txt says: > > > > Internally the mq policy maintains a promotion threshold variable. If > > the hit count of a block not in the cache goes above this threshold it > > gets promoted to the cache. The read, write and discard promote adjustment > > tunables allow you to tweak the promotion threshold by adding a small > > value based on the io type. They default to 4, 8 and 1 respectively. > > If you're trying to quickly warm a new cache device you may wish to > > reduce these to encourage promotion. Remember to switch them back to > > their defaults after the cache fills though. > > What would be bad about leaving write_promote_adjustment set at 0 or 1? > > Wouldn't that mean that I get a simple LRU policy? (That's probably > what I want.) Leaving them at 0 could result in cache thrashing. But given how large your SSD is in relation to the origin you'd likely be OK for a while (at least until your cache gets quite full). > > Also, if you discard the entire cache device (e.g. using blkdiscard) > > before use you could get a big win, especially if you use: > > dmsetup message <cache device> 0 discard_promote_adjustment 0 > > To be clear, that means I should do: > > lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast > lvcreate -L 229G -n lv_cache vg_guests /dev/fast > lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache > blkdiscard /dev/vg_guests/lv_cache > lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv > > Or should I do the blkdiscard earlier? You want to discard the cached device before you run fio against it. I'm not completely sure what cache-pool vs cache is. But it looks like you'd want to run the discard against the /dev/vg_guests/testoriginlv (assuming it was converted to use the 'cache' DM target, 'dmsetup table vg_guests-testoriginlv' should confirm as much). > [On the separate subject of volume groups ...] > > Is there a reason why fast and slow devices need to be in the same VG? > > I've talked to two other people who found this very confusing. No one > knew that you could manually place LVs into different PVs, and it's > something of a pain to have to remember to place LVs every time you > create or resize one. It seems it would be a lot simpler if you could > have the slow PVs in one VG and the fast PVs in another VG. I cannot answer the lvm details. Best to ask Jon Brassow or Zdenek (hopefully they'll respond) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 21:06 ` Mike Snitzer @ 2014-05-29 21:19 ` Richard W.M. Jones 2014-05-29 21:58 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-29 21:19 UTC (permalink / raw) To: Mike Snitzer Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29, 2014 at 05:06:48PM -0400, Mike Snitzer wrote: > On Thu, May 29 2014 at 4:47pm -0400, > Richard W.M. Jones <rjones@redhat.com> wrote: > > To be clear, that means I should do: > > > > lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast > > lvcreate -L 229G -n lv_cache vg_guests /dev/fast > > lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache > > blkdiscard /dev/vg_guests/lv_cache > > lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv > > > > Or should I do the blkdiscard earlier? > > You want to discard the cached device before you run fio against it. > I'm not completely sure what cache-pool vs cache is. But it looks like > you'd want to run the discard against the /dev/vg_guests/testoriginlv > (assuming it was converted to use the 'cache' DM target, 'dmsetup table > vg_guests-testoriginlv' should confirm as much). I'm concerned that would delete all the data on the origin LV ... My origin LV now has a slightly different name. Here are the device-mapper tables: $ sudo dmsetup table vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 vg_guests-home: 0 209715200 linear 9:127 2048 vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 So it does look as if my origin LV (vg_guests/libvirt-images) does use the 'cache' target. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 37+ messages in thread
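For reference, the fields of that cache table line, in the order documented in the kernel's Documentation/device-mapper/cache.txt (the name-to-minor mapping is not shown by 'dmsetup table', so the device roles below are identified by position only):

  253:1      metadata device
  253:0      cache (SSD) data device
  253:2      origin (slow) device
  128        cache block size in 512-byte sectors, i.e. 64 KiB
  0          number of optional feature arguments (none, so the default writeback mode)
  default 0  cache policy ("default", which selects mq here) with no policy arguments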
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 21:19 ` Richard W.M. Jones @ 2014-05-29 21:58 ` Mike Snitzer 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 11:53 ` Mike Snitzer 0 siblings, 2 replies; 37+ messages in thread From: Mike Snitzer @ 2014-05-29 21:58 UTC (permalink / raw) To: Richard W.M. Jones Cc: LVM general discussion and development, thornber, Zdenek Kabelac On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Thu, May 29, 2014 at 05:06:48PM -0400, Mike Snitzer wrote: > > On Thu, May 29 2014 at 4:47pm -0400, > > Richard W.M. Jones <rjones@redhat.com> wrote: > > > To be clear, that means I should do: > > > > > > lvcreate -L 1G -n lv_cache_meta vg_guests /dev/fast > > > lvcreate -L 229G -n lv_cache vg_guests /dev/fast > > > lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache > > > blkdiscard /dev/vg_guests/lv_cache > > > lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv > > > > > > Or should I do the blkdiscard earlier? > > > > You want to discard the cached device before you run fio against it. > > I'm not completely sure what cache-pool vs cache is. But it looks like > > you'd want to run the discard against the /dev/vg_guests/testoriginlv > > (assuming it was converted to use the 'cache' DM target, 'dmsetup table > > vg_guests-testoriginlv' should confirm as much). > > I'm concerned that would delete all the data on the origin LV ... OK, but how are you testing with fio at this point? Doesn't that destroy data too? The cache target doesn't have passdown support. So none of your data would be discarded directly, but it could eat data as a side-effect of the cache bypassing promotion from the origin (because it thinks the origin's blocks were discarded). But on writeback you'd lose data. So you raise a valid point: if you're adding a cache in front of a volume with existing data you'll want to avoid discarding the logical address space that contains data you want to keep. Do you have a filesystem on the libvirt-images volume? If so, would be enough to run fstrim against /dev/vg_guests/libvirt-images BTW, this is all with a eye toward realizing the optimization that dm-cache provides for origin blocks that were discarded (like I said before dm-cache doesn't promote from the origin if the corresponding block was marked for discard). So you don't _need_ to do any of this.. purely about trying to optimize a bit more. > My origin LV now has a slightly different name. Here are the > device-mapper tables: > > $ sudo dmsetup table > vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 > vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 > vg_guests-home: 0 209715200 linear 9:127 2048 > vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 > vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 > > So it does look as if my origin LV (vg_guests/libvirt-images) does use > the 'cache' target. Yeap. ^ permalink raw reply [flat|nested] 37+ messages in thread
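A sketch of that fstrim suggestion, assuming the filesystem on libvirt-images is mounted (the mount point below is only an example); fstrim discards only space the filesystem considers free, so existing files are untouched:

  # Make discarded (free) regions cheap to promote, as suggested earlier.
  dmsetup message vg_guests-libvirt--images 0 discard_promote_adjustment 0

  # Discard the unallocated space of the mounted ext4 filesystem.
  fstrim -v /var/lib/libvirt/images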
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-29 21:58 ` Mike Snitzer @ 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 10:30 ` Richard W.M. Jones 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 11:53 ` Mike Snitzer 1 sibling, 2 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 9:04 UTC (permalink / raw) To: LVM general discussion and development; +Cc: thornber, Zdenek Kabelac On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: > On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > I'm concerned that would delete all the data on the origin LV ... > > OK, but how are you testing with fio at this point? Doesn't that > destroy data too? I'm testing with files. This matches my final configuration which is to use qcow2 files on an ext4 filesystem to store the VM disk images. I set read_promote_adjustment == write_promote_adjustment == 1 and ran fio 6 times, reusing the same test files. It is faster than HDD (slower layer), but still much slower than the SSD (fast layer). Across the fio runs it's about 5 times slower than the SSD, and the times don't improve at all over the runs. (It is more than twice as fast as the HDD though). Somehow something is not working as I expected. Back to an earlier point. I wrote and you replied: > > What would be bad about leaving write_promote_adjustment set at 0 or 1? > > Wouldn't that mean that I get a simple LRU policy? (That's probably > > what I want.) > > Leaving them at 0 could result in cache thrashing. But given how > large your SSD is in relation to the origin you'd likely be OK for a > while (at least until your cache gets quite full). My SSD is ~200 GB and the backing origin LV is ~800 GB. It is unlikely the working set will ever grow > 200 GB, not least because I cannot run that many VMs at the same time on the cluster. So should I be concerned about cache thrashing? Specifically: If the cache layer gets full, then it will send the least recently used blocks back to the slow layer, right? (It seems obvious, but I'd like to check that) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 9:04 ` Richard W.M. Jones @ 2014-05-30 10:30 ` Richard W.M. Jones 0 siblings, 0 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 10:30 UTC (permalink / raw) To: LVM general discussion and development; +Cc: thornber, Zdenek Kabelac On Fri, May 30, 2014 at 10:04:22AM +0100, Richard W.M. Jones wrote: > On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: > > On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > > I'm concerned that would delete all the data on the origin LV ... > > > > OK, but how are you testing with fio at this point? Doesn't that > > destroy data too? > > I'm testing with files. This matches my final configuration which is > to use qcow2 files on an ext4 filesystem to store the VM disk images. > > I set read_promote_adjustment == write_promote_adjustment == 1 and ran > fio 6 times, reusing the same test files. > > It is faster than HDD (slower layer), but still much slower than the > SSD (fast layer). Across the fio runs it's about 5 times slower than > the SSD, and the times don't improve at all over the runs. (It is > more than twice as fast as the HDD though). > > Somehow something is not working as I expected. Additionally, I ran this command 5 times: md5sum virt.* # the test files and then reran the fio test. Since I have read_promote_adjustment == 1, I would expect that these files should be promoted to the fast layer by reading them several times. However the results are still the same. It's about twice as fast as the HDDs, but 5 times slower than with the SSD. Are there additional diagnostic commands I can use? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 37+ messages in thread
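One readily available diagnostic is the status line of the cached device, whose counters show directly whether anything is being promoted between runs (a sketch; the field layout below follows Documentation/device-mapper/cache.txt for this kernel generation):

  dmsetup status vg_guests-libvirt--images
  # The cache status line reports, roughly in this order:
  #   <metadata block size> <used>/<total metadata blocks>
  #   <cache block size> <used>/<total cache blocks>
  #   <read hits> <read misses> <write hits> <write misses>
  #   <demotions> <promotions> <dirty> <features...> <policy> <policy args...>
  # Promotion and hit counters that rise between fio/md5sum runs mean blocks
  # really are moving to the SSD; static counters mean the IO never qualified.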
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 10:30 ` Richard W.M. Jones @ 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones ` (2 more replies) 1 sibling, 3 replies; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 13:38 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 5:04am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: > > On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > > > I'm concerned that would delete all the data on the origin LV ... > > > > OK, but how are you testing with fio at this point? Doesn't that > > destroy data too? > > I'm testing with files. This matches my final configuration which is > to use qcow2 files on an ext4 filesystem to store the VM disk images. > > I set read_promote_adjustment == write_promote_adjustment == 1 and ran > fio 6 times, reusing the same test files. > > It is faster than HDD (slower layer), but still much slower than the > SSD (fast layer). Across the fio runs it's about 5 times slower than > the SSD, and the times don't improve at all over the runs. (It is > more than twice as fast as the HDD though). > > Somehow something is not working as I expected. Why are you setting {read,write}_promote_adjustment to 1? I asked you to set write_promote_adjustment to 0. Your random fio job won't hit the same blocks, and md5sum likely uses buffered IO so unless you set 0 for both the cache won't aggressively cache like you're expecting. I explained earlier in this thread that the dm-cache is currently a "hotspot cache". Not a pure writeback cache like you're hoping. We're working to make it fit your expectations (you aren't alone in expecting more performance!) > Back to an earlier point. I wrote and you replied: > > > > What would be bad about leaving write_promote_adjustment set at 0 or 1? > > > Wouldn't that mean that I get a simple LRU policy? (That's probably > > > what I want.) > > > > Leaving them at 0 could result in cache thrashing. But given how > > large your SSD is in relation to the origin you'd likely be OK for a > > while (at least until your cache gets quite full). > > My SSD is ~200 GB and the backing origin LV is ~800 GB. It is > unlikely the working set will ever grow > 200 GB, not least because I > cannot run that many VMs at the same time on the cluster. > > So should I be concerned about cache thrashing? Specifically: If the > cache layer gets full, then it will send the least recently used > blocks back to the slow layer, right? (It seems obvious, but I'd like > to check that) Right, you should be fine. But I'll defer to Heinz on more particulars about the cache replacement strategy that is provided in this case for the "mq" (aka multi-queue policy). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:38 ` Mike Snitzer @ 2014-05-30 13:40 ` Richard W.M. Jones 2014-05-30 13:42 ` Heinz Mauelshagen 2014-05-30 13:46 ` Richard W.M. Jones 2 siblings, 0 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 13:40 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30, 2014 at 09:38:14AM -0400, Mike Snitzer wrote: > Why are you setting {read,write}_promote_adjustment to 1? I asked you > to set write_promote_adjustment to 0. I didn't realize there would be (much) difference. However I will certainly try it with write_promote_adjustment == 0. > Your random fio job won't hit the same blocks, and md5sum likely uses > buffered IO so unless you set 0 for both the cache won't aggressively > cache like you're expecting. Right, that was definitely a mistake! I will drop_caches between each md5sum operation. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 37+ messages in thread
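For the record, a minimal way to do that between runs (a standard procfs knob, nothing LVM-specific), so each md5sum pass actually reaches the block layer and the dm-cache promotion logic:

  sync
  echo 3 > /proc/sys/vm/drop_caches    # drop clean page, dentry and inode caches
  md5sum virt.*                        # re-read the test files from disk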
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones @ 2014-05-30 13:42 ` Heinz Mauelshagen 2014-05-30 13:54 ` Richard W.M. Jones 2014-05-30 13:46 ` Richard W.M. Jones 2 siblings, 1 reply; 37+ messages in thread From: Heinz Mauelshagen @ 2014-05-30 13:42 UTC (permalink / raw) To: Mike Snitzer, Richard W.M. Jones Cc: Zdenek Kabelac, thornber, LVM general discussion and development On 05/30/2014 03:38 PM, Mike Snitzer wrote: > On Fri, May 30 2014 at 5:04am -0400, > Richard W.M. Jones <rjones@redhat.com> wrote: > >> On Thu, May 29, 2014 at 05:58:15PM -0400, Mike Snitzer wrote: >>> On Thu, May 29 2014 at 5:19pm -0400, Richard W.M. Jones <rjones@redhat.com> wrote: >>>> I'm concerned that would delete all the data on the origin LV ... >>> OK, but how are you testing with fio at this point? Doesn't that >>> destroy data too? >> I'm testing with files. This matches my final configuration which is >> to use qcow2 files on an ext4 filesystem to store the VM disk images. >> >> I set read_promote_adjustment == write_promote_adjustment == 1 and ran >> fio 6 times, reusing the same test files. >> >> It is faster than HDD (slower layer), but still much slower than the >> SSD (fast layer). Across the fio runs it's about 5 times slower than >> the SSD, and the times don't improve at all over the runs. (It is >> more than twice as fast as the HDD though). >> >> Somehow something is not working as I expected. > Why are you setting {read,write}_promote_adjustment to 1? I asked you > to set write_promote_adjustment to 0. > > Your random fio job won't hit the same blocks, and md5sum likely uses > buffered IO so unless you set 0 for both the cache won't aggressively > cache like you're expecting. > > I explained earlier in this thread that the dm-cache is currently a > "hotspot cache". Not a pure writeback cache like you're hoping. We're > working to make it fit your expectations (you aren't alone in expecting > more performance!) > >> Back to an earlier point. I wrote and you replied: >> >>>> What would be bad about leaving write_promote_adjustment set at 0 or 1? >>>> Wouldn't that mean that I get a simple LRU policy? (That's probably >>>> what I want.) >>> Leaving them at 0 could result in cache thrashing. But given how >>> large your SSD is in relation to the origin you'd likely be OK for a >>> while (at least until your cache gets quite full). >> My SSD is ~200 GB and the backing origin LV is ~800 GB. It is >> unlikely the working set will ever grow > 200 GB, not least because I >> cannot run that many VMs at the same time on the cluster. >> >> So should I be concerned about cache thrashing? Specifically: If the >> cache layer gets full, then it will send the least recently used >> blocks back to the slow layer, right? (It seems obvious, but I'd like >> to check that) > Right, you should be fine. But I'll defer to Heinz on more particulars > about the cache replacement strategy that is provided in this case for > the "mq" (aka multi-queue policy). If you ask for immediate promotion, you get immediate promotion if the cache gets overcommited. Of course you can tweak the promotion adjustments after warming the cache in order to reduce any thrashing Heinz ^ permalink raw reply [flat|nested] 37+ messages in thread
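A sketch of restoring the documented defaults once the cache is warm, using the values 4, 8 and 1 quoted from cache-policies.txt earlier in this thread (device name as in the dmsetup table above):

  dmsetup message vg_guests-libvirt--images 0 read_promote_adjustment 4
  dmsetup message vg_guests-libvirt--images 0 write_promote_adjustment 8
  dmsetup message vg_guests-libvirt--images 0 discard_promote_adjustment 1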
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:42 ` Heinz Mauelshagen @ 2014-05-30 13:54 ` Richard W.M. Jones 2014-05-30 13:58 ` Zdenek Kabelac 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 13:54 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Zdenek Kabelac, thornber, Mike Snitzer, LVM general discussion and development [-- Attachment #1: Type: text/plain, Size: 843 bytes --] I'm attaching 3 tests that I have run so (hopefully) you can see what I'm observing, or point out if I'm making a mistake. - virt-ham0-raid1.txt Test with an ext4 filesystem located in an LV on the RAID 1 (md) array of 2 x WD NAS hard disks. - virt-ham0-ssd.txt Test with an ext4 filesystem located in an LV on the Samsung EVO SSD. - virt-ham0-lvmcache.txt Test with LVM-cache. For all tests, the same virt.job file is used: [virt] ioengine=libaio iodepth=4 rw=randrw bs=64k direct=1 size=1g numjobs=4 All tests are run on the same hardware. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org [-- Attachment #2: virt-ham0-raid1.txt --] [-- Type: text/plain, Size: 9385 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: Laying out IO file(s) (1 file(s) / 1024MB) virt: (groupid=0, jobs=1): err= 0: pid=2195: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2600.4KB/s, iops=40, runt=201329msec slat (usec): min=23, max=24586, avg=65.89, stdev=306.38 clat (usec): min=305, max=1765.7K, avg=84912.67, stdev=124153.30 lat (usec): min=367, max=1765.8K, avg=84979.16, stdev=124150.29 clat percentiles (usec): | 1.00th=[ 780], 5.00th=[ 6944], 10.00th=[ 9536], 20.00th=[14144], | 30.00th=[19840], 40.00th=[28032], 50.00th=[40704], 60.00th=[57600], | 70.00th=[82432], 80.00th=[125440], 90.00th=[209920], 95.00th=[309248], | 99.00th=[593920], 99.50th=[790528], 99.90th=[1204224], 99.95th=[1286144], | 99.99th=[1761280] bw (KB /s): min= 82, max=12416, per=25.85%, avg=2688.32, stdev=1545.40 write: io=525056KB, bw=2607.1KB/s, iops=40, runt=201329msec slat (usec): min=31, max=140675, avg=132.77, stdev=1945.34 clat (usec): min=346, max=1355.5K, avg=13280.27, stdev=57149.27 lat (usec): min=404, max=1355.6K, avg=13413.69, stdev=57202.63 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 374], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 644], 40.00th=[ 852], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1496], 80.00th=[ 5728], 90.00th=[18048], 95.00th=[63232], | 99.00th=[257024], 99.50th=[382976], 99.90th=[831488], 99.95th=[946176], | 99.99th=[1351680] bw (KB /s): min= 121, max=10709, per=25.96%, avg=2708.14, stdev=1769.64 lat (usec) : 500=12.91%, 750=6.04%, 1000=3.25% lat (msec) : 2=16.32%, 4=1.65%, 10=7.59%, 20=12.91%, 50=14.75% lat (msec) : 100=9.90%, 250=10.51%, 500=3.19%, 750=0.66%, 1000=0.20% lat (msec) : 2000=0.13% cpu : usr=0.11%, sys=0.54%, ctx=16504, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2196: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2947.6KB/s, iops=46, runt=177615msec slat (usec): min=24, max=59936, avg=81.73, stdev=987.31 clat (usec): min=149, max=1054.1K, avg=74995.11, stdev=93418.23 lat (usec): min=369, max=1054.2K, avg=75077.47, stdev=93411.06 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 16], | 30.00th=[ 22], 40.00th=[ 31], 50.00th=[ 42], 60.00th=[ 57], | 70.00th=[ 80], 80.00th=[ 116], 90.00th=[ 180], 95.00th=[ 260], | 99.00th=[ 437], 99.50th=[ 529], 99.90th=[ 840], 99.95th=[ 979], | 99.99th=[ 1057] bw (KB /s): min= 113, max= 6898, per=29.26%, avg=3043.36, stdev=1217.82 write: io=525056KB, bw=2956.2KB/s, iops=46, runt=177615msec slat (usec): min=33, max=140655, avg=128.77, stdev=2069.57 clat (usec): min=258, max=1000.6K, avg=11590.37, stdev=57029.08 lat (usec): min=403, max=1000.7K, avg=11719.76, stdev=57077.03 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 612], 40.00th=[ 748], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1528], 90.00th=[ 7776], 95.00th=[55040], | 99.00th=[244736], 99.50th=[362496], 99.90th=[913408], 99.95th=[929792], | 99.99th=[1003520] bw (KB /s): min= 140, max= 7409, per=29.16%, avg=3042.19, stdev=1466.35 lat (usec) : 250=0.01%, 500=13.49%, 750=6.57%, 1000=3.19% lat (msec) : 2=19.84%, 4=1.70%, 10=5.92%, 20=9.95%, 50=14.89% lat (msec) : 100=10.81%, 250=10.45%, 500=2.73%, 750=0.31%, 1000=0.13% lat (msec) : 2000=0.02% cpu : usr=0.14%, sys=0.59%, ctx=16858, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2197: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2923.2KB/s, iops=45, runt=179092msec slat (usec): min=20, max=99838, avg=91.84, stdev=1411.97 clat (usec): min=160, max=1512.4K, avg=75755.06, stdev=105522.71 lat (usec): min=382, max=1512.9K, avg=75847.54, stdev=105514.14 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 29], 50.00th=[ 40], 60.00th=[ 56], | 70.00th=[ 76], 80.00th=[ 112], 90.00th=[ 186], 95.00th=[ 269], | 99.00th=[ 469], 99.50th=[ 586], 99.90th=[ 1156], 99.95th=[ 1287], | 99.99th=[ 1516] bw (KB /s): min= 124, max= 6144, per=29.37%, avg=3055.29, stdev=1223.87 write: io=525056KB, bw=2931.8KB/s, iops=45, runt=179092msec slat (usec): min=35, max=140660, avg=114.41, stdev=1768.12 clat (usec): min=345, max=1441.6K, avg=11547.93, stdev=62451.29 lat (usec): min=415, max=1441.7K, avg=11663.01, stdev=62476.14 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 596], 40.00th=[ 756], 50.00th=[ 1224], 60.00th=[ 1304], | 70.00th=[ 1352], 80.00th=[ 1544], 90.00th=[ 8896], 95.00th=[37632], | 99.00th=[232448], 99.50th=[350208], 99.90th=[995328], 99.95th=[1044480], | 99.99th=[1433600] bw (KB /s): min= 80, max= 9325, per=29.37%, avg=3063.24, stdev=1532.25 lat (usec) : 250=0.01%, 500=13.56%, 750=6.50%, 1000=3.08% lat (msec) : 2=19.73%, 4=1.62%, 10=6.32%, 20=10.52%, 50=14.89% lat (msec) : 100=10.77%, 250=9.75%, 500=2.72%, 750=0.27%, 1000=0.13% lat (msec) : 2000=0.14% cpu : usr=0.14%, sys=0.59%, ctx=16985, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 
8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2198: Wed May 28 22:12:50 2014 read : io=523520KB, bw=2629.9KB/s, iops=41, runt=199069msec slat (usec): min=25, max=99063, avg=89.77, stdev=1365.24 clat (usec): min=112, max=1392.1K, avg=83373.43, stdev=118987.34 lat (usec): min=369, max=1392.1K, avg=83463.84, stdev=118977.09 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 28], 50.00th=[ 40], 60.00th=[ 57], | 70.00th=[ 81], 80.00th=[ 122], 90.00th=[ 206], 95.00th=[ 310], | 99.00th=[ 603], 99.50th=[ 734], 99.90th=[ 979], 99.95th=[ 1156], | 99.99th=[ 1401] bw (KB /s): min= 64, max= 9708, per=26.35%, avg=2740.70, stdev=1540.11 write: io=525056KB, bw=2637.6KB/s, iops=41, runt=199069msec slat (usec): min=38, max=140657, avg=121.47, stdev=1860.80 clat (usec): min=349, max=1002.9K, avg=13698.39, stdev=66153.66 lat (usec): min=405, max=1002.9K, avg=13820.49, stdev=66192.16 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 378], 10.00th=[ 434], 20.00th=[ 446], | 30.00th=[ 652], 40.00th=[ 876], 50.00th=[ 1272], 60.00th=[ 1320], | 70.00th=[ 1448], 80.00th=[ 2992], 90.00th=[15552], 95.00th=[36096], | 99.00th=[321536], 99.50th=[489472], 99.90th=[962560], 99.95th=[995328], | 99.99th=[1003520] bw (KB /s): min= 71, max= 9836, per=26.41%, avg=2755.14, stdev=1757.17 lat (usec) : 250=0.02%, 500=12.84%, 750=5.83%, 1000=3.12% lat (msec) : 2=17.58%, 4=1.73%, 10=7.50%, 20=12.41%, 50=14.97% lat (msec) : 100=9.86%, 250=9.86%, 500=3.19%, 750=0.78%, 1000=0.25% lat (msec) : 2000=0.06% cpu : usr=0.12%, sys=0.53%, ctx=16540, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10401KB/s, minb=2600KB/s, maxb=2947KB/s, mint=177615msec, maxt=201329msec WRITE: io=2051.0MB, aggrb=10431KB/s, minb=2607KB/s, maxb=2956KB/s, mint=177615msec, maxt=201329msec Disk stats (read/write): dm-0: ios=32841/33299, merge=0/0, ticks=2623746/506809, in_queue=3130698, util=100.00%, aggrios=32855/33392, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32855/33392, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16426/33225, aggrmerge=1/168, aggrticks=1311820/306619, aggrin_queue=1618332, aggrutil=98.91% sda: ios=8494/33223, merge=0/171, ticks=464540/232964, in_queue=697442, util=96.18% sdb: ios=24359/33228, merge=2/166, ticks=2159100/380274, in_queue=2539222, util=98.91% [-- Attachment #3: virt-ham0-ssd.txt --] [-- Type: text/plain, Size: 8181 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... 
virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=2177: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28983KB/s, iops=452, runt= 18063msec slat (usec): min=23, max=4451, avg=42.52, stdev=61.50 clat (usec): min=136, max=26872, avg=4360.12, stdev=1103.80 lat (msec): min=2, max=26, avg= 4.40, stdev= 1.10 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3952], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4960], | 99.00th=[ 9792], 99.50th=[10304], 99.90th=[17024], 99.95th=[21888], | 99.99th=[26752] bw (KB /s): min=25600, max=33280, per=25.02%, avg=29007.28, stdev=1840.52 write: io=525056KB, bw=29068KB/s, iops=454, runt= 18063msec slat (usec): min=26, max=5046, avg=48.33, stdev=57.35 clat (msec): min=3, max=29, avg= 4.36, stdev= 1.10 lat (msec): min=3, max=29, avg= 4.41, stdev= 1.11 clat percentiles (usec): | 1.00th=[ 3856], 5.00th=[ 3920], 10.00th=[ 3952], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9920], 99.50th=[10560], 99.90th=[16320], 99.95th=[21376], | 99.99th=[29056] bw (KB /s): min=25447, max=31744, per=25.02%, avg=29091.58, stdev=1503.68 lat (usec) : 250=0.01% lat (msec) : 4=24.84%, 10=74.26%, 20=0.82%, 50=0.08% cpu : usr=1.10%, sys=4.57%, ctx=16802, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2178: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28986KB/s, iops=452, runt= 18061msec slat (usec): min=20, max=4734, avg=44.14, stdev=65.97 clat (usec): min=134, max=34582, avg=4367.22, stdev=1102.36 lat (msec): min=2, max=34, avg= 4.41, stdev= 1.10 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4704], 95.00th=[ 4960], | 99.00th=[ 9920], 99.50th=[10304], 99.90th=[16512], 99.95th=[17024], | 99.99th=[34560] bw (KB /s): min=25804, max=33280, per=25.03%, avg=29016.61, stdev=1835.93 write: io=525056KB, bw=29071KB/s, iops=454, runt= 18061msec slat (usec): min=26, max=2297, avg=49.25, stdev=29.79 clat (msec): min=3, max=28, avg= 4.35, stdev= 1.07 lat (msec): min=3, max=28, avg= 4.40, stdev= 1.07 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4192], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9920], 99.50th=[10304], 99.90th=[16192], 99.95th=[18816], | 99.99th=[28288] bw (KB /s): min=25447, max=31936, per=25.03%, avg=29099.78, stdev=1497.17 lat (usec) : 250=0.01% lat (msec) : 4=25.34%, 10=73.77%, 20=0.83%, 50=0.04% cpu : usr=1.23%, sys=4.71%, ctx=16888, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: 
pid=2179: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28983KB/s, iops=452, runt= 18063msec slat (usec): min=15, max=4262, avg=41.83, stdev=65.79 clat (usec): min=128, max=35128, avg=4352.56, stdev=1194.36 lat (msec): min=1, max=35, avg= 4.39, stdev= 1.19 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3952], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4192], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9792], 99.50th=[10432], 99.90th=[17280], 99.95th=[20864], | 99.99th=[35072] bw (KB /s): min=25676, max=33402, per=25.02%, avg=29002.72, stdev=1797.83 write: io=525056KB, bw=29068KB/s, iops=454, runt= 18063msec slat (usec): min=22, max=1784, avg=47.23, stdev=24.88 clat (usec): min=296, max=35165, avg=4367.18, stdev=1113.83 lat (msec): min=1, max=35, avg= 4.41, stdev= 1.11 clat percentiles (usec): | 1.00th=[ 3856], 5.00th=[ 3920], 10.00th=[ 3952], 20.00th=[ 3984], | 30.00th=[ 4048], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4960], | 99.00th=[ 9792], 99.50th=[10176], 99.90th=[16320], 99.95th=[20608], | 99.99th=[35072] bw (KB /s): min=25223, max=32127, per=25.02%, avg=29093.50, stdev=1608.39 lat (usec) : 250=0.01%, 500=0.01% lat (msec) : 2=0.01%, 4=23.93%, 10=75.24%, 20=0.73%, 50=0.07% cpu : usr=1.07%, sys=4.55%, ctx=16766, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=2180: Wed May 28 22:07:58 2014 read : io=523520KB, bw=28985KB/s, iops=452, runt= 18062msec slat (usec): min=24, max=5553, avg=44.79, stdev=80.00 clat (usec): min=138, max=34979, avg=4358.30, stdev=1106.07 lat (msec): min=2, max=35, avg= 4.40, stdev= 1.10 clat percentiles (usec): | 1.00th=[ 3824], 5.00th=[ 3888], 10.00th=[ 3920], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4256], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4704], 95.00th=[ 4960], | 99.00th=[ 9792], 99.50th=[10304], 99.90th=[16192], 99.95th=[19584], | 99.99th=[35072] bw (KB /s): min=25243, max=33280, per=25.02%, avg=29005.64, stdev=1815.27 write: io=525056KB, bw=29070KB/s, iops=454, runt= 18062msec slat (usec): min=27, max=4550, avg=50.17, stdev=52.18 clat (usec): min=372, max=34869, avg=4354.62, stdev=1175.13 lat (msec): min=3, max=34, avg= 4.41, stdev= 1.17 clat percentiles (usec): | 1.00th=[ 3856], 5.00th=[ 3888], 10.00th=[ 3952], 20.00th=[ 3984], | 30.00th=[ 4016], 40.00th=[ 4080], 50.00th=[ 4128], 60.00th=[ 4192], | 70.00th=[ 4320], 80.00th=[ 4448], 90.00th=[ 4640], 95.00th=[ 4896], | 99.00th=[ 9920], 99.50th=[10432], 99.90th=[18304], 99.95th=[22144], | 99.99th=[35072] bw (KB /s): min=25377, max=32000, per=25.02%, avg=29094.31, stdev=1546.49 lat (usec) : 250=0.01%, 500=0.01% lat (msec) : 4=25.07%, 10=74.04%, 20=0.82%, 50=0.05% cpu : usr=1.02%, sys=4.93%, ctx=16748, majf=0, minf=22 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=115932KB/s, minb=28983KB/s, maxb=28986KB/s, mint=18061msec, maxt=18063msec 
WRITE: io=2051.0MB, aggrb=116272KB/s, minb=29068KB/s, maxb=29071KB/s, mint=18061msec, maxt=18063msec Disk stats (read/write): dm-1: ios=32531/32589, merge=0/0, ticks=141395/170612, in_queue=312036, util=99.54%, aggrios=32720/32831, aggrmerge=0/12, aggrticks=142412/171944, aggrin_queue=314244, aggrutil=99.45% sdc: ios=32720/32831, merge=0/12, ticks=142412/171944, in_queue=314244, util=99.45% [-- Attachment #4: virt-ham0-lvmcache.txt --] [-- Type: text/plain, Size: 9695 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=4678: Fri May 30 14:49:36 2014 read : io=523520KB, bw=6385.7KB/s, iops=99, runt= 81984msec slat (usec): min=15, max=51287, avg=92.23, stdev=1109.90 clat (usec): min=3, max=17110, avg=741.35, stdev=1099.40 lat (usec): min=374, max=51293, avg=834.14, stdev=1547.02 clat percentiles (usec): | 1.00th=[ 346], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 374], 40.00th=[ 378], 50.00th=[ 398], 60.00th=[ 430], | 70.00th=[ 450], 80.00th=[ 564], 90.00th=[ 1448], 95.00th=[ 2960], | 99.00th=[ 5664], 99.50th=[ 6880], 99.90th=[12096], 99.95th=[12608], | 99.99th=[17024] bw (KB /s): min= 106, max=25344, per=28.73%, avg=6890.66, stdev=3382.41 write: io=525056KB, bw=6404.4KB/s, iops=100, runt= 81984msec slat (usec): min=23, max=79877, avg=113.74, stdev=1656.45 clat (usec): min=267, max=939139, avg=38930.63, stdev=72364.78 lat (usec): min=343, max=939175, avg=39045.06, stdev=72510.87 clat percentiles (usec): | 1.00th=[ 298], 5.00th=[ 302], 10.00th=[ 326], 20.00th=[ 844], | 30.00th=[ 6688], 40.00th=[38144], 50.00th=[41728], 60.00th=[43776], | 70.00th=[46848], 80.00th=[50944], 90.00th=[56064], 95.00th=[61184], | 99.00th=[181248], 99.50th=[790528], 99.90th=[872448], 99.95th=[905216], | 99.99th=[937984] bw (KB /s): min= 71, max=22528, per=28.70%, avg=6904.82, stdev=3083.01 lat (usec) : 4=0.01%, 10=0.02%, 100=0.01%, 250=0.01%, 500=47.35% lat (usec) : 750=4.99%, 1000=1.57% lat (msec) : 2=3.75%, 4=5.01%, 10=2.75%, 20=0.51%, 50=23.00% lat (msec) : 100=10.40%, 250=0.19%, 500=0.04%, 750=0.07%, 1000=0.32% cpu : usr=0.25%, sys=1.29%, ctx=16566, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=4679: Fri May 30 14:49:36 2014 read : io=523520KB, bw=6537.6KB/s, iops=102, runt= 80079msec slat (usec): min=16, max=61433, avg=102.37, stdev=1288.95 clat (usec): min=2, max=16641, avg=737.39, stdev=1134.94 lat (usec): min=376, max=61436, avg=840.31, stdev=1699.15 clat percentiles (usec): | 1.00th=[ 342], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 374], 40.00th=[ 378], 50.00th=[ 398], 60.00th=[ 430], | 70.00th=[ 450], 80.00th=[ 580], 90.00th=[ 1288], 95.00th=[ 2896], | 99.00th=[ 5664], 99.50th=[ 7648], 99.90th=[12864], 99.95th=[14656], | 99.99th=[16768] bw (KB /s): min= 298, max=27181, per=29.47%, avg=7067.77, stdev=3871.99 write: io=525056KB, bw=6556.8KB/s, iops=102, runt= 80079msec slat (usec): min=26, max=48770, avg=83.15, stdev=890.23 clat (usec): min=266, max=5409.6K, avg=38023.69, stdev=102346.26 lat (usec): min=337, max=5409.7K, avg=38107.52, stdev=102438.81 clat percentiles (usec): | 1.00th=[ 294], 5.00th=[ 302], 10.00th=[ 
318], 20.00th=[ 382], | 30.00th=[ 3248], 40.00th=[37120], 50.00th=[41216], 60.00th=[43776], | 70.00th=[46336], 80.00th=[50432], 90.00th=[56064], 95.00th=[61184], | 99.00th=[173056], 99.50th=[790528], 99.90th=[897024], 99.95th=[921600], | 99.99th=[5406720] bw (KB /s): min= 298, max=24710, per=29.43%, avg=7078.80, stdev=3321.80 lat (usec) : 4=0.02%, 10=0.03%, 50=0.01%, 250=0.01%, 500=48.87% lat (usec) : 750=5.78%, 1000=1.87% lat (msec) : 2=3.39%, 4=4.72%, 10=2.66%, 20=0.51%, 50=21.49% lat (msec) : 100=10.09%, 250=0.15%, 500=0.04%, 750=0.05%, 1000=0.31% lat (msec) : >=2000=0.01% cpu : usr=0.25%, sys=1.35%, ctx=16791, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=4680: Fri May 30 14:49:36 2014 read : io=523520KB, bw=5996.4KB/s, iops=93, runt= 87307msec slat (usec): min=15, max=50215, avg=79.95, stdev=812.30 clat (usec): min=4, max=23674, avg=754.82, stdev=1161.37 lat (usec): min=380, max=50222, avg=835.35, stdev=1406.50 clat percentiles (usec): | 1.00th=[ 346], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 370], 40.00th=[ 378], 50.00th=[ 394], 60.00th=[ 430], | 70.00th=[ 446], 80.00th=[ 572], 90.00th=[ 1496], 95.00th=[ 3024], | 99.00th=[ 6112], 99.50th=[ 7712], 99.90th=[12352], 99.95th=[13888], | 99.99th=[23680] bw (KB /s): min= 372, max=26368, per=26.72%, avg=6409.15, stdev=3611.01 write: io=525056KB, bw=6013.1KB/s, iops=93, runt= 87307msec slat (usec): min=25, max=69281, avg=119.08, stdev=1629.76 clat (usec): min=288, max=4229.2K, avg=41517.28, stdev=86297.67 lat (usec): min=345, max=4229.3K, avg=41637.09, stdev=86496.27 clat percentiles (usec): | 1.00th=[ 298], 5.00th=[ 326], 10.00th=[ 540], 20.00th=[ 5280], | 30.00th=[22144], 40.00th=[38656], 50.00th=[41728], 60.00th=[44288], | 70.00th=[46848], 80.00th=[50944], 90.00th=[56064], 95.00th=[62208], | 99.00th=[183296], 99.50th=[790528], 99.90th=[888832], 99.95th=[913408], | 99.99th=[4227072] bw (KB /s): min= 91, max=23808, per=26.53%, avg=6381.50, stdev=3178.76 lat (usec) : 10=0.02%, 100=0.01%, 500=43.29%, 750=4.35%, 1000=1.67% lat (msec) : 2=3.61%, 4=5.15%, 10=3.24%, 20=2.69%, 50=25.01% lat (msec) : 100=10.25%, 250=0.26%, 500=0.04%, 750=0.10%, 1000=0.32% lat (msec) : >=2000=0.01% cpu : usr=0.22%, sys=1.24%, ctx=16506, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=4681: Fri May 30 14:49:36 2014 read : io=523520KB, bw=6017.6KB/s, iops=94, runt= 86999msec slat (usec): min=15, max=50273, avg=88.62, stdev=1003.90 clat (usec): min=2, max=16356, avg=742.71, stdev=1140.56 lat (usec): min=368, max=50278, avg=831.90, stdev=1505.02 clat percentiles (usec): | 1.00th=[ 346], 5.00th=[ 350], 10.00th=[ 354], 20.00th=[ 362], | 30.00th=[ 370], 40.00th=[ 378], 50.00th=[ 398], 60.00th=[ 430], | 70.00th=[ 446], 80.00th=[ 548], 90.00th=[ 1416], 95.00th=[ 2960], | 99.00th=[ 6048], 99.50th=[ 8032], 99.90th=[12608], 99.95th=[13120], | 99.99th=[16320] bw (KB /s): min= 212, max=23936, per=26.82%, avg=6433.62, stdev=3648.50 write: io=525056KB, bw=6035.2KB/s, iops=94, runt= 86999msec slat (usec): min=21, 
max=83882, avg=116.67, stdev=1719.48 clat (usec): min=279, max=2542.4K, avg=41373.74, stdev=77980.27 lat (usec): min=352, max=2542.4K, avg=41491.13, stdev=78185.74 clat percentiles (usec): | 1.00th=[ 298], 5.00th=[ 322], 10.00th=[ 394], 20.00th=[ 4448], | 30.00th=[22656], 40.00th=[38656], 50.00th=[41728], 60.00th=[44288], | 70.00th=[46848], 80.00th=[50944], 90.00th=[56064], 95.00th=[62208], | 99.00th=[183296], 99.50th=[782336], 99.90th=[897024], 99.95th=[913408], | 99.99th=[2539520] bw (KB /s): min= 268, max=21760, per=26.76%, avg=6437.61, stdev=3158.84 lat (usec) : 4=0.01%, 10=0.02%, 100=0.01%, 500=44.11%, 750=4.68% lat (usec) : 1000=1.61% lat (msec) : 2=3.28%, 4=4.80%, 10=2.88%, 20=2.45%, 50=25.24% lat (msec) : 100=10.17%, 250=0.27%, 500=0.04%, 750=0.10%, 1000=0.31% lat (msec) : >=2000=0.01% cpu : usr=0.26%, sys=1.19%, ctx=16414, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=23985KB/s, minb=5996KB/s, maxb=6537KB/s, mint=80079msec, maxt=87307msec WRITE: io=2051.0MB, aggrb=24055KB/s, minb=6013KB/s, maxb=6556KB/s, mint=80079msec, maxt=87307msec Disk stats (read/write): dm-3: ios=32666/32817, merge=0/0, ticks=24343/1321747, in_queue=1346205, util=99.98%, aggrios=11107/11174, aggrmerge=0/0, aggrticks=8553/834112, aggrin_queue=843695, aggrutil=99.96% dm-0: ios=33323/6886, merge=0/0, ticks=25660/6683, in_queue=32346, util=16.79%, aggrios=33299/6884, aggrmerge=24/2, aggrticks=25549/6673, aggrin_queue=32121, aggrutil=16.75% sdc: ios=33299/6884, merge=24/2, ticks=25549/6673, in_queue=32121, util=16.75% dm-1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% dm-2: ios=0/26636, merge=0/0, ticks=0/2495655, in_queue=2498741, util=99.96%, aggrios=0/26654, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=0/26654, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/26616, aggrmerge=0/27, aggrticks=0/1625073, aggrin_queue=1626606, aggrutil=99.64% sda: ios=0/26610, merge=0/26, ticks=0/2380053, in_queue=2383117, util=99.64% sdb: ios=0/26622, merge=0/28, ticks=0/870094, in_queue=870095, util=89.86% ^ permalink raw reply [flat|nested] 37+ messages in thread
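For context on how these attachments were presumably produced: the fio job shown in the message can be saved as virt.job inside the mounted filesystem under test and run directly with fio. The mount point below is an assumption, not something stated in the thread.
$ cd /mnt/test        # assumed mount point of the filesystem being tested
$ fio virt.job        # emits per-job output like the attachments above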
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:54 ` Richard W.M. Jones @ 2014-05-30 13:58 ` Zdenek Kabelac 0 siblings, 0 replies; 37+ messages in thread From: Zdenek Kabelac @ 2014-05-30 13:58 UTC (permalink / raw) To: Richard W.M. Jones, Heinz Mauelshagen Cc: thornber, Mike Snitzer, LVM general discussion and development On 30.5.2014 15:54, Richard W.M. Jones wrote: > I'm attaching 3 tests that I have run so (hopefully) you can see > what I'm observing, or point out if I'm making a mistake. > I'd like to ask: is there any difference in the test performance if you use a ramdisk device for your cache metadata device? (So _cdata stays on the 'ssd', and just _cmeta is located on e.g. loop0 with a backing file on your tmpfs ramdisk device?) Zdenek ^ permalink raw reply [flat|nested] 37+ messages in thread
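A minimal sketch of the experiment Zdenek is suggesting, assuming the VG is vg_guests as elsewhere in the thread; the tmpfs size, image size, loop device number and mount point are arbitrary choices not taken from the thread, and a metadata device sitting on volatile RAM is of course only acceptable for a throwaway benchmark.
$ sudo mount -t tmpfs -o size=2G tmpfs /mnt/ramdisk
$ sudo dd if=/dev/zero of=/mnt/ramdisk/cmeta.img bs=1M count=1024
$ sudo losetup /dev/loop0 /mnt/ramdisk/cmeta.img
$ sudo pvcreate /dev/loop0
$ sudo vgextend vg_guests /dev/loop0
# Recreate the cache pool so that lv_cache_meta is allocated on the loop
# device (i.e. in RAM) while lv_cache itself stays on the SSD:
$ sudo lvcreate -L 1G -n lv_cache_meta vg_guests /dev/loop0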
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones 2014-05-30 13:42 ` Heinz Mauelshagen @ 2014-05-30 13:46 ` Richard W.M. Jones 2014-05-30 13:54 ` Heinz Mauelshagen 2014-05-30 13:55 ` Mike Snitzer 2 siblings, 2 replies; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 13:46 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development I have now set both read_promote_adjustment == write_promote_adjustment == 0 and used drop_caches between runs. I also read Documentation/device-mapper/cache-policies.txt at Heinz's suggestion. I'm afraid the performance of the fio test is still not the same as the SSD (4.8 times slower than the SSD-only test now). Would repeated runs of (md5sum virt.* ; echo 3 > /proc/sys/vm/drop_caches) not eventually cause the whole file to be placed on the SSD? It does seem very counter-intuitive if not. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 37+ messages in thread
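The tuning and warm-up procedure being described presumably amounts to something like the following; the cached device name matches the dmsetup output quoted later in the thread, and the mount point is an assumption.
$ sudo dmsetup message vg_guests-libvirt--images 0 read_promote_adjustment 0
$ sudo dmsetup message vg_guests-libvirt--images 0 write_promote_adjustment 0
$ cd /mnt/test                                 # assumed mount point
$ md5sum virt.*                                # touch every block of the test files
$ echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop the page cache between runs
As far as cache-policies.txt describes it, promote adjustments of 0 make the mq policy willing to promote a block on its first access, so repeated full reads of the files would indeed be expected to pull them onto the SSD, which is what makes the observed result surprising.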
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:46 ` Richard W.M. Jones @ 2014-05-30 13:54 ` Heinz Mauelshagen 2014-05-30 14:26 ` Richard W.M. Jones 2014-05-30 13:55 ` Mike Snitzer 1 sibling, 1 reply; 37+ messages in thread From: Heinz Mauelshagen @ 2014-05-30 13:54 UTC (permalink / raw) To: Richard W.M. Jones, Mike Snitzer Cc: Zdenek Kabelac, thornber, LVM general discussion and development On 05/30/2014 03:46 PM, Richard W.M. Jones wrote: > I have now set both read_promote_adjustment == > write_promote_adjustment == 0 and used drop_caches between runs. Did you adjust "sequential_threshold 0" as well? dm-cache tries to avoid promoting large sequential files to the cache, because spindles have good bandwidth. This is again because of the hot spot caching nature of dm-cache. > > I also read Documentation/device-mapper/cache-policies.txt at Heinz's > suggestion. > > I'm afraid the performance of the fio test is still not the same as > the SSD (4.8 times slower than the SSD-only test now). > > Would repeated runs of (md5sum virt.* ; echo 3 > /proc/sys/vm/drop_caches) > not eventually cause the whole file to be placed on the SSD? > It does seem very counter-intuitive if not. Please retry with "sequential_threshold 0" Heinz > > Rich. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 13:54 ` Heinz Mauelshagen @ 2014-05-30 14:26 ` Richard W.M. Jones 2014-05-30 14:29 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 14:26 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Zdenek Kabelac, thornber, Mike Snitzer, LVM general discussion and development On Fri, May 30, 2014 at 03:54:49PM +0200, Heinz Mauelshagen wrote: > On 05/30/2014 03:46 PM, Richard W.M. Jones wrote: > >I have now set both read_promote_adjustment == > >write_promote_adjustment == 0 and used drop_caches between runs. > > Did you adjust "sequential_threshold 0" as well? > > dm-cache tries to avoid promoting large sequential files to the cache, > because spindles have good bandwidth. > > This is again because of the hot spot caching nature of dm-cache. Setting this had no effect. I'm starting to wonder if my settings are having any effect at all. Here are the device-mapper tables: $ sudo dmsetup table vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 vg_guests-home: 0 209715200 linear 9:127 2048 vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 And here is the command I used to set sequential_threshold to 0 (there was no error and no other output): $ sudo dmsetup message vg_guests-libvirt--images 0 sequential_threshold 0 Is there a way to print the current settings? Could writethrough be enabled? (I'm supposed to be using writeback). How do I find out? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:26 ` Richard W.M. Jones @ 2014-05-30 14:29 ` Mike Snitzer 2014-05-30 14:36 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 14:29 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 10:26am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Fri, May 30, 2014 at 03:54:49PM +0200, Heinz Mauelshagen wrote: > > On 05/30/2014 03:46 PM, Richard W.M. Jones wrote: > > >I have now set both read_promote_adjustment == > > >write_promote_adjustment == 0 and used drop_caches between runs. > > > > Did you adjust "sequential_threshold 0" as well? > > > > dm-cache tries to avoid promoting large sequential files to the cache, > > because spindles have good bandwidth. > > > > This is again because of the hot spot caching nature of dm-cache. > > Setting this had no effect. > > I'm starting to wonder if my settings are having any effect at all. > > Here are the device-mapper tables: > > $ sudo dmsetup table > vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 > vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 > vg_guests-home: 0 209715200 linear 9:127 2048 > vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 > vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 > > And here is the command I used to set sequential_threshold to 0 > (there was no error and no other output): > > $ sudo dmsetup message vg_guests-libvirt--images 0 sequential_threshold 0 sequential_threshold is only going to help the md5sum's IO get promoted (assuming you're having it read a large file). > Is there a way to print the current settings? > > Could writethrough be enabled? (I'm supposed to be using writeback). > How do I find out? dmsetup status vg_guests-libvirt--images But I'm really wondering if your IO is misaligned (like my earlier email brought up). It _could_ be promoting 2 64K blocks from the origin for every 64K IO. ^ permalink raw reply [flat|nested] 37+ messages in thread
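For readers following along, the single status line that command prints contains all of the current settings; the field order sketched below follows the Documentation/device-mapper/cache.txt of this era and is offered only as a reading aid, not an authoritative spec. Block sizes are given in 512-byte sectors, so a cache block size of 128 means 64K.
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty>
<#feature args> <feature args, e.g. writeback or writethrough>
<#core args> <core args, e.g. migration_threshold 2048>
<policy name> <#policy args> <policy args, e.g. sequential_threshold 0>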
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:29 ` Mike Snitzer @ 2014-05-30 14:36 ` Richard W.M. Jones 2014-05-30 14:44 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 14:36 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30, 2014 at 10:29:26AM -0400, Mike Snitzer wrote: > sequential_threshold is only going to help the md5sum's IO get promoted > (assuming you're having it read a large file). Note the fio test runs on the virt.* files. I'm using md5sum in an attempt to pull those same files into the SSD. > > Is there a way to print the current settings? > > > > Could writethrough be enabled? (I'm supposed to be using writeback). > > How do I find out? > > dmsetup status vg_guests-libvirt--images Here's dmsetup status on various objects: $ sudo dmsetup table vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 vg_guests-home: 0 209715200 linear 9:127 2048 vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 $ sudo dmsetup status vg_guests-libvirt--images 0 1677721600 cache 8 10162/262144 128 39839/3276800 1087840 821795 116320 2057235 0 39835 0 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 0 discard_promote_adjustment 1 read_promote_adjustment 0 write_promote_adjustment 0 $ sudo dmsetup status vg_guests-lv_cache_cdata 0 419430400 linear $ sudo dmsetup status vg_guests-lv_cache_cmeta 0 2097152 linear $ sudo dmsetup status vg_guests-libvirt--images_corig 0 1677721600 linear > But I'm really wondering if your IO is misaligned (like my earlier email > brought up). It _could_ be promoting 2 64K blocks from the origin for > every 64K IO. There's nothing obviously wrong ... ** For the SSD ** $ sudo fdisk -l /dev/sdc Disk /dev/sdc: 232.9 GiB, 250059350016 bytes, 488397168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disklabel type: dos Disk identifier: 0x3e302f2a Device Boot Start End Blocks Id System /dev/sdc1 2048 488397167 244197560 8e Linux LVM The PV is placed directly on /dev/sdc1. ** For the HDD array ** $ sudo fdisk -l /dev/sd{a,b} Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: B9545B67-681D-4729-A8A0-C75CB2EFFCB1 Device Start End Size Type /dev/sda1 2048 3907029134 1.8T Linux filesystem Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: EFA66BD1-E813-4826-88A2-F2BB3C2E093E Device Start End Size Type /dev/sdb1 2048 3907029134 1.8T Linux filesystem $ cat /proc/mdstat Personalities : [raid1] md127 : active raid1 sdb1[2] sda1[1] 1953382272 blocks super 1.2 [2/2] [UU] unused devices: <none> The PV is placed on /dev/md127. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. 
Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:36 ` Richard W.M. Jones @ 2014-05-30 14:44 ` Mike Snitzer 2014-05-30 14:51 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 14:44 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 10:36am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Fri, May 30, 2014 at 10:29:26AM -0400, Mike Snitzer wrote: > > sequential_threshold is only going to help the md5sum's IO get promoted > > (assuming you're having it read a large file). > > Note the fio test runs on the virt.* files. I'm using md5sum in an > attempt to pull those same files into the SSD. > > > > Is there a way to print the current settings? > > > > > > Could writethrough be enabled? (I'm supposed to be using writeback). > > > How do I find out? > > > > dmsetup status vg_guests-libvirt--images > > Here's dmsetup status on various objects: > > $ sudo dmsetup table > vg_guests-lv_cache_cdata: 0 419430400 linear 8:33 2099200 > vg_guests-lv_cache_cmeta: 0 2097152 linear 8:33 2048 > vg_guests-home: 0 209715200 linear 9:127 2048 > vg_guests-libvirt--images: 0 1677721600 cache 253:1 253:0 253:2 128 0 default 0 > vg_guests-libvirt--images_corig: 0 1677721600 linear 9:127 2055211008 > $ sudo dmsetup status vg_guests-libvirt--images > 0 1677721600 cache 8 10162/262144 128 39839/3276800 1087840 821795 116320 2057235 0 39835 0 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 0 discard_promote_adjustment 1 read_promote_adjustment 0 write_promote_adjustment 0 > $ sudo dmsetup status vg_guests-lv_cache_cdata > 0 419430400 linear > $ sudo dmsetup status vg_guests-lv_cache_cmeta > 0 2097152 linear > $ sudo dmsetup status vg_guests-libvirt--images_corig > 0 1677721600 linear > > > But I'm really wondering if your IO is misaligned (like my earlier email > > brought up). It _could_ be promoting 2 64K blocks from the origin for > > every 64K IO. > > There's nothing obviously wrong ... I'm not talking about alignment relative to the physical device's limits. I'm talking about alignment of ext4's data areas relative to the 64K block boundaries. Also a point of concern would be: how fragmented is the ext4 space? It could be that it cannot get contiguous 64K regions from the namespace. If that is the case then a lot more IO would get pulled in. Can you try reducing the cache blocksize to 32K (lowest we support at the moment, it'll require you to remove the cache and recreate) to see if performance for this 64K random IO workload improves? If so it does start to add weight to my alignment concerns. Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
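Two commands that could help check the fragmentation and alignment concern raised here; the path is an assumption. filefrag -v prints each extent's logical and physical offsets in filesystem blocks, so with a 4K ext4 block size an extent whose physical start is not a multiple of 16 blocks does not begin on a 64K boundary, and e4defrag -c only reports a fragmentation score without changing anything.
$ sudo filefrag -v /mnt/test/virt.*
$ sudo e4defrag -c /mnt/test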
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:44 ` Mike Snitzer @ 2014-05-30 14:51 ` Richard W.M. Jones 2014-05-30 14:58 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 14:51 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30, 2014 at 10:44:54AM -0400, Mike Snitzer wrote: > I'm not talking about alignment relative to the physical device's > limits. I'm talking about alignment of ext4's data areas relative to > the 64K block boundaries. > > Also a point of concern would be: how fragmented is the ext4 space? It > could be that it cannot get contiguous 64K regions from the namespace. > If that is the case then a lot more IO would get pulled in. I would be surprised if it was fragmented, since it's a recently created filesystem which has only been used to store a few huge disk images ... > Can you try reducing the cache blocksize to 32K (lowest we support at > the moment, it'll require you to remove the cache and recreate) to see > if performance for this 64K random IO workload improves? If so it does > start to add weight to my alignment concerns. ... nevertheless what I will do is recreate the origin LV, ext4 filesystem, and change the block size. What is the command to set the cache blocksize? It doesn't seem to be covered in the documentation anywhere. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:51 ` Richard W.M. Jones @ 2014-05-30 14:58 ` Mike Snitzer 2014-05-30 15:28 ` Richard W.M. Jones 0 siblings, 1 reply; 37+ messages in thread From: Mike Snitzer @ 2014-05-30 14:58 UTC (permalink / raw) To: Richard W.M. Jones Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development On Fri, May 30 2014 at 10:51am -0400, Richard W.M. Jones <rjones@redhat.com> wrote: > On Fri, May 30, 2014 at 10:44:54AM -0400, Mike Snitzer wrote: > > I'm not talking about alignment relative to the physical device's > > limits. I'm talking about alignment of ext4's data areas relative to > > the 64K block boundaries. > > > > Also a point of concern would be: how fragmented is the ext4 space? It > > could be that it cannot get contiguous 64K regions from the namespace. > > If that is the case then a lot more IO would get pulled in. > > I would be surprised if it was fragmented, since it's a recently > created filesystem which has only been used to store a few huge disk > images ... > > > Can you try reducing the cache blocksize to 32K (lowest we support at > > the moment, it'll require you to remove the cache and recreate) to see > > if performance for this 64K random IO workload improves? If so it does > > start to add weight to my alignment concerns. > > ... nevertheless what I will do is recreate the origin LV, ext4 > filesystem, and change the block size. You don't need to recreate the origin LV or FS. If anything that'd reduce our ability to answer what may be currently wrong with the setup. I was just suggesting removing the cache and recreating the cache layer. Not sure how easy it is to do that with the lvm2 interface. Jon and/or Kabi? > What is the command to set the cache blocksize? It doesn't seem to be > covered in the documentation anywhere. I would think it is lvconvert's --chunksize... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature 2014-05-30 14:58 ` Mike Snitzer @ 2014-05-30 15:28 ` Richard W.M. Jones 2014-05-30 18:16 ` Mike Snitzer 0 siblings, 1 reply; 37+ messages in thread From: Richard W.M. Jones @ 2014-05-30 15:28 UTC (permalink / raw) To: Mike Snitzer Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development [-- Attachment #1: Type: text/plain, Size: 1729 bytes --] I did in fact recreate the ext4 filesystem, because I didn't read your email in time. Here are the commands I used to create the whole lot: ---------------------------------------------------------------------- lvcreate -L 800G -n testorigin vg_guests @slow mkfs -t ext4 /dev/vg_guests/testorigin # at this point, I tested the speed of the uncached LV, see below lvcreate -L 1G -n lv_cache_meta vg_guests @ssd lvcreate -L 200G -n lv_cache vg_guests @ssd lvconvert --type cache-pool --chunksize 32k --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testorigin dmsetup message vg_guests-testorigin 0 sequential_threshold 0 dmsetup message vg_guests-testorigin 0 read_promote_adjustment 0 dmsetup message vg_guests-testorigin 0 write_promote_adjustment 0 # at this point, I tested the speed of the cached LV, see below ---------------------------------------------------------------------- To test the uncached LV, I ran the same fio test twice on the mounted ext4 filesystem. The results of the second run are in the first attachment. To test the cached LV, I ran these commands 3 times in a row: md5sum virt.* echo 3 > /proc/sys/vm/drop_caches then I ran the fio test twice. The results of the second run are attached. This time the LVM cache test is about 10% slower than the HDD test. I'm not sure what to make of that at all. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW [-- Attachment #2: virt-ham0-testorigin-hdd.txt --] [-- Type: text/plain, Size: 9289 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... 
virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=5346: Fri May 30 16:06:21 2014 read : io=523520KB, bw=2910.4KB/s, iops=45, runt=179881msec slat (usec): min=21, max=307271, avg=162.28, stdev=4500.17 clat (usec): min=4, max=1491.2K, avg=78284.57, stdev=119672.87 lat (usec): min=362, max=1491.2K, avg=78447.46, stdev=119690.26 clat percentiles (usec): | 1.00th=[ 410], 5.00th=[ 5536], 10.00th=[ 7968], 20.00th=[12352], | 30.00th=[17280], 40.00th=[24448], 50.00th=[35072], 60.00th=[48896], | 70.00th=[74240], 80.00th=[116224], 90.00th=[195584], 95.00th=[288768], | 99.00th=[577536], 99.50th=[782336], 99.90th=[1187840], 99.95th=[1335296], | 99.99th=[1499136] bw (KB /s): min= 228, max=14924, per=25.99%, avg=3025.00, stdev=1860.17 write: io=525056KB, bw=2918.1KB/s, iops=45, runt=179881msec slat (usec): min=32, max=327239, avg=577.55, stdev=9388.69 clat (usec): min=330, max=1294.2K, avg=8890.55, stdev=46138.42 lat (usec): min=402, max=1294.3K, avg=9468.75, stdev=47132.58 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 370], 10.00th=[ 430], 20.00th=[ 450], | 30.00th=[ 668], 40.00th=[ 940], 50.00th=[ 1272], 60.00th=[ 1336], | 70.00th=[ 1560], 80.00th=[ 5600], 90.00th=[15424], 95.00th=[24192], | 99.00th=[144384], 99.50th=[228352], 99.90th=[741376], 99.95th=[872448], | 99.99th=[1286144] bw (KB /s): min= 105, max=13660, per=25.98%, avg=3033.09, stdev=2056.10 lat (usec) : 10=0.01%, 100=0.01%, 250=0.02%, 500=12.04%, 750=6.01% lat (usec) : 1000=3.24% lat (msec) : 2=16.93%, 4=2.02%, 10=9.87%, 20=13.46%, 50=15.23% lat (msec) : 100=8.54%, 250=9.04%, 500=2.72%, 750=0.54%, 1000=0.24% lat (msec) : 2000=0.09% cpu : usr=0.14%, sys=0.59%, ctx=16625, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5347: Fri May 30 16:06:21 2014 read : io=523520KB, bw=3348.3KB/s, iops=52, runt=156355msec slat (usec): min=25, max=242606, avg=177.64, stdev=4926.94 clat (usec): min=5, max=1128.2K, avg=69030.33, stdev=92416.74 lat (usec): min=357, max=1128.2K, avg=69208.61, stdev=92459.50 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 14], | 30.00th=[ 19], 40.00th=[ 26], 50.00th=[ 37], 60.00th=[ 51], | 70.00th=[ 71], 80.00th=[ 105], 90.00th=[ 172], 95.00th=[ 241], | 99.00th=[ 416], 99.50th=[ 545], 99.90th=[ 922], 99.95th=[ 1004], | 99.99th=[ 1123] bw (KB /s): min= 63, max= 6876, per=29.66%, avg=3452.75, stdev=1274.21 write: io=525056KB, bw=3358.2KB/s, iops=52, runt=156355msec slat (usec): min=37, max=335316, avg=588.63, stdev=10049.71 clat (usec): min=326, max=1003.8K, avg=6620.20, stdev=43120.65 lat (usec): min=413, max=1003.9K, avg=7209.47, stdev=44354.58 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 406], 20.00th=[ 442], | 30.00th=[ 620], 40.00th=[ 756], 50.00th=[ 1176], 60.00th=[ 1272], | 70.00th=[ 1320], 80.00th=[ 1480], 90.00th=[ 2640], 95.00th=[15808], | 99.00th=[140288], 99.50th=[193536], 99.90th=[864256], 99.95th=[897024], | 99.99th=[1003520] bw (KB /s): min= 72, max= 8762, per=29.71%, avg=3468.34, stdev=1603.79 lat (usec) : 10=0.01%, 250=0.02%, 500=13.07%, 750=7.15%, 1000=3.34% lat (msec) : 2=21.17%, 4=1.43%, 10=7.20%, 20=10.63%, 50=14.55% lat (msec) : 100=10.03%, 250=9.04%, 500=1.93%, 750=0.23%, 1000=0.15% 
lat (msec) : 2000=0.04% cpu : usr=0.16%, sys=0.68%, ctx=16915, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5348: Fri May 30 16:06:21 2014 read : io=523520KB, bw=3268.2KB/s, iops=51, runt=160195msec slat (usec): min=26, max=338078, avg=259.14, stdev=6769.07 clat (usec): min=4, max=903517, avg=70958.19, stdev=87737.74 lat (usec): min=375, max=903571, avg=71217.97, stdev=87838.17 clat percentiles (msec): | 1.00th=[ 5], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 15], | 30.00th=[ 21], 40.00th=[ 29], 50.00th=[ 40], 60.00th=[ 55], | 70.00th=[ 76], 80.00th=[ 110], 90.00th=[ 174], 95.00th=[ 243], | 99.00th=[ 429], 99.50th=[ 506], 99.90th=[ 725], 99.95th=[ 816], | 99.99th=[ 906] bw (KB /s): min= 173, max= 7153, per=28.77%, avg=3349.59, stdev=1188.82 write: io=525056KB, bw=3277.7KB/s, iops=51, runt=160195msec slat (usec): min=42, max=303703, avg=517.35, stdev=9112.62 clat (usec): min=340, max=1461.5K, avg=6556.03, stdev=47381.54 lat (usec): min=411, max=1461.6K, avg=7074.03, stdev=48289.76 clat percentiles (usec): | 1.00th=[ 362], 5.00th=[ 370], 10.00th=[ 398], 20.00th=[ 446], | 30.00th=[ 636], 40.00th=[ 780], 50.00th=[ 1192], 60.00th=[ 1288], | 70.00th=[ 1336], 80.00th=[ 1496], 90.00th=[ 3600], 95.00th=[15680], | 99.00th=[138240], 99.50th=[201728], 99.90th=[733184], 99.95th=[856064], | 99.99th=[1466368] bw (KB /s): min= 173, max= 9708, per=28.86%, avg=3369.21, stdev=1468.01 lat (usec) : 10=0.01%, 100=0.01%, 250=0.03%, 500=12.98%, 750=6.61% lat (usec) : 1000=3.67% lat (msec) : 2=20.85%, 4=1.40%, 10=6.91%, 20=10.44%, 50=14.89% lat (msec) : 100=10.28%, 250=9.41%, 500=2.17%, 750=0.25%, 1000=0.07% lat (msec) : 2000=0.02% cpu : usr=0.15%, sys=0.68%, ctx=16814, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5349: Fri May 30 16:06:21 2014 read : io=523520KB, bw=2992.8KB/s, iops=46, runt=174928msec slat (usec): min=23, max=311675, avg=171.27, stdev=4974.27 clat (usec): min=75, max=1114.8K, avg=76585.62, stdev=105112.42 lat (usec): min=364, max=1114.9K, avg=76757.53, stdev=105149.80 clat percentiles (usec): | 1.00th=[ 1704], 5.00th=[ 6240], 10.00th=[ 8768], 20.00th=[13376], | 30.00th=[19328], 40.00th=[27520], 50.00th=[39168], 60.00th=[54528], | 70.00th=[77312], 80.00th=[114176], 90.00th=[187392], 95.00th=[272384], | 99.00th=[518144], 99.50th=[675840], 99.90th=[913408], 99.95th=[995328], | 99.99th=[1122304] bw (KB /s): min= 145, max= 9984, per=26.41%, avg=3074.43, stdev=1438.71 write: io=525056KB, bw=3001.6KB/s, iops=46, runt=174928msec slat (usec): min=38, max=265589, avg=587.60, stdev=9637.20 clat (usec): min=342, max=906290, avg=8148.18, stdev=44226.22 lat (usec): min=406, max=906371, avg=8736.42, stdev=45334.93 clat percentiles (usec): | 1.00th=[ 358], 5.00th=[ 366], 10.00th=[ 422], 20.00th=[ 446], | 30.00th=[ 644], 40.00th=[ 852], 50.00th=[ 1240], 60.00th=[ 1304], | 70.00th=[ 1416], 80.00th=[ 1960], 90.00th=[11200], 95.00th=[21888], | 99.00th=[146432], 99.50th=[216064], 99.90th=[815104], 99.95th=[856064], | 99.99th=[905216] bw (KB /s): min= 74, max= 9472, 
per=26.38%, avg=3079.66, stdev=1727.46 lat (usec) : 100=0.01%, 250=0.01%, 500=12.76%, 750=6.02%, 1000=3.50% lat (msec) : 2=18.48%, 4=1.65%, 10=8.47%, 20=11.68%, 50=14.81% lat (msec) : 100=10.03%, 250=9.42%, 500=2.51%, 750=0.41%, 1000=0.23% lat (msec) : 2000=0.02% cpu : usr=0.12%, sys=0.63%, ctx=16535, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=11641KB/s, minb=2910KB/s, maxb=3348KB/s, mint=156355msec, maxt=179881msec WRITE: io=2051.0MB, aggrb=11675KB/s, minb=2918KB/s, maxb=3358KB/s, mint=156355msec, maxt=179881msec Disk stats (read/write): dm-0: ios=32704/33128, merge=0/0, ticks=2371685/821383, in_queue=3193165, util=100.00%, aggrios=32720/33200, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=32720/33200, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16359/33122, aggrmerge=0/78, aggrticks=1185825/498678, aggrin_queue=1684398, aggrutil=99.75% sda: ios=8461/33122, merge=1/78, ticks=393799/195017, in_queue=588764, util=95.85% sdb: ios=24258/33122, merge=0/78, ticks=1977851/802340, in_queue=2780032, util=99.75% [-- Attachment #3: virt-ham0-testorigin-lvmcache.txt --] [-- Type: text/plain, Size: 9749 bytes --] virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 ... virt: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=4 fio-2.1.2 Starting 4 processes virt: (groupid=0, jobs=1): err= 0: pid=5531: Fri May 30 16:25:54 2014 read : io=523520KB, bw=2629.4KB/s, iops=41, runt=199130msec slat (usec): min=24, max=197634, avg=143.59, stdev=2991.82 clat (usec): min=174, max=26698K, avg=67465.64, stdev=365815.35 lat (usec): min=388, max=26698K, avg=67609.87, stdev=365817.53 clat percentiles (usec): | 1.00th=[ 398], 5.00th=[ 5600], 10.00th=[ 7968], 20.00th=[11584], | 30.00th=[15552], 40.00th=[21376], 50.00th=[29568], 60.00th=[41216], | 70.00th=[60160], 80.00th=[90624], 90.00th=[152576], 95.00th=[226304], | 99.00th=[440320], 99.50th=[544768], 99.90th=[880640], 99.95th=[1515520], | 99.99th=[16711680] bw (KB /s): min= 320, max=13128, per=25.53%, avg=2685.01, stdev=1713.14 write: io=525056KB, bw=2636.8KB/s, iops=41, runt=199130msec slat (usec): min=33, max=240957, avg=320.30, stdev=5617.33 clat (usec): min=353, max=65433K, avg=29338.06, stdev=1020379.84 lat (usec): min=437, max=65433K, avg=29659.01, stdev=1020395.97 clat percentiles (usec): | 1.00th=[ 370], 5.00th=[ 378], 10.00th=[ 398], 20.00th=[ 454], | 30.00th=[ 620], 40.00th=[ 900], 50.00th=[ 1240], 60.00th=[ 1320], | 70.00th=[ 1608], 80.00th=[ 7584], 90.00th=[19584], 95.00th=[30848], | 99.00th=[103936], 99.50th=[175104], 99.90th=[815104], 99.95th=[3063808], | 99.99th=[16711680] bw (KB /s): min= 114, max=13254, per=25.43%, avg=2681.93, stdev=1885.06 lat (usec) : 250=0.01%, 500=13.29%, 750=5.71%, 1000=2.83% lat (msec) : 2=15.81%, 4=1.70%, 10=9.86%, 20=14.95%, 50=17.03% lat (msec) : 100=9.34%, 250=7.23%, 500=1.78%, 750=0.32%, 1000=0.05% lat (msec) : 2000=0.02%, >=2000=0.05% cpu : usr=0.10%, sys=0.77%, ctx=16731, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, 
short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5532: Fri May 30 16:25:54 2014 read : io=523520KB, bw=3213.1KB/s, iops=50, runt=162892msec slat (usec): min=47, max=185115, avg=128.16, stdev=2376.71 clat (usec): min=4, max=25770K, avg=59285.18, stdev=394418.17 lat (usec): min=401, max=25770K, avg=59413.99, stdev=394421.00 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 13], | 30.00th=[ 17], 40.00th=[ 22], 50.00th=[ 30], 60.00th=[ 41], | 70.00th=[ 56], 80.00th=[ 78], 90.00th=[ 126], 95.00th=[ 182], | 99.00th=[ 347], 99.50th=[ 420], 99.90th=[ 578], 99.95th=[ 635], | 99.99th=[16712] bw (KB /s): min= 242, max= 6629, per=31.56%, avg=3318.74, stdev=1376.74 write: io=525056KB, bw=3223.4KB/s, iops=50, runt=162892msec slat (usec): min=61, max=269043, avg=299.24, stdev=5817.49 clat (usec): min=326, max=69958K, avg=19848.94, stdev=886022.82 lat (usec): min=444, max=69959K, avg=20148.82, stdev=886040.88 clat percentiles (usec): | 1.00th=[ 366], 5.00th=[ 374], 10.00th=[ 390], 20.00th=[ 446], | 30.00th=[ 524], 40.00th=[ 700], 50.00th=[ 1112], 60.00th=[ 1256], | 70.00th=[ 1304], 80.00th=[ 1448], 90.00th=[ 2024], 95.00th=[17024], | 99.00th=[64768], 99.50th=[115200], 99.90th=[346112], 99.95th=[3260416], | 99.99th=[16711680] bw (KB /s): min= 121, max= 8796, per=31.72%, avg=3345.35, stdev=1578.99 lat (usec) : 10=0.01%, 100=0.01%, 500=14.81%, 750=6.41%, 1000=3.04% lat (msec) : 2=21.03%, 4=1.18%, 10=7.43%, 20=12.62%, 50=16.28% lat (msec) : 100=9.69%, 250=6.25%, 500=1.08%, 750=0.12%, 1000=0.01% lat (msec) : >=2000=0.04% cpu : usr=0.13%, sys=0.93%, ctx=17115, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5533: Fri May 30 16:25:54 2014 read : io=523520KB, bw=2713.7KB/s, iops=42, runt=192920msec slat (usec): min=35, max=30712, avg=92.16, stdev=338.87 clat (usec): min=315, max=63712K, avg=73273.62, stdev=983181.94 lat (usec): min=401, max=63713K, avg=73366.42, stdev=983181.85 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 13], | 30.00th=[ 18], 40.00th=[ 24], 50.00th=[ 32], 60.00th=[ 42], | 70.00th=[ 58], 80.00th=[ 83], 90.00th=[ 135], 95.00th=[ 202], | 99.00th=[ 383], 99.50th=[ 482], 99.90th=[ 1401], 99.95th=[ 1483], | 99.99th=[16712] bw (KB /s): min= 121, max= 9216, per=31.94%, avg=3358.56, stdev=1911.18 write: io=525056KB, bw=2721.7KB/s, iops=42, runt=192920msec slat (usec): min=56, max=258524, avg=400.46, stdev=7012.38 clat (usec): min=348, max=70388K, avg=20488.66, stdev=849098.66 lat (usec): min=435, max=70389K, avg=20889.77, stdev=849131.29 clat percentiles (usec): | 1.00th=[ 366], 5.00th=[ 374], 10.00th=[ 394], 20.00th=[ 450], | 30.00th=[ 620], 40.00th=[ 876], 50.00th=[ 1224], 60.00th=[ 1288], | 70.00th=[ 1464], 80.00th=[ 2960], 90.00th=[15808], 95.00th=[24960], | 99.00th=[87552], 99.50th=[218112], 99.90th=[888832], 99.95th=[1400832], | 99.99th=[16711680] bw (KB /s): min= 2, max=10112, per=31.98%, avg=3372.91, stdev=2142.52 lat (usec) : 500=12.87%, 750=5.95%, 1000=2.91% lat (msec) : 2=18.21%, 4=1.31%, 10=8.83%, 20=13.82%, 50=17.58% lat (msec) : 100=10.40%, 250=6.34%, 500=1.43%, 750=0.20%, 1000=0.04% lat (msec) : 2000=0.07%, >=2000=0.04% cpu : usr=0.12%, sys=0.77%, ctx=16939, majf=0, minf=24 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 virt: (groupid=0, jobs=1): err= 0: pid=5534: Fri May 30 16:25:54 2014 read : io=523520KB, bw=2857.1KB/s, iops=44, runt=183181msec slat (usec): min=44, max=195215, avg=112.54, stdev=2157.49 clat (usec): min=312, max=43560K, avg=74513.46, stdev=550534.41 lat (usec): min=375, max=43560K, avg=74626.63, stdev=550535.28 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 13], | 30.00th=[ 18], 40.00th=[ 25], 50.00th=[ 35], 60.00th=[ 48], | 70.00th=[ 67], 80.00th=[ 96], 90.00th=[ 159], 95.00th=[ 233], | 99.00th=[ 441], 99.50th=[ 545], 99.90th=[ 979], 99.95th=[ 1205], | 99.99th=[16712] bw (KB /s): min= 348, max= 7734, per=27.54%, avg=2895.85, stdev=1213.49 write: io=525056KB, bw=2866.4KB/s, iops=44, runt=183181msec slat (usec): min=54, max=275450, avg=324.36, stdev=5740.29 clat (usec): min=348, max=47249K, avg=14560.10, stdev=569838.75 lat (usec): min=441, max=47249K, avg=14885.10, stdev=569878.87 clat percentiles (usec): | 1.00th=[ 366], 5.00th=[ 374], 10.00th=[ 394], 20.00th=[ 450], | 30.00th=[ 564], 40.00th=[ 740], 50.00th=[ 1192], 60.00th=[ 1256], | 70.00th=[ 1336], 80.00th=[ 1576], 90.00th=[10432], 95.00th=[21888], | 99.00th=[99840], 99.50th=[166912], 99.90th=[528384], 99.95th=[692224], | 99.99th=[16711680] bw (KB /s): min= 213, max= 7303, per=27.52%, avg=2901.91, stdev=1403.33 lat (usec) : 500=14.29%, 750=6.23%, 1000=2.95% lat (msec) : 2=19.19%, 4=1.25%, 10=8.00%, 20=12.42%, 50=15.55% lat (msec) : 100=10.17%, 250=7.70%, 500=1.84%, 750=0.31%, 1000=0.04% lat (msec) : 2000=0.03%, >=2000=0.04% cpu : usr=0.13%, sys=0.81%, ctx=16805, majf=0, minf=23 IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=8180/w=8204/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): READ: io=2045.0MB, aggrb=10516KB/s, minb=2629KB/s, maxb=3213KB/s, mint=162892msec, maxt=199130msec WRITE: io=2051.0MB, aggrb=10546KB/s, minb=2636KB/s, maxb=3223KB/s, mint=162892msec, maxt=199130msec Disk stats (read/write): dm-0: ios=65408/67846, merge=0/0, ticks=4160319/11781438, in_queue=15944596, util=100.00%, aggrios=21839/22771, aggrmerge=0/0, aggrticks=1288260/3918688, aggrin_queue=5206991, aggrutil=100.00% dm-1: ios=84/117, merge=0/0, ticks=120/182, in_queue=302, util=0.11%, aggrios=96/383, aggrmerge=0/62, aggrticks=124/431, aggrin_queue=555, aggrutil=0.22% sdc: ios=96/383, merge=0/62, ticks=124/431, in_queue=555, util=0.22% dm-2: ios=12/328, merge=0/0, ticks=3/268, in_queue=271, util=0.12% dm-5: ios=65421/67870, merge=0/0, ticks=3864659/11755615, in_queue=15620402, util=100.00%, aggrios=65421/67888, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% md127: ios=65421/67888, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16422/33219, aggrmerge=16286/34662, aggrticks=970890/852897, aggrin_queue=1823577, aggrutil=99.59% sda: ios=9379/33225, merge=9264/34659, ticks=406471/203345, in_queue=609659, util=96.99% sdb: ios=23466/33213, merge=23309/34666, ticks=1535310/1502449, in_queue=3037496, util=99.59% ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 15:28             ` Richard W.M. Jones
@ 2014-05-30 18:16               ` Mike Snitzer
  2014-05-30 20:53                 ` Mike Snitzer
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 18:16 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, LVM general discussion and development, thornber, Zdenek Kabelac

On Fri, May 30 2014 at 11:28am -0400,
Richard W.M. Jones <rjones@redhat.com> wrote:

> I did in fact recreate the ext4 filesystem, because I didn't read your
> email in time.
>
> Here are the commands I used to create the whole lot:
>
> ----------------------------------------------------------------------
> lvcreate -L 800G -n testorigin vg_guests @slow
> mkfs -t ext4 /dev/vg_guests/testorigin
> # at this point, I tested the speed of the uncached LV, see below
> lvcreate -L 1G -n lv_cache_meta vg_guests @ssd
> lvcreate -L 200G -n lv_cache vg_guests @ssd
> lvconvert --type cache-pool --chunksize 32k --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache
> lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testorigin
> dmsetup message vg_guests-testorigin 0 sequential_threshold 0
> dmsetup message vg_guests-testorigin 0 read_promote_adjustment 0
> dmsetup message vg_guests-testorigin 0 write_promote_adjustment 0
> # at this point, I tested the speed of the cached LV, see below
> ----------------------------------------------------------------------
>
> To test the uncached LV, I ran the same fio test twice on the mounted
> ext4 filesystem. The results of the second run are in the first
> attachment.
>
> To test the cached LV, I ran these commands 3 times in a row:
>
> md5sum virt.*
> echo 3 > /proc/sys/vm/drop_caches
>
> then I ran the fio test twice. The results of the second run are
> attached.
>
> This time the LVM cache test is about 10% slower than the HDD test.
> I'm not sure what to make of that at all.

It could be that the 32k cache blocksize increased the metadata
overhead enough to reduce the performance to that degree.

And even though you recreated the filesystem, it still could be the
case that the IO issued from ext4 is slightly misaligned. I'd welcome
you going back to a blocksize of 64K (you don't _need_ to go to 64K but
it seems you're giving up quite a bit of performance now). And then
collecting blktraces of the origin volume for the fio run -- to see if
64K * 2 IOs are being issued for each 64K fio IO. I would think it
would be fairly clear from the blktrace but maybe not. It could be that
a targeted debug line in dm-cache would serve as a better canary for
whether misalignment is a concern. I'll see if I can come up with a
patch that helps us assess misalignment.

Joe Thornber will be back from holiday on Monday, so we may get some
additional insight from him soon enough.

Sorry for your troubles, but this is good feedback.

^ permalink raw reply [flat|nested] 37+ messages in thread
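For anyone who wants to run the same check, a capture along these lines should work (the device name is taken from the disk stats above and the run length is only illustrative; adjust both to match the actual setup):

    # trace the origin array for the duration of the fio run
    blktrace -d /dev/md127 -w 300 -o origin-trace
    # afterwards, turn the per-CPU trace files into readable events
    blkparse -i origin-trace | less

In the blkparse output the request size appears in sectors after the '+', so a 64K IO shows up as '+ 128'; a pattern of extra reads, or of 128K-sized IO on the origin during the random-write phase, would point at the promotion/misalignment behaviour discussed above.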
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 18:16               ` Mike Snitzer
@ 2014-05-30 20:53                 ` Mike Snitzer
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 20:53 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, LVM general discussion and development, thornber, Zdenek Kabelac

On Fri, May 30 2014 at 2:16pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Fri, May 30 2014 at 11:28am -0400,
> Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > This time the LVM cache test is about 10% slower than the HDD test.
> > I'm not sure what to make of that at all.
>
> It could be that the 32k cache blocksize increased the metadata overhead
> enough to reduce the performance to that degree.
>
> And even though you recreated the filesystem, it still could be the case
> that the IO issued from ext4 is slightly misaligned. I'd welcome you
> going back to a blocksize of 64K (you don't _need_ to go to 64K but it
> seems you're giving up quite a bit of performance now). And then
> collecting blktraces of the origin volume for the fio run -- to see if
> 64K * 2 IOs are being issued for each 64K fio IO. I would think it
> would be fairly clear from the blktrace but maybe not.

Thinking about this a little more: if the IO that ext4 is issuing to
the cache is aligned on a blocksize boundary (e.g. 64K), we really
shouldn't see _any_ IO from the origin device when you are running fio.
The reason is we avoid promoting (aka copying) from the origin if an
entire cache block is being overwritten.

Looking at the fio output from the cache run you did using the 32K
blocksize, it is very clear that the MD array (on sda and sdb) is
involved quite a lot. And your even older fio run output using the
original 64K blocksize shows a bunch of IO to md127...

So it seems fairly clear that dm-cache isn't using the cache block
overwrite optimization it has to avoid promotions from the origin.
This would _seem_ to validate my concern about alignment... or
something else needs to explain why we're not able to avoid promotions.

If you have time to reconfigure with a 64K blocksize and rerun the fio
test, please look at the amount of write IO performed by md127 (and sda
and sdb), and also look at the number of promotions, via 'dmsetup
status' for the cache device, before and after the fio run.

We can try to reproduce using a pristine ext4 filesystem on top of MD
with the fio job you provided... and I'm now wondering if we're getting
bitten by DM stacked on MD (due to bvec merge being limited to 1 page,
see linux.git commit 8cbeb67a for some additional context). So it may
be worth trying _without_ MD raid1, just as a test. Use either sda or
sdb directly as the origin volume.

^ permalink raw reply [flat|nested] 37+ messages in thread
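For reference, the promotion counters Mike mentions can be read straight from the cache target's status line, e.g. (device name as used earlier in the thread):

    dmsetup status vg_guests-testorigin

The cache status line includes read hits/misses, write hits/misses, and the demotion and promotion counts; the exact field order is documented in Documentation/device-mapper/cache.txt in the kernel source. Taking a snapshot of that line before and after the fio run shows how many blocks were copied up from the origin during the test.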
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 13:46         ` Richard W.M. Jones
  2014-05-30 13:54           ` Heinz Mauelshagen
@ 2014-05-30 13:55           ` Mike Snitzer
  2014-05-30 14:29             ` Richard W.M. Jones
  1 sibling, 1 reply; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 13:55 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development

On Fri, May 30 2014 at 9:46am -0400,
Richard W.M. Jones <rjones@redhat.com> wrote:

> I have now set both read_promote_adjustment ==
> write_promote_adjustment == 0 and used drop_caches between runs.
>
> I also read Documentation/device-mapper/cache-policies.txt at Heinz's
> suggestion.
>
> I'm afraid the performance of the fio test is still not the same as
> the SSD (4.8 times slower than the SSD-only test now).

Obviously not what we want. But you're not doing any repeated IO to
those blocks... it is purely random, right? So really, the cache is
waiting for blocks to get promoted from the origin if the IOs from fio
don't completely cover the cache block size you've specified.

Can you go back over those settings? From your dmsetup table output you
shared earlier in the thread, you're using a blocksize of 128 sectors
(or 64K). And your fio random write workload is using 64K.

So unless you have misaligned IO, you _should_ be able to avoid reading
from the origin. But XFS is in play here... I'm wondering if it is
issuing IO differently than we'd otherwise see if you were testing
against the block devices directly...

> Would repeated runs of (md5sum virt.* ; echo 3 > /proc/sys/vm/drop_caches)
> not eventually cause the whole file to be placed on the SSD?
> It does seem very counter-intuitive if not.

If you set read_promote_adjustment to 0, it should pull the associated
blocks into the cache. What makes you think it isn't? How are you
judging the performance of the md5sum IO? Do you see IO being issued to
the origin via blktrace or something?

^ permalink raw reply [flat|nested] 37+ messages in thread
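The block size being referred to can be double-checked from the table line of the cache device, e.g.:

    dmsetup table vg_guests-testorigin

For a cache target the data block size is printed in 512-byte sectors, so a value of 128 corresponds to the 64K cache blocks discussed here; the full table format is described in Documentation/device-mapper/cache.txt.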
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 13:55           ` Mike Snitzer
@ 2014-05-30 14:29             ` Richard W.M. Jones
  2014-05-30 14:36               ` Mike Snitzer
  0 siblings, 1 reply; 37+ messages in thread
From: Richard W.M. Jones @ 2014-05-30 14:29 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Heinz Mauelshagen, Zdenek Kabelac, thornber, LVM general discussion and development

On Fri, May 30, 2014 at 09:55:29AM -0400, Mike Snitzer wrote:
> So unless you have misaligned IO, you _should_ be able to avoid reading
> from the origin. But XFS is in play here... I'm wondering if it is

The filesystem is ext4.

> If you set read_promote_adjustment to 0, it should pull the associated
> blocks into the cache. What makes you think it isn't?

The fio test is about twice as fast as when I ran it directly on the
hard disk array, but about 5 times slower than when I ran it directly
on the SSD. I'm not measuring the speed of the md5sum operation.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 14:29             ` Richard W.M. Jones
@ 2014-05-30 14:36               ` Mike Snitzer
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 14:36 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Heinz Mauelshagen, LVM general discussion and development, thornber, Zdenek Kabelac

On Fri, May 30 2014 at 10:29am -0400,
Richard W.M. Jones <rjones@redhat.com> wrote:

> On Fri, May 30, 2014 at 09:55:29AM -0400, Mike Snitzer wrote:
> > So unless you have misaligned IO, you _should_ be able to avoid reading
> > from the origin. But XFS is in play here... I'm wondering if it is
>
> The filesystem is ext4.

OK, so I have even more concern about misalignment then. At least XFS
goes to great lengths to build large IOs if Direct IO is used (via
bio_add_page, the optimal io size is used to build the IO up). I'm not
aware of ext4 taking similar steps, but it could be that it does now (I
vaguely remember ext4 borrowing heavily from XFS at one point; it
could've been for direct IO).

We need better tools for assessing whether the IO is misaligned, but
for now we'd have to start by looking at blktrace data for the
underlying origin device. If we keep seeing >64K sequential IOs to the
origin, that would suggest dm-cache is pulling in 2 64K blocks for each
64K IO.

^ permalink raw reply [flat|nested] 37+ messages in thread
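A rough way to see the distribution of request sizes reaching the origin is to post-process the blkparse output from a trace of the origin device (e.g. the origin-trace capture sketched earlier); field positions assume blkparse's default output format and may need adjusting:

    blkparse -i origin-trace | awk '$6 == "D" { sizes[$10]++ } END { for (s in sizes) print s, sizes[s] }' | sort -n

This counts dispatched requests ('D' events) by size in sectors; a noticeable number of entries above 128 sectors (64K) would support the theory that whole cache blocks are being pulled in from the origin.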
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-29 21:58     ` Mike Snitzer
  2014-05-30  9:04       ` Richard W.M. Jones
@ 2014-05-30 11:53       ` Mike Snitzer
  1 sibling, 0 replies; 37+ messages in thread
From: Mike Snitzer @ 2014-05-30 11:53 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Zdenek Kabelac, thornber, LVM general discussion and development

On Thu, May 29 2014 at 5:58pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> BTW, this is all with an eye toward realizing the optimization that
> dm-cache provides for origin blocks that were discarded (like I said
> before, dm-cache doesn't promote from the origin if the corresponding
> block was marked for discard). So you don't _need_ to do any of
> this... it's purely about trying to optimize a bit more.

And if you do make use of discards, you should have this stable fix
applied to your kernel:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-linus&id=f1daa838e861ae1a0fb7cd9721a21258430fcc8c

^ permalink raw reply [flat|nested] 37+ messages in thread
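To actually benefit from that discard optimisation with an ext4 filesystem on the cached LV, the discards have to reach the device; for example (the mount point is only illustrative):

    # one-off trim of the free space on the mounted filesystem
    fstrim -v /mnt/vmimages

    # or mount with online discard, at some runtime cost
    mount -o discard /dev/vg_guests/testorigin /mnt/vmimages

The discards only affect blocks the filesystem no longer uses; they simply let dm-cache skip promoting those regions from the origin.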
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-29 20:47   ` Richard W.M. Jones
  2014-05-29 21:06     ` Mike Snitzer
@ 2014-05-30 11:38     ` Alasdair G Kergon
  2014-05-30 11:45       ` Alasdair G Kergon
  1 sibling, 1 reply; 37+ messages in thread
From: Alasdair G Kergon @ 2014-05-30 11:38 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: LVM general discussion and development

On Thu, May 29, 2014 at 09:47:20PM +0100, Richard W.M. Jones wrote:
> Is there a reason why fast and slow devices need to be in the same VG?
> I've talked to two other people who found this very confusing. No one
> knew that you could manually place LVs into different PVs, and it's
> something of a pain to have to remember to place LVs every time you
> create or resize one. It seems it would be a lot simpler if you could
> have the slow PVs in one VG and the fast PVs in another VG.

We recommend you use tags: they are a much more flexible and dynamic
solution than forcing the use of separate VGs.

  pvchange --addtag ssd
  pvs -o+tags
  lvcreate ... $vg @ssd

to restrict the allocation the command performs to the PVs with the
'ssd' tag.

Alasdair

^ permalink raw reply [flat|nested] 37+ messages in thread
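Applied to the setup being tested in this thread, the tagging might look something like this (the PV names are illustrative, based on the devices mentioned earlier):

    pvchange --addtag slow /dev/md127
    pvchange --addtag ssd /dev/sdc
    pvs -o+tags
    lvcreate -L 800G -n testorigin vg_guests @slow
    lvcreate -L 200G -n lv_cache vg_guests @ssd

Each lvcreate is then restricted to PVs carrying the named tag, while both sets of PVs stay in the single VG that the cache pairing requires.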
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 11:38     ` Alasdair G Kergon
@ 2014-05-30 11:45       ` Alasdair G Kergon
  2014-05-30 12:45         ` Werner Gold
  0 siblings, 1 reply; 37+ messages in thread
From: Alasdair G Kergon @ 2014-05-30 11:45 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: LVM general discussion and development

And for lvextend, you should add any tags you are using in this way to
lvm.conf:

# When searching for free space to extend an LV, the "cling"
# allocation policy will choose space on the same PVs as the last
# segment of the existing LV. If there is insufficient space and a
# list of tags is defined here, it will check whether any of them are
# attached to the PVs concerned and then seek to match those PV tags
# between existing extents and new extents.
# Use the special tag "@*" as a wildcard to match any PV tag.
# Example: LVs are mirrored between two sites within a single VG.
# PVs are tagged with either @site1 or @site2 to indicate where
# they are situated.
# cling_tag_list = [ "@site1", "@site2" ]
# cling_tag_list = [ "@*" ]

(The "cling" allocation policy is enabled by default.)

Alasdair

^ permalink raw reply [flat|nested] 37+ messages in thread
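As a concrete sketch for this setup (tag names follow the @slow/@ssd convention used earlier in the thread; cling_tag_list lives in the allocation section of lvm.conf):

    allocation {
        cling_tag_list = [ "@slow", "@ssd" ]
    }

With that in place, a plain lvextend on an uncached LV, for example (the LV name is a placeholder)

    lvextend -L +100G vg_guests/somelv

should keep the new extents on PVs tagged the same way as the LV's existing extents.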
* Re: [linux-lvm] Testing the new LVM cache feature
  2014-05-30 11:45       ` Alasdair G Kergon
@ 2014-05-30 12:45         ` Werner Gold
  0 siblings, 0 replies; 37+ messages in thread
From: Werner Gold @ 2014-05-30 12:45 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

Many thanks to Alasdair and Heinz for the hint about the tagging
feature. It is more convenient than dealing with UUIDs.

I also stumbled across the "same VG" issue when I tried to set up the
test environment. Thanks to Richard for that hint. :-)

I ran bonnie++ on my X230 (RHEL7) here, using an external USB3 SSD.
The results are attached. With the cache, there is a significant
difference in random create, which is what I would expect from an SSD
cache.

Werner

--
Werner Gold                               wgold@redhat.com
Partner Enablement / EMEA                 phone: 49.9331.803 855
Steinbachweg 23                           fax: +49.9331.4407
97252 Frickenhausen/Main, Germany         cell: +49.172.764 4633
Key fingerprint = FF91B07C 6F3D340E A71791AC 5E3A6CB4 D44CBC37

Reg. Adresse: Red Hat GmbH, Werner-von-Siemens-Ring 14, D-85630 Grasbrunn
Handelsregister: Amtsgericht Muenchen HRB 153243
Geschaeftsfuehrer: Mark Hegarty, Charlie Peters, Michael Cunningham,
Charles Cachera

[-- Attachment #2: bonnie-dm-cache.html --]
[-- Type: text/html, Size: 12622 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2014-05-30 20:53 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-05-22 10:18 [linux-lvm] Testing the new LVM cache feature Richard W.M. Jones 2014-05-22 14:43 ` Zdenek Kabelac 2014-05-22 15:22 ` Richard W.M. Jones 2014-05-22 15:49 ` Richard W.M. Jones 2014-05-22 18:04 ` Mike Snitzer 2014-05-22 18:13 ` Richard W.M. Jones 2014-05-29 13:52 ` Richard W.M. Jones 2014-05-29 20:34 ` Mike Snitzer 2014-05-29 20:47 ` Richard W.M. Jones 2014-05-29 21:06 ` Mike Snitzer 2014-05-29 21:19 ` Richard W.M. Jones 2014-05-29 21:58 ` Mike Snitzer 2014-05-30 9:04 ` Richard W.M. Jones 2014-05-30 10:30 ` Richard W.M. Jones 2014-05-30 13:38 ` Mike Snitzer 2014-05-30 13:40 ` Richard W.M. Jones 2014-05-30 13:42 ` Heinz Mauelshagen 2014-05-30 13:54 ` Richard W.M. Jones 2014-05-30 13:58 ` Zdenek Kabelac 2014-05-30 13:46 ` Richard W.M. Jones 2014-05-30 13:54 ` Heinz Mauelshagen 2014-05-30 14:26 ` Richard W.M. Jones 2014-05-30 14:29 ` Mike Snitzer 2014-05-30 14:36 ` Richard W.M. Jones 2014-05-30 14:44 ` Mike Snitzer 2014-05-30 14:51 ` Richard W.M. Jones 2014-05-30 14:58 ` Mike Snitzer 2014-05-30 15:28 ` Richard W.M. Jones 2014-05-30 18:16 ` Mike Snitzer 2014-05-30 20:53 ` Mike Snitzer 2014-05-30 13:55 ` Mike Snitzer 2014-05-30 14:29 ` Richard W.M. Jones 2014-05-30 14:36 ` Mike Snitzer 2014-05-30 11:53 ` Mike Snitzer 2014-05-30 11:38 ` Alasdair G Kergon 2014-05-30 11:45 ` Alasdair G Kergon 2014-05-30 12:45 ` Werner Gold