From: "Denis V. Lunev" <den@openvz.org>
To: Pavel Butsykin <pbutsykin@virtuozzo.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, mreitz@redhat.com, eblake@redhat.com,
armbru@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2 00/18] I/O prefetch cache
Date: Wed, 25 Jan 2017 19:50:45 +0300 [thread overview]
Message-ID: <64f835ae-0bad-ab2a-f5b4-74ece4499b23@openvz.org> (raw)
In-Reply-To: <20161230143142.18214-1-pbutsykin@virtuozzo.com>
On 12/30/2016 05:31 PM, Pavel Butsykin wrote:
> The prefetch cache aims to improve the performance of sequential reads.
> Of most interest here are small sequential read requests: such requests
> can be optimized by extending them and moving the extra data into the
> prefetch cache. However, there are two problems:
> - Only a small share of all requests is sequential, so the delays caused
>   by reading larger volumes of data would decrease overall performance.
> - With a large number of random requests, the cache would fill up with
>   redundant data.
> This pcache implementation solves these and other problems of data
> prefetching. The pcache algorithm can be summarised by the following
> main steps.
> The pcache algorithm can be summarised by the following main steps.
>
> 1. Monitor I/O requests to identify typical sequences.
> This implementation of the prefetch cache works at the storage-system
> level and has information only about the physical block addresses of
> I/O requests. Statistics are collected only from read requests up to a
> maximum size of 64KB (by default); each request that matches the
> criteria is added to a pool of requests. The request statistics are
> stored in an rb-tree: a simple but, for this task, quite efficient
> data structure.
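> To illustrate the idea, here is a minimal sketch (not the actual
> util/rbcache.c code; a plain unbalanced binary search tree stands in
> for the kernel-style rb-tree, and overlap handling is omitted):
>
>     #include <stdint.h>
>     #include <stdlib.h>
>
>     /* One recorded read request, keyed by its byte range. */
>     typedef struct ReqNode {
>         uint64_t offset;                /* start of the request, bytes */
>         uint64_t bytes;                 /* length of the request */
>         struct ReqNode *left, *right;
>     } ReqNode;
>
>     /* Insert a request into the statistics tree, ordered by offset. */
>     static ReqNode *req_insert(ReqNode *root, uint64_t offset,
>                                uint64_t bytes)
>     {
>         if (!root) {
>             ReqNode *n = calloc(1, sizeof(*n));
>             n->offset = offset;
>             n->bytes = bytes;
>             return n;
>         }
>         if (offset < root->offset) {
>             root->left = req_insert(root->left, offset, bytes);
>         } else {
>             root->right = req_insert(root->right, offset, bytes);
>         }
>         return root;
>     }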
>
> 2. Identify sequential I/O streams.
> For each read request being processed, we try to lift from the
> statistics tree a chain of requests of which the current request would
> be the next element. The key used to search for consecutive requests is
> the area of bytes immediately preceding the current request; this area
> must not be too small, otherwise we would get false readahead. A
> sequential stream of requests can be identified even among a large
> number of random requests. For example, given accesses to blocks
> 100, 1157, 27520, 4, 101, 312, 1337, 102, while processing request 102
> the chain of sequential requests 100, 101, 102 will be identified, and
> a decision can then be made to do readahead. A similar situation arises
> when several applications A, B, C simultaneously perform sequential
> reads. Each application by itself reads sequentially, A(100, 101, 102),
> B(300, 301, 302), C(700, 701, 702), but at the block-device level this
> may look like random reading: 100, 300, 700, 101, 301, 701, 102, 302,
> 702. Such streams are also recognised, because the placement of the
> requests in the rb-tree separates the sequential I/O streams from one
> another.
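> A sketch of the lookup, reusing the ReqNode tree from the sketch in
> step 1 (the LOOKBACK value is illustrative, not the real tunable):
>
>     /* Return a node whose byte range intersects [start, end).  A hit
>      * means the current request continues an earlier one, so a chain
>      * can be lifted from the tree. */
>     static ReqNode *req_find_overlap(ReqNode *root, uint64_t start,
>                                      uint64_t end)
>     {
>         while (root) {
>             if (end <= root->offset) {
>                 root = root->left;
>             } else if (start >= root->offset + root->bytes) {
>                 root = root->right;
>             } else {
>                 return root;        /* ranges intersect */
>             }
>         }
>         return NULL;
>     }
>
>     /* Probe the area preceding the current request; too small a
>      * lookback would mistake random neighbours for a stream. */
>     enum { LOOKBACK = 8192 };
>
>     static int is_sequential(ReqNode *stats, uint64_t offset)
>     {
>         uint64_t start = offset > LOOKBACK ? offset - LOOKBACK : 0;
>         return req_find_overlap(stats, start, offset) != NULL;
>     }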
>
> 3. Do readahead into the cache for recognised sequential data streams.
> Once a sequential stream has been detected, larger requests are needed
> to bring data into the cache. In this implementation pcache uses
> readahead instead of extending the original request, so the request is
> passed through as is. There is no reason to put data into the cache
> that will never be picked up, yet that would always happen with
> extended requests. The areas of cached blocks are likewise stored in an
> rb-tree: again a simple but, for this task, quite efficient data
> structure.
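> A simplified sketch of the readahead path (pread() stands in for the
> asynchronous block-layer read, and the cache tree mirrors the
> statistics tree above; none of this is the actual pcache code):
>
>     #include <stdint.h>
>     #include <stdlib.h>
>     #include <unistd.h>
>
>     /* One cached block, keyed by its byte range on the device. */
>     typedef struct CacheNode {
>         uint64_t offset, bytes;
>         uint8_t *data;
>         struct CacheNode *left, *right;
>     } CacheNode;
>
>     static CacheNode *cache_insert(CacheNode *root, uint64_t offset,
>                                    uint64_t bytes, uint8_t *data)
>     {
>         if (!root) {
>             CacheNode *n = calloc(1, sizeof(*n));
>             n->offset = offset;
>             n->bytes = bytes;
>             n->data = data;
>             return n;
>         }
>         if (offset < root->offset) {
>             root->left = cache_insert(root->left, offset, bytes, data);
>         } else {
>             root->right = cache_insert(root->right, offset, bytes, data);
>         }
>         return root;
>     }
>
>     /* The guest request goes to the backend unchanged; the readahead
>      * is a separate, larger read issued behind it. */
>     static CacheNode *do_readahead(int fd, CacheNode *cache,
>                                    uint64_t req_end, uint64_t ra_bytes)
>     {
>         uint8_t *buf = malloc(ra_bytes);
>         ssize_t n = pread(fd, buf, ra_bytes, (off_t)req_end);
>         if (n > 0) {
>             return cache_insert(cache, req_end, (uint64_t)n, buf);
>         }
>         free(buf);
>         return cache;
>     }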
>
> 4. Control the size of the prefetch cache pool and the request
> statistics pool.
> To bound the request statistics pool, request data is added and
> replaced according to the FIFO principle; nothing complicated. To bound
> the memory cache, an LRU list is used, which limits the maximum amount
> of memory that pcache may allocate. The LRU is there mainly to prevent
> eviction of cache blocks that have only been read partially. On the
> main path, memory is reclaimed immediately after use: as soon as a
> chunk of cache memory has been completely read it is dropped, since the
> probability of the same request repeating is very low. Cases where one
> and the same portion of cache memory is read several times are not
> optimized; they do not belong to the cases pcache can optimize. Thus,
> with a small amount of cache memory, by optimizing readahead and memory
> reclaim we can read entire volumes of data with a 100% cache hit rate,
> while the performance of random read requests does not decrease.
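> A simplified sketch of the eviction side (illustrative only; the real
> code also frees a block immediately once it has been completely read):
>
>     #include <stdint.h>
>     #include <stdlib.h>
>
>     typedef struct LruNode {
>         struct LruNode *prev, *next;
>         uint64_t bytes;                 /* size of the cached block */
>     } LruNode;
>
>     typedef struct Lru {
>         LruNode *head, *tail;           /* head = most recently used */
>         uint64_t used, limit;           /* bytes in use / max bytes */
>     } Lru;
>
>     /* Evict least-recently-used blocks until the pool fits the limit;
>      * partially read blocks survive as long as they stay off the tail. */
>     static void lru_shrink(Lru *lru)
>     {
>         while (lru->used > lru->limit && lru->tail) {
>             LruNode *victim = lru->tail;
>             lru->tail = victim->prev;
>             if (lru->tail) {
>                 lru->tail->next = NULL;
>             } else {
>                 lru->head = NULL;
>             }
>             lru->used -= victim->bytes;
>             free(victim);               /* the data would be freed too */
>         }
>     }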
>
> PCache is implemented as a QEMU block filter driver and has several
> configurable parameters: total cache size, statistics size, readahead
> size, and the maximum size of a block that can be processed.
>
> For performance evaluation, several test cases with sequential and
> random reads of data on an SSD disk were used. Here are the test
> results and the qemu parameters:
>
> qemu parameters:
> -machine pc,accel=kvm,usb=off,vmport=off -m 1024 -smp 8
> -drive file=/img/harddisk.hdd,if=none,cache=none,id=drive-scsi0-0-0-0,aio=native
> -drive driver=pcache,image=drive-scsi0-0-0-0,if=virtio
>
> *****************************************************************
> * Testcase * Results in iops *
> * *******************************
> * * clean qemu * pcache *
> *****************************************************************
> * Create/open 16 file(s) of total * 21645 req/s * 74793 req/s *
> * size 2048.00 MB named * 20385 req/s * 66481 req/s *
> * /tmp/tmp.tmp, start 4 thread(s) * 20616 req/s * 69007 req/s *
> * and do uncached sequential read * * *
> * by 4KB blocks * * *
> *****************************************************************
> * Create/open 16 file(s) of total * 84033 req/s * 87828 req/s *
> * size 2048.00 MB named * 84602 req/s * 89678 req/s *
> * /tmp/tmp.tmp, start 4 thread(s) * 83163 req/s * 96202 req/s *
> * and do uncached sequential read * * *
> * by 4KB blocks with constant * * *
> * queue len 32 * * *
> *****************************************************************
> * Create/open 16 file(s) of total * 14104 req/s * 14164 req/s *
> * size 2048.00 MB named * 14130 req/s * 14232 req/s *
> * /tmp/tmp.tmp, start 4 thread(s) * 14183 req/s * 14080 req/s *
> * and do uncached random read by * * *
> * 4KB blocks * * *
> *****************************************************************
> * Create/open 16 file(s) of total * 23480 req/s * 23483 req/s *
> * size 2048.00 MB named * 23070 req/s * 22432 req/s *
> * /tmp/tmp.tmp, start 4 thread(s) * 24090 req/s * 23499 req/s *
> * and do uncached random read by * * *
> * 4KB blocks with constant queue * * *
> * len 32 * * *
> *****************************************************************
>
> Changes from v1:
> - avoid bdrv_aio_*() interfaces
> - add pcache to the QAPI schema
> - address review remarks and add more comments for rbcache
> - add more scenarios for "/rbcache/insert" test
> - fix rbcache/shrink/* tests
> - pcache: up-to-date cache for removed nodes
> - rewrite "block/pcache: pick up parts of the cache" patch
> - changed the statuses of nodes for a more flexible determination of
> the node state
>
> Pavel Butsykin (18):
> block/pcache: empty pcache driver filter
> util/rbtree: add rbtree from linux kernel
> util/rbcache: range-based cache core
> tests/test-rbcache: add test cases
> block/pcache: statistics collection read requests
> block/pcache: skip large aio read
> block/pcache: updating statistics for overlapping requests
> block/pcache: add AIO readahead
> block/pcache: skip readahead for unallocated clusters
> block/pcache: cache invalidation on write requests
> block/pcache: add reading data from the cache
> block/pcache: inflight readahead request waiting for read
> block/pcache: write through
> block/pcache: up-to-date cache for removed nodes
> block/pcache: pick up parts of the cache
> block/pcache: drop used pcache nodes
> qapi: allow blockdev-add for pcache
> block/pcache: add tracepoints
>
> MAINTAINERS | 13 +
> block/Makefile.objs | 1 +
> block/pcache.c | 764 ++++++++++++++++++++++++++++++++++++++++
> block/trace-events | 10 +
> include/qemu/rbcache.h | 128 +++++++
> include/qemu/rbtree.h | 107 ++++++
> include/qemu/rbtree_augmented.h | 235 ++++++++++++
> qapi/block-core.json | 30 +-
> tests/Makefile.include | 3 +
> tests/test-rbcache.c | 431 +++++++++++++++++++++++
> util/Makefile.objs | 2 +
> util/rbcache.c | 253 +++++++++++++
> util/rbtree.c | 570 ++++++++++++++++++++++++++++++
> 13 files changed, 2545 insertions(+), 2 deletions(-)
> create mode 100644 block/pcache.c
> create mode 100644 include/qemu/rbcache.h
> create mode 100644 include/qemu/rbtree.h
> create mode 100644 include/qemu/rbtree_augmented.h
> create mode 100644 tests/test-rbcache.c
> create mode 100644 util/rbcache.c
> create mode 100644 util/rbtree.c
>
ping?