From: Kaveh Razavi <kaveh@cs.vu.nl>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] Introduce cache images for the QCOW2 format
Date: Thu, 15 Aug 2013 14:25:08 +0200
Message-ID: <520CC8A4.4090405@cs.vu.nl>
In-Reply-To: <20130815083230.GE22521@stefanha-thinkpad.redhat.com>

On 08/15/2013 10:32 AM, Stefan Hajnoczi wrote:
> I don't buy the argument about the page cache being evicted at any time:
>
> At the scale where caching is important, provisioning a measly 100 MB
> of RAM per guest should not be a challenge.
>
> cgroups can be used to isolate page cache between VMs if you want to
> guarantee caches.
>
> But it could be more interesting not to isolate so that the page cache
> acts host-wide to reduce the overall I/O instead of narrowly focussing
> on caching 100 MB for a specific image even if it is rarely accessed.
>
> The real downside I see is that the page cache is volatile, so you could
> see heavy I/O if multiple hosts reboot at the same time.
>

On the VM hosts, memory is mostly allocated to the VMs themselves.
Without persisted caches, starting another VM from any of the possible
backing images may or may not result in network traffic, depending on
what is in the page cache. Cache images persisted on the hosts' disks
eliminate this traffic regardless of the page cache, at least for VM
boot.

At the storage site, however, I think it makes sense to dedicate memory
to popular backing images (via tmpfs rather than the page cache). The
data blocks of the popular images that are used for booting will be
accessed by all VMs starting from these "template" images.

> Streaming offers a rate limiting parameter so you can tune it to the
> network conditions.
>
> Copying the full image doesn't just reduce load on the NFS server, it
> also means guests can continue to run if the NFS server becomes
> unreachable.  That's an important property for reliability.

I am not sure that copying the entire image reduces the load on the NFS
server, especially at scale. If copying the entire image at scale is
desired or necessary, peer-to-peer approaches are documented to perform
better, although they are mostly implemented at the host file-system
layer (search for e.g. VMTorrent). I agree with the reliability
consideration if you are dealing with an unreliable (remote) file
system.

> 1)
> It is persistent.  The backing file chain looks like this:
>
>    /nfs/template.qcow2 <- /local/cache.qcow2 <- /local/vm001.qcow2
>
> The cache is a regular qcow2 image file that is persistent.  The discard
> command is used to evict data from the file.  Copy-on-read accesses are
> used to populate the cache when the guest submits a read request.
>
> 2)
> You can set cache size or other parameters as a qemu-nbd option (this
> doesn't exist but could be implemented):
>
>    $ qemu-img create -f qcow2 -o backing_file=/nfs/template.qcow2 cache.qcow2
>    $ qemu-nbd --options cache-size=100MB,evict=lru cache.qcow2
>
> So it's the qemu-nbd process that performs the cache housekeeping work.
> The cache.qcow2 file itself just persists data and isn't aware of cache
> settings.

OK, this is better, since the user can also define a policy _and_ the
cache can be shared by different VMs at creation time without races.
With an eviction policy of 'none' combined with cache-size, only the
first accessed data blocks get cached, which essentially provides the
same functionality as this patch.
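
To make the difference between the two policies concrete, here is a
rough sketch of a size-capped block cache. It is only an illustration:
BlockCache, capacity_blocks and the dict standing in for the backing
image are hypothetical names, not anything in QEMU or qemu-nbd, and the
'none' behaviour shown (stop admitting blocks once the cache is full) is
my reading of the option discussed above.

```python
from collections import OrderedDict


class BlockCache:
    """Toy size-capped block cache (hypothetical, not QEMU code)."""

    def __init__(self, capacity_blocks, evict="none"):
        if evict not in ("none", "lru"):
            raise ValueError("unknown eviction policy: %r" % (evict,))
        self.capacity = capacity_blocks
        self.evict = evict
        self.blocks = OrderedDict()  # block number -> data, oldest first
        self.backing_reads = 0       # reads served by the backing image

    def read(self, block, backing):
        # Cache hit: serve locally; LRU also refreshes the block's age.
        if block in self.blocks:
            if self.evict == "lru":
                self.blocks.move_to_end(block)
            return self.blocks[block]

        # Cache miss: fetch from the (remote) backing image.
        data = backing[block]
        self.backing_reads += 1

        if self.capacity <= 0:
            return data  # caching disabled
        if len(self.blocks) >= self.capacity:
            if self.evict == "none":
                # Full and frozen: only the first accessed blocks stay cached.
                return data
            # LRU: drop the least recently used block to make room.
            self.blocks.popitem(last=False)
        self.blocks[block] = data
        return data
```

With a capacity of three blocks and the access sequence 0, 1, 2, 3, 4,
0, 1, the 'none' policy keeps blocks 0, 1 and 2 and serves the final two
reads locally, whereas 'lru' keeps evicting and goes back to the backing
image for every read in that sequence.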

Kaveh

