From: Josh Durgin <josh.durgin@inktank.com>
To: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Subject: Re: rbd caching
Date: Mon, 07 May 2012 00:09:30 -0700
Message-ID: <4FA7752A.9060502@inktank.com>
In-Reply-To: <Pine.LNX.4.64.1205051640540.28748@cobra.newdream.net>

On 05/05/2012 04:51 PM, Sage Weil wrote:
> The second set of patches restructure the way the cache itself is managed.
> One goal is to be able to control cache behavior on a per-image basis
> (this one write-thru, that one write-back, etc.). Another goal is to
> share a single pool of memory for several images. The librbd.h calls to
> do this currently look something like this:
>
> int rbd_cache_create(rados_t cluster, rbd_cache_t *cache, uint64_t max_size,
> uint64_t max_dirty, uint64_t target_dirty);
> int rbd_cache_destroy(rbd_cache_t cache);
> int rbd_open_cached(rados_ioctx_t io, const char *name, rbd_image_t image,
> const char *snap_name, rbd_cache_t cache);
>
> Setting the cache tunables should probably be broken out into several
> different calls, so that it is possible to add new ones in the future.
> Beyond that, though, the limitation here is that you can set the
> target_dirty or max_dirty for a _cache_, and then have multiple images
> share that cache, but you can't then set a max_dirty limit for an
> individual image.

I'm not sure that these should be separate API calls. We can
already control per-image caches via different rados_conf
settings when the image is opened. We're already opening
a new rados_t cluster handle (which can have its own settings)
for each image in qemu.

> Does it matter? Ideally, I suppose, you could set:
>
> - per-cache size
> - per-cache max_dirty
> - per-cache target_dirty
> - per-image max_dirty (0 for write-thru)
> - per-image target_dirty
>
> and then share a single cache for many images, and the flushing logic
> could observe both sets of dirty limits. That just means calls to set
> max_dirty and target_dirty for individual images, too.

I don't think all this flexibility is necessary. If we did want
to add it, it could be done with configuration settings instead
of pushing the complexity to the librbd caller. For example, there
could be an 'rbd_cache_name' option, and images using the same
cache name would share the same underlying cache. Alternatively,
there could be an option to make all rbd images use the same cache,
each with its own limits.
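
To make the 'rbd_cache_name' idea concrete, here is a rough sketch of
how a name-keyed shared cache with per-image dirty limits might behave.
This is only a model under my own assumptions, not librbd code: the
shared_cache and image structs and the cache_get() and image_write()
helpers are hypothetical names, and the limits are plain byte counters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not librbd code: every image that asks for the
 * same cache name shares one cache, and each image also carries its
 * own dirty limit. */
#define MAX_CACHES 8

struct shared_cache {
    char name[32];
    uint64_t max_dirty;   /* per-cache dirty limit, in bytes */
    uint64_t dirty;       /* dirty bytes across all images */
    int refs;             /* images currently using this cache */
};

struct image {
    struct shared_cache *cache;
    uint64_t max_dirty;   /* per-image limit; 0 means write-through */
    uint64_t dirty;       /* dirty bytes owned by this image */
};

static struct shared_cache caches[MAX_CACHES];
static int ncaches;

/* Look up a cache by name, creating it on first use. The max_dirty
 * argument only takes effect when the cache is created. */
static struct shared_cache *cache_get(const char *name, uint64_t max_dirty)
{
    for (int i = 0; i < ncaches; i++) {
        if (strcmp(caches[i].name, name) == 0) {
            caches[i].refs++;
            return &caches[i];
        }
    }
    if (ncaches == MAX_CACHES)
        return NULL;
    struct shared_cache *c = &caches[ncaches++];
    snprintf(c->name, sizeof(c->name), "%s", name);
    c->max_dirty = max_dirty;
    c->dirty = 0;
    c->refs = 1;
    return c;
}

/* Returns 1 if the write may stay dirty in the cache, or 0 if either
 * the per-image or the per-cache limit would be exceeded, in which
 * case the caller has to write through or flush first. */
static int image_write(struct image *img, uint64_t len)
{
    assert(img->cache != NULL);
    if (img->dirty + len > img->max_dirty)
        return 0;
    if (img->cache->dirty + len > img->cache->max_dirty)
        return 0;
    img->dirty += len;
    img->cache->dirty += len;
    return 1;
}
```

In this model a per-image max_dirty of 0 rejects every non-empty write,
which gives write-through behavior for that image while the other
images sharing the cache can still buffer writes.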

What use cases do you see for single-VM cache sharing? I can't think
of any common ones off the top of my head. It seems like KSM
will provide much more benefit (especially with layering).

> Is it worth the complexity? In the end, this will be wired up to the qemu
> writeback options, so the range of actual usage will fall within
> whatever is doable with those options and generic 'rbd cache size = ..'
> tunables, most likely...

The qemu writeback options have no notion of shared caches or cache
size, since they're designed around using the host page cache. I think
leaving any extra cache configuration in rbd-specific options makes
sense for now.

Josh