From: Alex Elsayed <eternaleye@gmail.com>
To: ceph-devel@vger.kernel.org
Subject: RE: Cache tiering read-proxy mode
Date: Tue, 22 Jul 2014 15:50:33 -0700
Message-ID: <lqmprr$glk$1@ger.gmane.org>
In-Reply-To: alpine.DEB.2.00.1407201831340.28285@cobra.newdream.net
Sage Weil wrote:
> [Adding ceph-devel]
>
> On Mon, 21 Jul 2014, Wang, Zhiqiang wrote:
>> Sage,
>>
>> I agree with you that promotion on the 2nd read could improve cache
>> tiering's performance for some kinds of workloads. The general idea here
>> is to implement some kinds of policies in the cache tier to measure the
>> warmness of the data. If the cache tier is aware of the data warmness,
>> it could even initiate data movement between the cache tier and the base
>> tier. This means data could be prefetched into the cache tier before
>> reading or writing. But I think this is something we could do in the
>> future.
>
> Yeah. I suspect it will be challenging to put this sort of prefetching
> intelligence directly into the OSDs, though. It could possibly be done by
> an external agent, maybe, or could be driven by explicit hints from
> clients ("I will probably access this data soon").
>
>> The 'promotion on 2nd read' policy is straightforward. Sure, it will
>> benefit some kinds of workloads, but not all. If it is implemented as a
>> cache tier option, the user needs to decide whether to turn it on. But
>> I'm afraid most users won't know how to make that choice, which makes
>> cache tiering harder to use.
>
> I suspect the 2nd read behavior will be something we'll want to do by
> default... but yeah, there will be a new pool option (or options) that
> controls the behavior.
>
>> One question for the implementation of 'promotion on 2nd read': what do
>> we do for the 1st read? Does the cache tier read the object from the
>> base tier without replicating it, or does it just redirect the client?
>
> For the first read, we just redirect the client. Then on the second read,
> we call promote_object(). See maybe_handle_cache() in ReplicatedPG.cc.
> We can pretty easily tell the difference by checking the in-memory HitSet
> for a match.
>
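
As a rough illustration of the redirect-then-promote flow described above, here is a minimal Python sketch. It is a toy model, not Ceph code: the class and method names are hypothetical, a plain set stands in for the in-memory HitSet, and nothing here reflects what maybe_handle_cache() or promote_object() actually do internally.

```python
class CacheTier:
    """Toy model of 'promote on second read'. All names are illustrative."""

    def __init__(self):
        self.hit_set = set()  # stands in for the current in-memory HitSet
        self.cached = set()   # objects already promoted into the cache tier

    def handle_read(self, oid):
        """Return what the cache tier would do for a read of object `oid`."""
        if oid in self.cached:
            return "hit"            # already in the cache tier
        if oid in self.hit_set:
            self.cached.add(oid)    # second read: promote the object
            return "promote"
        self.hit_set.add(oid)       # first read: remember it, redirect client
        return "redirect"
```

Reading the same object three times in this model yields "redirect", then "promote", then "hit".
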
> Perhaps the option in the pool would be something like
> min_read_recency_for_promote? If we measure "recency" as "(avg) seconds
> since last access" (loosely), 0 would mean it would promote on first read,
> and anything <= the HitSet interval would mean promote if the object is in
> the current HitSet. Anything greater than that would mean we'd need to
> keep additional previous HitSets in RAM.
>
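
One possible reading of the proposed option, sketched in Python. The function name, its parameters, and the rule "promote if the object appears in any HitSet inside the window" are assumptions made for illustration; the thread only proposes the option name min_read_recency_for_promote and the three cases (0, <= interval, > interval).

```python
def should_promote(oid, hit_sets, min_read_recency, interval):
    """Decide whether a read of `oid` should trigger promotion.

    hit_sets[0] is the current HitSet, hit_sets[1] the previous one, etc.
    min_read_recency and interval are whole seconds (assumed integers).
    """
    if min_read_recency == 0:
        return True  # promote on first read
    # How many HitSets (newest first) a window of min_read_recency
    # seconds spans: integer ceiling of min_read_recency / interval.
    needed = (min_read_recency + interval - 1) // interval
    return any(oid in hs for hs in hit_sets[:needed])
```

With a value no larger than the interval, only the current HitSet is consulted; larger values reach into older HitSets, which is why those would have to stay in RAM.
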
> ...which leads us to a separate question of how to describe access
> frequency vs recency. We keep N HitSets, each covering a time period of T
> seconds. Normally we only keep the most recent HitSet in memory, unless
> the agent is active (flushing data). So what I described above is
> checking how recently the last access was (within how many multiples of T
> seconds). Additionally, though, we could describe the frequency of
> access: was the object accessed at least once in each of the N intervals
> of T seconds? Or some fraction of them? That is probably best described
> as "temperature"? I'm not too fond of the term "recency," tho I can't
> think of anything better right now.
>
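
To make the recency/temperature distinction concrete, here is a small Python sketch over a list of N HitSets. The function names and the choice to express temperature as a fraction between 0 and 1 are assumptions for illustration, not definitions from the thread.

```python
def recency(oid, hit_sets):
    """Index of the newest HitSet containing `oid` (0 = current interval).

    hit_sets is ordered newest first. Returns None if `oid` was not
    seen in any retained interval.
    """
    for age, hs in enumerate(hit_sets):
        if oid in hs:
            return age
    return None


def temperature(oid, hit_sets):
    """Fraction of the retained intervals in which `oid` was accessed."""
    if not hit_sets:
        return 0.0
    return sum(oid in hs for hs in hit_sets) / len(hit_sets)
```

An object touched once in the current interval has the best possible recency but a low temperature; an object touched in every interval has temperature 1.0.
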
> Anyway, for the read promote behavior, recency is probably sufficient, but
> for the tiering agent flush/evict behavior temperature might be a good
> thing to consider...
>
> sage
It might be worth looking at the MQ (Multi-Queue) caching policy[1], which
was explicitly designed for second-level caches (which applies here) - the
client is very likely to be doing caching, whether they use CephFS
(FSCache), RBD (client caching), or RADOS (application-level); that causes
some interesting changes in terms of the statistical behavior of the
second-level cache.
[1] https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou_html/node9.html
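
For readers unfamiliar with MQ, here is a simplified Python sketch based on my reading of the paper: several LRU queues indexed by roughly log2 of the reference count, a per-entry expiry that demotes idle entries, and a history buffer that remembers the counts of evicted entries. The class name, parameter names, and default values are illustrative assumptions, and the sketch is untuned and unrelated to any Ceph code.

```python
from collections import OrderedDict


class MQCache:
    """Simplified Multi-Queue cache sketch (illustrative, not tuned)."""

    def __init__(self, capacity, num_queues=4, life_time=8, history=16):
        self.capacity = capacity
        self.life_time = life_time
        self.history = history
        # Each queue maps oid -> None; least recently used entry first.
        self.queues = [OrderedDict() for _ in range(num_queues)]
        self.meta = {}               # oid -> [refs, expire_time, queue_idx]
        self.q_out = OrderedDict()   # evicted oid -> remembered refs
        self.now = 0                 # logical clock, one tick per access

    def _queue_for(self, refs):
        # floor(log2(refs)), capped at the highest queue index.
        return min(refs.bit_length() - 1, len(self.queues) - 1)

    def access(self, oid):
        """Record an access; return True on a cache hit."""
        hit = oid in self.meta
        if hit:
            refs, _, k = self.meta[oid]
            del self.queues[k][oid]
        else:
            if len(self.meta) >= self.capacity:
                self._evict()
            refs = self.q_out.pop(oid, 0)  # restore count if remembered
        refs += 1
        k = self._queue_for(refs)
        self.queues[k][oid] = None         # insert at the MRU end
        self.meta[oid] = [refs, self.now + self.life_time, k]
        self._adjust()
        return hit

    def _evict(self):
        # Victim is the LRU entry of the lowest non-empty queue.
        for q in self.queues:
            if q:
                victim, _ = q.popitem(last=False)
                self.q_out[victim] = self.meta.pop(victim)[0]
                if len(self.q_out) > self.history:
                    self.q_out.popitem(last=False)
                return

    def _adjust(self):
        # Advance the clock and demote expired queue heads by one level.
        self.now += 1
        for k in range(1, len(self.queues)):
            q = self.queues[k]
            if q:
                oid = next(iter(q))
                if self.meta[oid][1] < self.now:
                    del q[oid]
                    self.queues[k - 1][oid] = None
                    self.meta[oid][1] = self.now + self.life_time
                    self.meta[oid][2] = k - 1
```

With capacity 2, accessing "a" three times, then "b", then "c" evicts "b" rather than "a": "a" has moved to a higher queue, whereas plain LRU would have evicted it as the least recently used entry.
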