From: Alex Elsayed
Subject: RE: Cache tiering read-proxy mode
Date: Tue, 22 Jul 2014 15:50:33 -0700
References: <06E7D85B3BA36C4DB207FEDE871C534891BC27@SHSMSX101.ccr.corp.intel.com> <06E7D85B3BA36C4DB207FEDE871C534891CD56@SHSMSX101.ccr.corp.intel.com>
To: ceph-devel@vger.kernel.org

Sage Weil wrote:

> [Adding ceph-devel]
>
> On Mon, 21 Jul 2014, Wang, Zhiqiang wrote:
>> Sage,
>>
>> I agree with you that promotion on the 2nd read could improve cache
>> tiering's performance for some kinds of workloads. The general idea here
>> is to implement some kinds of policies in the cache tier to measure the
>> warmness of the data. If the cache tier is aware of the data warmness,
>> it could even initiate data movement between the cache tier and the base
>> tier. This means data could be prefetched into the cache tier before
>> reading or writing. But I think this is something we could do in the
>> future.
>
> Yeah. I suspect it will be challenging to put this sort of prefetching
> intelligence directly into the OSDs, though.
> It could possibly be done by
> an external agent, maybe, or could be driven by explicit hints from
> clients ("I will probably access this data soon").
>
>> The 'promotion on 2nd read' policy is straightforward. Sure, it will
>> benefit some kinds of workload, but not all. If it is implemented as a
>> cache tier option, the user needs to decide whether to turn it on. But
>> I'm afraid most users won't have a good sense of this, which increases
>> the difficulty of using cache tiering.
>
> I suspect the 2nd read behavior will be something we'll want to do by
> default... but yeah, there will be a new pool option (or options) that
> controls the behavior.
>
>> One question for the implementation of 'promotion on 2nd read': what do
>> we do for the 1st read? Does the cache tier read the object from the base
>> tier without replicating it, or just redirect the read?
>
> For the first read, we just redirect the client. Then on the second read,
> we call promote_object(). See maybe_handle_cache() in ReplicatedPG.cc.
> We can pretty easily tell the difference by checking the in-memory HitSet
> for a match.
>
> Perhaps the option in the pool would be something like
> min_read_recency_for_promote? If we measure "recency" as "(avg) seconds
> since last access" (loosely), 0 would mean it would promote on first read,
> and anything <= the HitSet interval would mean promote if the object is in
> the current HitSet. Anything greater than that would mean we'd need to keep
> additional previous HitSets in RAM.
>
> ...which leads us to a separate question of how to describe access
> frequency vs recency. We keep N HitSets, each covering a time period of T
> seconds. Normally we only keep the most recent HitSet in memory, unless
> the agent is active (flushing data). So what I described above is
> checking how recently the last access was (within how many multiples of T
> seconds).
> Additionally, though, we could describe the frequency of
> access: was the object accessed at least once in each of the N intervals
> of T seconds? Or some fraction of them? That is probably best described as
> "temperature?" I'm not too fond of the term "recency," tho I can't
> think of anything better right now.
>
> Anyway, for the read promote behavior, recency is probably sufficient, but
> for the tiering agent flush/evict behavior temperature might be a good
> thing to consider...
>
> sage

It might be worth looking at the MQ (Multi-Queue) caching policy[1], which
was explicitly designed for second-level caches (which applies here). The
client is very likely to be doing its own caching, whether it uses CephFS
(FSCache), RBD (client caching), or RADOS (application-level), and that
causes some interesting changes in the statistical behavior of the
second-level cache.

[1] https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou_html/node9.html
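
To make the recency vs. temperature distinction from the quoted text concrete, here is a rough Python sketch. It is not Ceph code: the HitSets are modelled as plain Python sets (newest first), and the names should_promote, hitsets_needed, and temperature are hypothetical, as is the choice to express the proposed min_read_recency_for_promote value in seconds.

```python
import math
from typing import List, Set


def hitsets_needed(min_recency_s: float, interval_s: float) -> int:
    """How many of the most recent HitSets must be consulted.

    Assumption: recency is given in seconds and each HitSet covers
    interval_s seconds (the "T" in the quoted text). 0 means "promote on
    first read", so no HitSet needs to be checked at all.
    """
    if min_recency_s <= 0:
        return 0
    return math.ceil(min_recency_s / interval_s)


def should_promote(obj: str, hit_sets: List[Set[str]],
                   interval_s: float, min_recency_s: float) -> bool:
    """Recency check: was obj seen within the last min_recency_s seconds?

    hit_sets[0] is the current HitSet, hit_sets[1] the previous one, etc.
    The check is assumed to run before the current read is recorded.
    """
    needed = hitsets_needed(min_recency_s, interval_s)
    if needed == 0:
        return True  # promote on first read
    return any(obj in hs for hs in hit_sets[:needed])


def temperature(obj: str, hit_sets: List[Set[str]]) -> float:
    """Frequency measure: fraction of the N HitSets that contain obj."""
    if not hit_sets:
        return 0.0
    return sum(1 for hs in hit_sets if obj in hs) / len(hit_sets)


# Three HitSets of 60 seconds each, newest first.
sets = [{"a"}, {"b"}, {"a", "c"}]
print(should_promote("b", sets, 60, 60))   # only the current HitSet is checked
print(should_promote("b", sets, 60, 61))   # needs one previous HitSet as well
print(temperature("a", sets))
```

Note that any recency value above one interval forces the sketch to look at hit_sets[1:] as well, which is the "additional previous HitSets in RAM" cost mentioned in the quoted text.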