From: Loic Dachary <loic@dachary.org>
To: "Ma, Jianpeng" <jianpeng.ma@intel.com>,
Andreas Joachim Peters <Andreas.Joachim.Peters@cern.ch>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: ISA erasure code plugin and cache
Date: Mon, 04 Aug 2014 14:56:36 +0200
Message-ID: <53DF8304.5040100@dachary.org>
In-Reply-To: <6AA21C22F0A5DA478922644AD2EC308C8A60B8@SHSMSX101.ccr.corp.intel.com>
On 04/08/2014 14:50, Ma, Jianpeng wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
>> Sent: Monday, August 4, 2014 8:37 PM
>> To: Andreas Joachim Peters
>> Cc: Ma, Jianpeng; Ceph Development
>> Subject: Re: ISA erasure code plugin and cache
>>
>>
>>
>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:
>>> Hi Loic,
>>>
>>> the background relevant to your comments has (unfortunately) never been
>>> answered on the mailing list.
>>>
>>> The cache is written so that it is useful for a fixed (k,m) combination
>>> and is thread-safe.
>>>
>>> So, if there is one instance of the plugin per pool, it is the right
>>> implementation. If there is (as you write) one instance for each PG in any pool,
>>> it is sort of 'stupid' because the same encoding table is stored for each PG
>>> separately.
>>
>> I would not call it stupid ;-) Just not as efficient as if the cache was shared by all PGs.
>> Without a cache the decode table has to be calculated for each object in the
>> placement group, and there are a lot of objects. The table may be duplicated
>> hundreds of times, so it has an impact on the memory footprint, but it should not
>> have a visible impact on decode performance. An optimisation of your
>> implementation to save memory would be nice, but it is not critical.
>>
> AFAIK, for a recovering PG, all the objects of the PG have the same lost chunks. So only the first object misses the cache;
> the later ones won't. It only needs one cache entry.
> Or am I missing something?
That is also my understanding, and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few megabytes per PG, it will eventually have a non-negligible impact on a long-running OSD. But again, it's nothing critical to performance.
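
To illustrate the memory optimisation mentioned in this thread, here is a minimal sketch of a mutex-guarded LRU cache of decode tables that is shared by every plugin instance with the same (k,m), so a table is stored once per OSD process rather than once per PG. It is not the ErasureCodeIsa code: the class, the cache_for() helper, the capacity and the signature strings are all hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not the ISA plugin implementation.
// Maps an "erasure signature" (which chunks are lost) to a decode table.
class SharedDecodeTableCache {
  using Table = std::vector<unsigned char>;
  using Entry = std::pair<std::string, std::shared_ptr<const Table>>;

public:
  using TablePtr = std::shared_ptr<const Table>;

  explicit SharedDecodeTableCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns the cached table, or nullptr on a miss. A hit moves the entry
  // to the front of the LRU list.
  TablePtr get(const std::string& signature) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = index_.find(signature);
    if (it == index_.end())
      return nullptr;
    lru_.splice(lru_.begin(), lru_, it->second);
    return it->second->second;
  }

  // Stores a table, evicting the least recently used entry when full.
  void put(const std::string& signature, std::vector<unsigned char> table) {
    std::lock_guard<std::mutex> guard(lock_);
    TablePtr ptr = std::make_shared<Table>(std::move(table));
    auto it = index_.find(signature);
    if (it != index_.end()) {
      it->second->second = ptr;
      lru_.splice(lru_.begin(), lru_, it->second);
      return;
    }
    if (lru_.size() >= capacity_) {
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    lru_.emplace_front(signature, ptr);
    index_[signature] = lru_.begin();
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(lock_);
    return lru_.size();
  }

private:
  std::size_t capacity_;
  std::mutex lock_;
  std::list<Entry> lru_;  // front = most recently used
  std::map<std::string, std::list<Entry>::iterator> index_;
};

// One cache per (k,m), shared by all callers in the process.
// The capacity of 128 entries is an arbitrary value for the example.
SharedDecodeTableCache& cache_for(int k, int m) {
  static std::mutex registry_lock;
  static std::map<std::pair<int, int>,
                  std::unique_ptr<SharedDecodeTableCache>> registry;
  std::lock_guard<std::mutex> guard(registry_lock);
  std::unique_ptr<SharedDecodeTableCache>& slot = registry[std::make_pair(k, m)];
  if (!slot)
    slot.reset(new SharedDecodeTableCache(128));
  return *slot;
}
```

With something like this, each per-PG plugin instance would call cache_for(k, m) instead of owning its own cache, and returning a shared_ptr means a table stays valid for the caller even if it is evicted in the meantime.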
Cheers
>
> Jianpeng Ma
>
>> How large are the decode tables ?
>>
>>> So if the final statement is to create one plugin instance per PG, I will change
>>> it accordingly and share encoding & decoding tables for a fixed (k,m); if not, it
>>> can stay.
>>>
>>> Just need to know that ... this boils down to the fact that encoding &
>>> decoding should not be considered 'stateless'.
>>>
>>> Cheers Andreas.
>>>
>>>
>>> ________________________________________
>>> From: Loic Dachary [loic@dachary.org]
>>> Sent: 04 August 2014 13:56
>>> To: Andreas Joachim Peters
>>> Cc: Ma, Jianpeng; Ceph Development
>>> Subject: ISA erasure code plugin and cache
>>>
>>> Hi,
>>>
>>> Here is how I understand the current code:
>>>
>>> When an OSD is missing, recovery is required and the primary OSD will collect
>>> the available chunks to do so. It will then call the decode method via
>>> ECUtil::decode which is a small wrapper around the corresponding
>>> ErasureCodeInterface::decode method.
>>>
>>> https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>>>
>>> The ISA plugin will then use the isa_decode method to perform the work
>>>
>>>
>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L212
>>>
>>> and will be repeatedly called until all objects in the PGs that were relying on
>>> the missing OSD are recovered. To avoid computing the decoding table for each
>>> object, it is stored in an LRU cache
>>>
>>>
>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L480
>>>
>>> and copied in the stack if already there:
>>>
>>>
>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L433
>>>
>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
>>> created:
>>>
>>> https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>>>
>>> It means that data members of each ErasureCodeIsa are copied as many
>>> times as there are PGs. If an OSD participates in 200 PGs that belong to
>>> an erasure coded pool configured to use ErasureCodeIsa, the data members
>>> will be duplicated 200 times.
>>>
>>> It is good practice to make it so that the encode/decode methods of
>>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
>>> side effect on the object. In the isa plugin the LRU cache storing the decode
>>> tables is modified by the decode method and guarded by a mutex:
>>>
>>> get https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L281
>>> put https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L310
>>>
>>> Please correct me if I'm mistaken ;-) I've not yet reviewed the code to try to
>>> find problems, I just wanted to make sure I get the intention before doing so.
>>>
>>> Cheers
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>
--
Loïc Dachary, Artisan Logiciel Libre