From: Loic Dachary <loic@dachary.org>
To: ZHOU Yuan <dunk007@gmail.com>
Cc: Dietmar Maurer <dietmar@proxmox.com>,
	"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>,
	ceph-devel@vger.kernel.org
Subject: Re: [ceph-users] crush choose firstn vs. indep
Date: Tue, 14 Jan 2014 09:33:50 +0100
Message-ID: <52D4F66E.3060301@dachary.org>
In-Reply-To: <CADTt812pZoObC+LWRGPSDFudpFSW4KR9B+z5=cMFsJLQoXCGWw@mail.gmail.com>

On 14/01/2014 07:49, ZHOU Yuan wrote:
> Hi Loic, thanks for the education!
> 
> I’m also trying to understand the new ‘indep’ mode. Is this new mode designed for Ceph-EC only? It seems that all copies of the data in a 3-copy system are equivalent, so this new algorithm should also work?
> 

In the best case scenario, using indep instead of firstn on replicated pools won't make a difference. However, if the crush mapper does not find the required number of items, firstn will give (for instance) [1,2,4] instead of [1,2,3,4], and the replicated pool code will handle this gracefully. With indep, the result will be [1,2,CRUSH_ITEM_NONE,4], which the replicated pool code does not expect and will probably trigger an assert somewhere.

Here is an example from the test suite, which runs when you execute make check:
https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/bad-mappings.t
where 2147483647 == CRUSH_ITEM_NONE
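
To make the two failure shapes concrete, here is a small Python sketch. It is not the CRUSH algorithm: firstn_style and indep_style are hypothetical helpers that only model what the result looks like when one position cannot be filled, and None is my own marker for "the mapper found nothing here". The only value taken from Ceph is CRUSH_ITEM_NONE (0x7fffffff, i.e. 2147483647).

```python
# Sentinel used by CRUSH for "no item"; 0x7fffffff == 2147483647.
CRUSH_ITEM_NONE = 0x7FFFFFFF


def firstn_style(candidates, wanted):
    """Hypothetical helper: drop the gaps, so a failure yields a short list."""
    found = [c for c in candidates if c is not None]
    return found[:wanted]


def indep_style(candidates, wanted):
    """Hypothetical helper: keep one slot per position, marking gaps with
    CRUSH_ITEM_NONE so the other items stay where they are."""
    slots = list(candidates[:wanted])
    slots += [None] * (wanted - len(slots))
    return [CRUSH_ITEM_NONE if c is None else c for c in slots]


# Assume the mapper could not fill position 2 (None marks the failure).
attempt = [1, 2, None, 4]
print(firstn_style(attempt, 4))  # [1, 2, 4]
print(indep_style(attempt, 4))   # [1, 2, 2147483647, 4]
```

Code that consumes an indep result would have to check each slot against CRUSH_ITEM_NONE before treating it as an OSD id, which is the check replicated pool code presumably does not make.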

I don't know of any other reason preventing the use of indep for replicated pools.
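
For the position-stability point in my earlier message (quoted below), a rough way to see why it matters for erasure coded pools is to count how many positions hold a different OSD after a remapping. The helper changed_positions is hypothetical and only compares two lists; it says nothing about how CRUSH actually picks the replacement.

```python
def changed_positions(before, after):
    """Return the indexes whose OSD differs between two equally long mappings."""
    if len(before) != len(after):
        raise ValueError("mappings must have the same length")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


original = [1, 2, 3, 4]
print(changed_positions(original, [1, 2, 5, 4]))  # [2]
print(changed_positions(original, [4, 5, 2, 1]))  # [0, 1, 2, 3]
```

In the first case only the slot that held OSD 3 changes; in the second, every position changes, even though three of the four OSDs are the same as in the first case.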

Cheers

> 
> Sincerely, Yuan
> 
> 
> On Mon, Jan 13, 2014 at 7:37 AM, Loic Dachary <loic@dachary.org> wrote:
> 
> 
> 
>     On 12/01/2014 15:55, Dietmar Maurer wrote:
>     > From the docs:
>     >
>     >
>     >
>     > step [choose|chooseleaf] [firstn|indep] <N> <bucket-type>
>     >
>     >
>     >
>     > What exactly is the difference between ‘firstn’ and ‘indep’?
>     >
>     Hi,
> 
>     For Ceph releases up to Emperor[1], firstn is used and I'm not aware of a use case requiring indep. As part of the effort to implement erasure coded pools, firstn[2] and indep[3] were separated into two functions. The firstn method is best suited for replicated pools. The indep method tries to minimize the position changes in case an OSD becomes unavailable. For instance, if indep finds
> 
>       [1,2,3,4]
> 
>     and after a while 3 becomes unavailable, it is very likely to replace it with
> 
>       [1,2,5,4]
> 
>     It matters to erasure coded pools because
> 
>       [4,5,2,1]
> 
>     (i.e. the same OSDs but in different positions) implies more I/O. Another difference is that in the case of a mapping failure (i.e. unable to find the required number of OSDs), firstn will return a short list (for instance [1,2,3] when 4 are required) and indep will return a list with a placeholder at the missing position (for instance [1,2,CRUSH_ITEM_NONE,4]).
> 
>     Cheers
> 
>     [1] implementation in releases up to Emperor https://github.com/ceph/ceph/blob/v0.72/src/crush/mapper.c#L295
>     [2] firstn https://github.com/ceph/ceph/blob/v0.74/src/crush/mapper.c#L295
>     [3] indep https://github.com/ceph/ceph/blob/v0.74/src/crush/mapper.c#L459
> 
>     --
>     Loïc Dachary, Artisan Logiciel Libre
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

