From: Josh Durgin <josh.durgin@inktank.com>
To: Sage Weil <sweil@redhat.com>, ceph-devel@vger.kernel.org
Subject: Re: crush: straw is dead, long live straw2
Date: Mon, 08 Dec 2014 16:21:07 -0800
Message-ID: <54864073.5070907@inktank.com>
In-Reply-To: <alpine.DEB.2.00.1412081516010.18281@cobra.newdream.net>

On 12/08/2014 03:48 PM, Sage Weil wrote:
>   - Use floating point log function.  This is problematic for the kernel
> implementation (no floating point), is slower than the lookup table, and
> makes me worry about whether the floating point calculations are
> consistent across architectures (the mapping has to be completely
> deterministic).

This also won't work for QEMU, which may not restore floating point
modes while doing I/O (leading to crashes like 
http://tracker.ceph.com/issues/3521).

>   - Use some approximation of the logarithm with fixed-point arithmetic.
> I spent a bit of time searching and found
>
>   http://www.researchgate.net/publication/230668515_A_fixed-point_implementation_of_the_natural_logarithm_based_on_a_expanded_hyperbolic_CORDIC_algorithm
>
> but it also involves a couple of lookup tables and (judging by figure 1)
> is probably slower than the 128 KB table.
>
> We could probably expand out a Taylor series and get something half decent,
> but any precision we lose will translate to OSD utilizations that are off
> from the input weights--something that costs real disk space and we
> probably want to avoid.
>
>   - Stick with the 128 KB lookup table.  Performance-sensitive clients can
> precalculate all PG mappings when they get OSDMap updates if they are
> concerned about leaving the CRUSH calculation in the IO path.
>
> Any other suggestions?

It could be a lookup table generated (or chosen) at runtime to a
particular size configured by the crushmap, but that's probably more
complex than it's worth.
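
To make the size/precision trade-off concrete, here is a rough sketch of
a table built at runtime for a configurable number of index bits. It is
not the code in wip-crush-straw2: the names build_log_table, fixed_log2,
max_error and the FRAC_BITS precision are made up for illustration, and
the table is filled in with floating point, which a kernel client could
not do (it would have to pick from prebuilt tables instead).

```python
import math

FRAC_BITS = 16  # assumed fixed-point precision (fractional bits)


def build_log_table(index_bits):
    """Fixed-point log2(1 + i / 2**index_bits) for i in [0, 2**index_bits)."""
    n = 1 << index_bits
    # Floating point is used only here, when the table is generated.
    return [round(math.log2(1 + i / n) * (1 << FRAC_BITS)) for i in range(n)]


def fixed_log2(x, table, index_bits):
    """Integer-only approximation of log2(x) for x >= 1, scaled by 2**FRAC_BITS."""
    exponent = x.bit_length() - 1            # position of the leading 1 bit
    if exponent >= index_bits:
        top = x >> (exponent - index_bits)   # leading 1 plus index_bits bits
    else:
        top = x << (index_bits - exponent)
    index = top - (1 << index_bits)          # drop the leading 1
    return (exponent << FRAC_BITS) + table[index]


def max_error(index_bits, limit=1 << 16):
    """Largest absolute error against math.log2 over x in [1, limit]."""
    table = build_log_table(index_bits)
    scale = 1 << FRAC_BITS
    return max(
        abs(fixed_log2(x, table, index_bits) / scale - math.log2(x))
        for x in range(1, limit + 1)
    )


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(bits, 1 << bits, max_error(bits))
```

Each extra index bit doubles the table and roughly halves the worst-case
error, so the crushmap would effectively be choosing how much weight
skew it tolerates in exchange for memory.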

> Here is my implementation of the lookup-table based approach:
>
> 	https://github.com/ceph/ceph/commit/1d462c9f6a262de3a51533193ed2dff34c730727
> 	https://github.com/ceph/ceph/commits/wip-crush-straw2
>
> You'll notice that included in there is a unit test that verifies that
> changing a single item's weight does not affect the distribution of inputs
> among other items in the bucket.  :)
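
For anyone who wants to see that property without reading the test, the
following is a toy model of it, not the test from the branch. It assumes
each item's draw is ln(u) / weight with u derived from a hash of the
input and the item, and that the largest draw wins. SHA-256 and
math.log stand in for the CRUSH hash and the lookup table purely for
readability, and draw, choose and moved_inputs are hypothetical names.

```python
import hashlib
import math


def draw(x, item_id, weight):
    """ln(u) / weight, with u in (0, 1] taken from a hash of (x, item_id)."""
    digest = hashlib.sha256(f"{x}:{item_id}".encode()).digest()
    u = (int.from_bytes(digest[:2], "big") + 1) / 65536.0  # 16 bits -> (0, 1]
    return math.log(u) / weight


def choose(x, weights):
    """Return the item with the largest draw for input x."""
    return max(weights, key=lambda item: draw(x, item, weights[item]))


def moved_inputs(before, after, inputs):
    """List (x, old_item, new_item) for inputs whose placement changed."""
    moves = []
    for x in inputs:
        old, new = choose(x, before), choose(x, after)
        if old != new:
            moves.append((x, old, new))
    return moves


if __name__ == "__main__":
    before = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
    after = dict(before, b=5.0)  # only item "b" changes weight
    moves = moved_inputs(before, after, range(10000))
    print(len(moves), {new for _, _, new in moves})
```

Because only b's draw changes, every input that moves lands on b when
its weight goes up (and would leave b if its weight went down); nothing
shuffles between a, c, and d.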


