* An Evaluation of Object Name Hashing
@ 2016-01-12 13:17 Marcel Lauhoff
2016-01-12 13:38 ` Sage Weil
From: Marcel Lauhoff @ 2016-01-12 13:17 UTC
To: ceph-devel
Hi,
I wrote a Master's Thesis about Ceph and cold storage last year. One of
the things I looked at was modifications to object placement.
Among other things, I looked at what happens to balance (e.g., objects per
OSD) when all objects of a file end up on the same OSD. I also ran tests
with a different hash algorithm (Linux dcache).
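To make the placement question concrete, here is a minimal Python sketch.
It is not Ceph's placement code: CRC32 modulo a made-up OSD count stands in
for rjenkins plus CRUSH, and the file names, object counts and the helpers
`osd_for` and `place` are hypothetical. It compares hashing the full object
name with hashing only the file prefix, which puts all objects of a file on
one OSD.

```python
import zlib
from collections import Counter

NUM_OSDS = 8  # hypothetical cluster size


def osd_for(key: str) -> int:
    # Stand-in placement: CRC32 modulo the OSD count (an assumption,
    # not what Ceph does; Ceph hashes the name and then applies CRUSH).
    return zlib.crc32(key.encode()) % NUM_OSDS


def place(files: dict, by_prefix: bool) -> Counter:
    """Count objects per OSD; `files` maps a file prefix to its object count."""
    load = Counter({osd: 0 for osd in range(NUM_OSDS)})
    for prefix, n_objects in files.items():
        for i in range(n_objects):
            name = f"{prefix}.{i:08x}"
            # Hash only the prefix to keep a file's objects together,
            # or the full object name to scatter them.
            key = prefix if by_prefix else name
            load[osd_for(key)] += 1
    return load


# Made-up workload: 500 files with between 1 and 200 objects each.
files = {f"file{n}": 1 + (n * 37) % 200 for n in range(500)}
full = place(files, by_prefix=False)
prefix = place(files, by_prefix=True)
print("full name: min/max objects per OSD", min(full.values()), max(full.values()))
print("prefix:    min/max objects per OSD", min(prefix.values()), max(prefix.values()))
```

The min/max spread per OSD is only the crudest balance measure; it is used
here to keep the sketch short.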
I wrote an article on my website with the analysis, changes to the
source and how I ran the tests:
http://irq0.org/articles/ceph/object_name_hashing
~irq0
--
Marcel Lauhoff
lauhoff@uni-mainz.de
* Re: An Evaluation of Object Name Hashing
2016-01-12 13:17 An Evaluation of Object Name Hashing Marcel Lauhoff
@ 2016-01-12 13:38 ` Sage Weil
2016-01-26 18:40 ` Marcel Lauhoff
From: Sage Weil @ 2016-01-12 13:38 UTC
To: Marcel Lauhoff; +Cc: ceph-devel
Hi Marcel,
This is great!
On Tue, 12 Jan 2016, Marcel Lauhoff wrote:
>
> Hi,
>
> I wrote a Master's Thesis about Ceph and cold storage last year. One of
> the things I looked at was modifications to object placement.
>
> Among other things, I looked at what happens to balance (e.g., objects per
> OSD) when all objects of a file end up on the same OSD. I also ran tests
> with a different hash algorithm (Linux dcache).
>
> I wrote an article on my website with the analysis, changes to the
> source and how I ran the tests:
>
> http://irq0.org/articles/ceph/object_name_hashing
The interesting thing to me is the error bars for linux prefix (the
right-most set of bars on the last graph). Their range is significantly
wider than for rjenkins + prefix: 2.1 TiB to 4.0 TiB, vs. 2.3-3.7ish
for the others. The reason we switched away from the linux dcache hash
(it was the original choice) is because it is very weak. I suspect that
even if you look at the average + standard deviation it hides some of the
badness; looking at 99th or 99.9th percentile, or simply a plot of the osd
utilization distribution, will show that there are more low- and high-
utilization outliers.
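As a rough illustration of why mean and standard deviation can hide
outliers, the Python sketch below summarizes two synthetic per-OSD
utilization lists. The numbers are invented for the example and are not
taken from the article; the helper `utilization_summary` and its
nearest-rank percentile are assumptions of this sketch. Both lists have the
same mean and standard deviation, yet only the second has OSDs far from the
average, which shows up in the 1st and 99th percentiles.

```python
import math
import statistics


def utilization_summary(util_tib):
    """Summarize per-OSD utilization (TiB), including tail percentiles."""
    data = sorted(util_tib)

    def percentile(p):
        # Nearest-rank percentile: the smallest value with at least p% of
        # the data at or below it.
        rank = max(1, math.ceil(p * len(data) / 100))
        return data[rank - 1]

    return {
        "mean": statistics.fmean(data),
        "stdev": statistics.pstdev(data),
        "min": data[0],
        "p1": percentile(1),
        "p99": percentile(99),
        "max": data[-1],
    }


# Synthetic data for 100 OSDs (made up for illustration only).
even = [2.7, 3.3] * 50                     # every OSD slightly off the mean
tails = [3.0] * 96 + [1.5, 1.5, 4.5, 4.5]  # most OSDs exact, a few far off

for label, util in (("even", even), ("tails", tails)):
    print(label, utilization_summary(util))
```

A histogram of the per-OSD values would show the same difference at a
glance.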
The other thing to keep in mind is that beyond a certain size locality
doesn't buy you that much... the disk seek overhead is no longer
significant once you've read several megabytes of data. At the same
time, concentrating all data in a file (or rbd image) on a single device
means that a large, busy, hot file can focus a lot of traffic on a single
OSD.
What might be more useful is the ability to take the data for several
smaller files that are thought to be related (e.g., in the same directory,
created at the same time) and try to store them together. In that case,
since we know the files are small, the impact on balance would not be
significant. On the other hand, what we currently do with (very) small
files in CephFS is just inline the data in the inode anyway so we already
get that locality (and more)--the main limitation there being that the max
inline size is quite small (a KB or two, IIRC).
sage
* Re: An Evaluation of Object Name Hashing
2016-01-12 13:38 ` Sage Weil
@ 2016-01-26 18:40 ` Marcel Lauhoff
From: Marcel Lauhoff @ 2016-01-26 18:40 UTC
To: ceph-devel
Hi!
Sage Weil <sage@newdream.net> writes:
> On Tue, 12 Jan 2016, Marcel Lauhoff wrote:
>>
>> I wrote an article on my website with the analysis, changes to the
>> source and how I ran the tests:
>>
>> http://irq0.org/articles/ceph/object_name_hashing
>
> The interesting thing to me is the error bars for linux prefix (the
> right-most set of bars on the last graph). Their range is significantly
> wider than for rjenkins + prefix: 2.1 TiB to 4.0 TiB, vs. 2.3-3.7ish
> for the others. The reason we switched away from the linux dcache hash
> (it was the original choice) is because it is very weak. I suspect that
> even if you look at the average + standard deviation it hides some of the
> badness; looking at 99th or 99.9th percentile, or simply a plot of the osd
> utilization distribution, will show that there are more low- and high-
> utilization outliers.
I reran the tests and included Adler-32, CRC32, MD5 and SHA-1 (MD5 and
SHA-1 truncated to 32 bits). I updated the article.
In summary: Adler-32 does not work. MD5 and SHA-1 are OK. CRC32 is as good
as RJenkins, maybe even slightly better.
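For readers who want to try a comparison like this, here is a minimal
Python sketch using only the standard library. It is not the test setup
from the article. Several things are assumptions of this sketch: taking the
first four digest bytes (big-endian) as the 32-bit truncation, the
CephFS-style object names, mapping hashes to buckets with a plain modulo
instead of Ceph's PG mapping and CRUSH, and the helper names.

```python
import hashlib
import zlib
from collections import Counter


def md5_32(name: bytes) -> int:
    # Truncate MD5 to 32 bits by taking the first four digest bytes
    # (which bytes to keep is an assumption of this sketch).
    return int.from_bytes(hashlib.md5(name).digest()[:4], "big")


def sha1_32(name: bytes) -> int:
    # Same truncation for SHA-1.
    return int.from_bytes(hashlib.sha1(name).digest()[:4], "big")


HASHES = {
    "adler32": zlib.adler32,
    "crc32": zlib.crc32,
    "md5/32": md5_32,
    "sha1/32": sha1_32,
}


def bucket_spread(hash_fn, names, buckets):
    """Return (min, max) names per bucket under `hash_fn(name) % buckets`."""
    counts = Counter(hash_fn(n) % buckets for n in names)
    loads = [counts.get(b, 0) for b in range(buckets)]
    return min(loads), max(loads)


# Made-up object names: 64 files with 64 objects each.
names = [f"10000000{n:03x}.{i:08x}".encode() for n in range(64) for i in range(64)]

for label, fn in HASHES.items():
    print(label, bucket_spread(fn, names, buckets=256))
```

A narrow gap between the minimum and maximum bucket load indicates better
balance for this particular set of names; other name patterns may behave
differently.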
~marcel
--
Marcel Lauhoff