From: "Jim Schutt" <jaschut@sandia.gov>
To: Mark Nelson <mark.nelson@inktank.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Interesting results
Date: Fri, 29 Jun 2012 08:54:25 -0600
Message-ID: <4FEDC1A1.9050808@sandia.gov>
In-Reply-To: <4FECE068.1010701@inktank.com>
On 06/28/2012 04:53 PM, Mark Nelson wrote:
> On 06/28/2012 05:37 PM, Jim Schutt wrote:
>> Hi,
>>
>> Lots of trouble reports go by on the list - I thought
>> it would be useful to report a success.
>>
>> Using a patch (https://lkml.org/lkml/2012/6/28/446)
>> on top of 3.5-rc4 for my OSD servers, the same kernel
>> for my Linux clients, and a recent master branch
>> tip (git://github.com/ceph/ceph commit 4142ac44b3f),
>> I was able to sustain streaming writes from 166 linux
>> clients for 2 hours:
>>
>> On 166 clients:
>> dd conv=fdatasync if=/dev/zero of=/mnt/ceph/stripe-4M/1/zero0.`hostname -s` bs=4k count=65536k
>>
>> Elapsed time: 7274.55 seconds
>> Total data: 45629732.553 MB (43515904 MiB)
>> Aggregate rate: 6272.516 MB/s
>>
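As a sanity check, the totals quoted above can be reproduced from the dd parameters. This is only a sketch: the variable names are illustrative, and it assumes GNU dd semantics, where a `k` suffix means 1024 for both `bs` and `count`.

```python
# Recompute the reported totals from the dd parameters (assumes GNU dd: k = 1024).
clients = 166
block_size = 4 * 1024        # bs=4k, in bytes
blocks = 65536 * 1024        # count=65536k
elapsed_s = 7274.55          # elapsed time reported for the run

total_bytes = clients * block_size * blocks
total_mib = total_bytes / 2**20      # binary mebibytes
total_mb = total_bytes / 10**6       # decimal megabytes
rate_mb_s = total_mb / elapsed_s

print(f"Total data: {total_mb:.3f} MB ({total_mib:.0f} MiB)")
print(f"Aggregate rate: {rate_mb_s:.3f} MB/s")
```

Each client writes 256 GiB, so the aggregate rate is in decimal MB/s, not MiB/s.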
>> That kernel patch was critical; without it this test
>> runs into trouble after a few minutes because the
>> kernel runs into trouble looking for pages to merge
>> during page compaction. Also critical were the ceph
>> tunings I mentioned here:
>> http://www.spinics.net/lists/ceph-devel/msg07128.html
>>
>> -- Jim
>
> Nice! Did you see much performance degradation over time? Internally I've seen some slowdowns (especially at smaller block sizes) as the osds fill up. How many servers and how many drives?
>
This result is from 12 servers, 24 OSDs/server, starting
from a freshly-built filesystem. I use 64KB btrfs metadata
nodes.
There is some performance degradation during such runs.
During the initial 10 TB or so, each server sustains ~2.2 GB/s,
as reported by vmstat.
Nearer the end of the run, data rate on each server is
much more variable, with peaks at ~2 GB/s and valleys at
~1.5 GB/s.
I suspect that some of that variability comes from
the OSDs not filling up uniformly; here are the five least-utilized
and five most-utilized OSDs at the end of the run:
server  1K-blocks      Used  Available  Use%  Mounted on
cs42:   939095640 258202860  662416404   29%  /ram/mnt/ceph/data.osd.261
cs38:   939095640 259052468  661568524   29%  /ram/mnt/ceph/data.osd.154
cs39:   939095640 264803592  655825592   29%  /ram/mnt/ceph/data.osd.174
cs34:   939095640 265911256  654711400   29%  /ram/mnt/ceph/data.osd.52
cs41:   939095640 270588260  650049820   30%  /ram/mnt/ceph/data.osd.238
cs33:   939095640 345327760  575399472   38%  /ram/mnt/ceph/data.osd.47
cs40:   939095640 351180832  569558176   39%  /ram/mnt/ceph/data.osd.205
cs35:   939095640 351372096  569365696   39%  /ram/mnt/ceph/data.osd.89
cs41:   939095640 352522904  568214632   39%  /ram/mnt/ceph/data.osd.217
cs33:   939095640 358181684  562561740   39%  /ram/mnt/ceph/data.osd.35
max/min: 1.3872
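The max/min figure is the ratio of the largest to the smallest Used value in the listing above. A minimal sketch of that calculation follows; the dictionary and its name `used_kib` are illustrative, with values copied from the df output.

```python
# Used space in 1K-blocks per OSD, copied from the df listing above.
used_kib = {
    "osd.261": 258202860,
    "osd.154": 259052468,
    "osd.174": 264803592,
    "osd.52": 265911256,
    "osd.238": 270588260,
    "osd.47": 345327760,
    "osd.205": 351180832,
    "osd.89": 351372096,
    "osd.217": 352522904,
    "osd.35": 358181684,
}

# Ratio of the most-utilized OSD to the least-utilized OSD.
ratio = max(used_kib.values()) / min(used_kib.values())
print(f"max/min: {ratio:.4f}")
```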
Note that I am using osd_pg_bits=7, osd_pgp_bits=7. I have plans
to push that to see what happens. I've also got another dozen
servers on a truck somewhere on their way here....
The under-utilized OSDs finish early, which I believe contributes
to performance tailing off at the end of such a run. I don't have
any data on how big this effect might be.
I haven't yet tested filling my filesystem to capacity, so I have no
data regarding what happens as the disks fill up.
> Still, those are the kinds of numbers I like to see. Congrats! :)
Thanks - I think it's pretty cool that testing
Ceph found a performance issue in the kernel.
-- Jim
>
> Mark