From: "Jim Schutt" <jaschut@sandia.gov>
To: Mark Nelson <mark.nelson@inktank.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Interesting results
Date: Fri, 29 Jun 2012 08:54:25 -0600
Message-ID: <4FEDC1A1.9050808@sandia.gov>
In-Reply-To: <4FECE068.1010701@inktank.com>

On 06/28/2012 04:53 PM, Mark Nelson wrote:
> On 06/28/2012 05:37 PM, Jim Schutt wrote:
>> Hi,
>>
>> Lots of trouble reports go by on the list - I thought
>> it would be useful to report a success.
>>
>> Using a patch (https://lkml.org/lkml/2012/6/28/446)
>> on top of 3.5-rc4 for my OSD servers, the same kernel
>> for my Linux clients, and a recent master branch
>> tip (git://github.com/ceph/ceph commit 4142ac44b3f),
>> I was able to sustain streaming writes from 166 Linux
>> clients for 2 hours:
>>
>> On 166 clients:
>> dd conv=fdatasync if=/dev/zero of=/mnt/ceph/stripe-4M/1/zero0.`hostname -s` bs=4k count=65536k
>>
>> Elapsed time: 7274.55 seconds
>> Total data: 45629732.553 MB (43515904 MiB)
>> Aggregate rate: 6272.516 MB/s
>>
>> That kernel patch was critical; without it this test
>> runs into trouble after a few minutes because the
>> kernel struggles to find pages to merge
>> during page compaction. Also critical were the ceph
>> tunings I mentioned here:
>> http://www.spinics.net/lists/ceph-devel/msg07128.html
>>
>> -- Jim
>
> Nice! Did you see much performance degradation over time? Internally
> I've seen some slowdowns (especially at smaller block sizes) as the
> OSDs fill up. How many servers and how many drives?
>
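
As a cross-check of the totals quoted above, here is a minimal Python
sketch that recomputes them from the dd parameters. It assumes GNU dd
semantics, where a "k" suffix on bs and count means a factor of 1024;
the function and variable names are illustrative only and do not
appear in the original report.

```python
# Figures taken from the quoted report.
CLIENTS = 166
BLOCK_SIZE = 4 * 1024        # dd bs=4k, in bytes (assumes k = 1024)
BLOCK_COUNT = 65536 * 1024   # dd count=65536k (assumes k = 1024)
ELAPSED_S = 7274.55          # reported elapsed time, in seconds


def dd_totals(clients, block_size, block_count, elapsed_s):
    """Return (total MiB, total MB, aggregate MB/s) for the whole run."""
    total_bytes = clients * block_size * block_count
    total_mib = total_bytes / 2**20   # binary megabytes
    total_mb = total_bytes / 10**6    # decimal megabytes
    return total_mib, total_mb, total_mb / elapsed_s


total_mib, total_mb, rate_mb_s = dd_totals(
    CLIENTS, BLOCK_SIZE, BLOCK_COUNT, ELAPSED_S
)
print(f"Total data: {total_mb:.3f} MB ({total_mib:.0f} MiB)")
print(f"Aggregate rate: {rate_mb_s:.3f} MB/s")
```

Each client writes 256 GiB, so the reported aggregate rate is in
decimal MB/s, not MiB/s.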

This result is from 12 servers, 24 OSDs/server, starting
from a freshly-built filesystem. I use 64KB btrfs metadata
nodes.

There is some performance degradation during such runs.
During the initial 10 TB or so, each server sustains ~2.2 GB/s,
as reported by vmstat.

Nearer the end of the run, data rate on each server is
much more variable, with peaks at ~2 GB/s and valleys at
~1.5 GB/s.

I suspect that some of that variability comes from
the OSDs not filling up uniformly; here are the lowest and
highest utilizations at the end of the run:

server                     1K-blocks      Used Available Use% Mounted on

cs42:                      939095640 258202860 662416404  29% /ram/mnt/ceph/data.osd.261
cs38:                      939095640 259052468 661568524  29% /ram/mnt/ceph/data.osd.154
cs39:                      939095640 264803592 655825592  29% /ram/mnt/ceph/data.osd.174
cs34:                      939095640 265911256 654711400  29% /ram/mnt/ceph/data.osd.52
cs41:                      939095640 270588260 650049820  30% /ram/mnt/ceph/data.osd.238

cs33:                      939095640 345327760 575399472  38% /ram/mnt/ceph/data.osd.47
cs40:                      939095640 351180832 569558176  39% /ram/mnt/ceph/data.osd.205
cs35:                      939095640 351372096 569365696  39% /ram/mnt/ceph/data.osd.89
cs41:                      939095640 352522904 568214632  39% /ram/mnt/ceph/data.osd.217
cs33:                      939095640 358181684 562561740  39% /ram/mnt/ceph/data.osd.35

  max/min: 1.3872
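
The max/min figure can be reproduced from the Used column of the
listing above. This is a minimal Python sketch, assuming the ten OSDs
shown include the least-full and most-full OSDs in the cluster; the
dictionary and function names are illustrative only.

```python
# Used space in 1K blocks, copied from the df listing above.
used_kib = {
    "osd.261": 258202860,
    "osd.154": 259052468,
    "osd.174": 264803592,
    "osd.52": 265911256,
    "osd.238": 270588260,
    "osd.47": 345327760,
    "osd.205": 351180832,
    "osd.89": 351372096,
    "osd.217": 352522904,
    "osd.35": 358181684,
}


def fill_spread(used):
    """Ratio of the most-used OSD to the least-used OSD."""
    return max(used.values()) / min(used.values())


spread = fill_spread(used_kib)
print(f"max/min: {spread:.4f}")
```

A ratio near 1.0 would indicate uniform filling; here the fullest OSD
holds roughly 39% more data than the emptiest one.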

Note that I am using osd_pg_bits=7, osd_pgp_bits=7.  I plan
to push those values to see what happens.  I've also got another dozen
servers on a truck somewhere on their way here....

The under-utilized OSDs finish early, which I believe contributes
to performance tailing off at the end of such a run.  I don't have
any data on how big this effect might be.

I haven't yet tested filling my filesystem to capacity, so I have no
data regarding what happens as the disks fill up.

> Still, those are the kinds of numbers I like to see. Congrats! :)

Thanks - I think it's pretty cool that testing
Ceph found a performance issue in the kernel.

-- Jim

>
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>



