Interesting results

* Interesting results
@ 2012-06-28 22:37 Jim Schutt
  2012-06-28 22:53 ` Mark Nelson
  2012-07-01 19:57 ` Stefan Priebe
  0 siblings, 2 replies; 7+ messages in thread
From: Jim Schutt @ 2012-06-28 22:37 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

Hi,

Lots of trouble reports go by on the list - I thought
it would be useful to report a success.

Using a patch (https://lkml.org/lkml/2012/6/28/446)
on top of 3.5-rc4 for my OSD servers, the same kernel
for my Linux clients, and a recent master branch
tip (git://github.com/ceph/ceph commit 4142ac44b3f),
I was able to sustain streaming writes from 166 Linux
clients for 2 hours:

On 166 clients:
   dd conv=fdatasync if=/dev/zero of=/mnt/ceph/stripe-4M/1/zero0.`hostname -s` bs=4k count=65536k

Elapsed time:   7274.55 seconds
Total data:     45629732.553 MB (43515904 MiB)
Aggregate rate: 6272.516 MB/s
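
The totals above follow from the dd parameters. Here is a minimal
Python sketch of that arithmetic, assuming dd's usual suffix
convention (k = 1024, so bs=4k is 4096 bytes and count=65536k is
65536 * 1024 blocks). The variable names are illustrative only.

```python
# Reproduce the reported totals from the dd parameters.
# Assumption: dd's suffix convention, k = 1024.
CLIENTS = 166
BLOCK_SIZE = 4 * 1024        # bs=4k
BLOCK_COUNT = 65536 * 1024   # count=65536k
ELAPSED_S = 7274.55          # elapsed time reported above

bytes_per_client = BLOCK_SIZE * BLOCK_COUNT   # 2**38 bytes = 256 GiB
total_bytes = CLIENTS * bytes_per_client

total_mb = total_bytes / 1e6      # decimal megabytes
total_mib = total_bytes / 2**20   # binary mebibytes
rate_mb_s = total_mb / ELAPSED_S

print(f"Per client:     {bytes_per_client / 2**30:.0f} GiB")
print(f"Total data:     {total_mb:.3f} MB ({total_mib:.0f} MiB)")
print(f"Aggregate rate: {rate_mb_s:.3f} MB/s")
```

Each client writes 256 GiB, so the aggregate figure is an average
over the whole run, including any tail where some clients have
already finished.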

That kernel patch was critical; without it this test
runs into trouble after a few minutes because the
kernel struggles to find pages to merge during page
compaction.  Also critical were the ceph tunings I
mentioned here:
   http://www.spinics.net/lists/ceph-devel/msg07128.html

-- Jim

* Re: Interesting results
  2012-06-28 22:37 Interesting results Jim Schutt
@ 2012-06-28 22:53 ` Mark Nelson
  2012-06-29 14:54   ` Jim Schutt
  2012-07-01 19:57 ` Stefan Priebe
  1 sibling, 1 reply; 7+ messages in thread
From: Mark Nelson @ 2012-06-28 22:53 UTC (permalink / raw)
  To: Jim Schutt; +Cc: ceph-devel@vger.kernel.org

Nice!  Did you see much performance degradation over time?  Internally
I've seen some slowdowns (especially at smaller block sizes) as the OSDs
fill up.  How many servers and how many drives?

Still, those are the kinds of numbers I like to see.  Congrats! :)

Mark

* Re: Interesting results
  2012-06-28 22:53 ` Mark Nelson
@ 2012-06-29 14:54   ` Jim Schutt
  0 siblings, 0 replies; 7+ messages in thread
From: Jim Schutt @ 2012-06-29 14:54 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org

On 06/28/2012 04:53 PM, Mark Nelson wrote:
> Nice! Did you see much performance degradation over time? Internally
> I've seen some slowdowns (especially at smaller block sizes) as the
> OSDs fill up. How many servers and how many drives?
>

This result is from 12 servers, 24 OSDs/server, starting
from a freshly-built filesystem. I use 64KB btrfs metadata
nodes.

There is some performance degradation during such runs.
During the initial 10 TB or so, each server sustains ~2.2 GB/s,
as reported by vmstat.

Nearer the end of the run, data rate on each server is
much more variable, with peaks at ~2 GB/s and valleys at
~1.5 GB/s.

I suspect that some of that variability comes from
the OSDs not filling up uniformly; here's low/high utilization
at the end of the run:

server                     1K-blocks      Used Available Use% Mounted on

cs42:                      939095640 258202860 662416404  29% /ram/mnt/ceph/data.osd.261
cs38:                      939095640 259052468 661568524  29% /ram/mnt/ceph/data.osd.154
cs39:                      939095640 264803592 655825592  29% /ram/mnt/ceph/data.osd.174
cs34:                      939095640 265911256 654711400  29% /ram/mnt/ceph/data.osd.52
cs41:                      939095640 270588260 650049820  30% /ram/mnt/ceph/data.osd.238

cs33:                      939095640 345327760 575399472  38% /ram/mnt/ceph/data.osd.47
cs40:                      939095640 351180832 569558176  39% /ram/mnt/ceph/data.osd.205
cs35:                      939095640 351372096 569365696  39% /ram/mnt/ceph/data.osd.89
cs41:                      939095640 352522904 568214632  39% /ram/mnt/ceph/data.osd.217
cs33:                      939095640 358181684 562561740  39% /ram/mnt/ceph/data.osd.35

  max/min: 1.3872
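
The max/min figure can be recomputed from the Used column of the
listing above. This is a small Python sketch; the list names are
illustrative, and the values are copied from the ten rows shown
(the five least-full and five most-full OSDs), not from the full
set of OSDs.

```python
# "Used" column (in 1K-blocks) from the df listing above.
used_low = [258202860, 259052468, 264803592, 265911256, 270588260]
used_high = [345327760, 351180832, 351372096, 352522904, 358181684]

fullest = max(used_high)
emptiest = min(used_low)

ratio = fullest / emptiest
spread_gib = (fullest - emptiest) / 2**20   # 1K-blocks -> GiB

print(f"max/min: {ratio:.4f}")
print(f"spread:  {spread_gib:.1f} GiB between fullest and emptiest OSD")
```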

Note that I am using osd_pg_bits=7, osd_pgp_bits=7.  I have plans
to push that to see what happens.  I've also got another dozen
servers on a truck somewhere on their way here....
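
As a rough illustration of why the placement-group count might
affect how evenly OSDs fill, here is a toy Python simulation. It
is a sketch under stated assumptions, not a model of CRUSH: it
assumes osd_pg_bits=N gives about 2^N placement groups per OSD,
that every PG holds the same amount of data, that each PG copy
lands on an independently chosen, uniformly random OSD, and a
hypothetical replication factor of 2 (the thread does not state
one). The function name is made up for this example.

```python
import random

def simulated_max_min(n_osds: int, pg_bits: int, replicas: int, seed: int = 0) -> float:
    """Max/min ratio of PG copies per OSD under uniform random placement.

    Toy model only: real CRUSH placement is pseudo-random and never puts
    two copies of the same PG on one OSD; neither property is modeled here.
    """
    rng = random.Random(seed)
    counts = [0] * n_osds
    # Assumed pool size: n_osds << pg_bits placement groups, each with
    # `replicas` copies.
    for _ in range((n_osds << pg_bits) * replicas):
        counts[rng.randrange(n_osds)] += 1
    lowest = min(counts)
    return max(counts) / lowest if lowest else float("inf")

N_OSDS = 12 * 24   # 12 servers x 24 OSDs/server, as described above
for bits in (5, 7, 9, 11):
    ratio = simulated_max_min(N_OSDS, bits, replicas=2)
    print(f"pg_bits={bits:2d}  max/min={ratio:.3f}")
```

In this toy model the ratio shrinks as the number of PGs per OSD
grows, which is the effect raising the bits would be probing.
Whether the real cluster behaves the same way is exactly what the
planned test would show.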

The under-utilized OSDs finish early, which I believe contributes
to performance tailing off at the end of such a run.  I don't have
any data on how big this effect might be.

I haven't yet tested filling my filesystem to capacity, so I have no
data regarding what happens as the disks fill up.

> Still, those are the kinds of numbers I like to see. Congrats! :)

Thanks - I think it's pretty cool that testing
Ceph found a performance issue in the kernel.

-- Jim

>
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

* Re: Interesting results
  2012-06-28 22:37 Interesting results Jim Schutt
  2012-06-28 22:53 ` Mark Nelson
@ 2012-07-01 19:57 ` Stefan Priebe
  2012-07-02 14:04   ` Jim Schutt
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Priebe @ 2012-07-01 19:57 UTC (permalink / raw)
  To: Jim Schutt; +Cc: ceph-devel@vger.kernel.org

Thanks for sharing. Which btrfs mount options did you use?

* Re: Interesting results
  2012-07-01 19:57 ` Stefan Priebe
@ 2012-07-02 14:04   ` Jim Schutt
  2012-07-02 14:07     ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 7+ messages in thread
From: Jim Schutt @ 2012-07-02 14:04 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org

On 07/01/2012 01:57 PM, Stefan Priebe wrote:
> thanks for sharing. Which btrfs mount options did you use?

  -o noatime

is all I use.

-- Jim

* Re: Interesting results
  2012-07-02 14:04   ` Jim Schutt
@ 2012-07-02 14:07     ` Stefan Priebe - Profihost AG
  2012-07-02 14:38       ` Jim Schutt
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-07-02 14:07 UTC (permalink / raw)
  To: Jim Schutt; +Cc: ceph-devel@vger.kernel.org

On 02.07.2012 16:04, Jim Schutt wrote:
> On 07/01/2012 01:57 PM, Stefan Priebe wrote:
>> thanks for sharing. Which btrfs mount options did you use?
>
>   -o noatime
>
> is all I use.

Thanks. Have you ever measured random I/O performance? Or is sequential 
all you need?

Stefan

* Re: Interesting results
  2012-07-02 14:07     ` Stefan Priebe - Profihost AG
@ 2012-07-02 14:38       ` Jim Schutt
  0 siblings, 0 replies; 7+ messages in thread
From: Jim Schutt @ 2012-07-02 14:38 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org

On 07/02/2012 08:07 AM, Stefan Priebe - Profihost AG wrote:
> On 02.07.2012 16:04, Jim Schutt wrote:
>> On 07/01/2012 01:57 PM, Stefan Priebe wrote:
>>> thanks for sharing. Which btrfs mount options did you use?
>>
>> -o noatime
>>
>> is all I use.
>
> Thanks. Have you ever measured random I/O performance? Or is sequential all you need?

So far I've only been testing sequential, because
I like to find the limits under sequential load first.

--Jim
