From: Stefan Priebe
Subject: Re: speedup ceph / scaling / find the bottleneck
Date: Sun, 01 Jul 2012 23:01:44 +0200
Message-ID: <4FF0BAB8.3070503@profihost.ag>
To: Sage Weil
Cc: Mark Nelson, ceph-devel@vger.kernel.org

Hello list,
Hello Sage,

I've made some further tests.

Sequential 4k writes over 200GB:
300% CPU usage of kvm process
34712 iops

Random 4k writes over 200GB:
170% CPU usage of kvm process
5500 iops

When I make random 4k writes over 100MB:
450% CPU usage of kvm process
and !! 25059 iops !!

Random 4k writes over 1GB:
380% CPU usage of kvm process
14387 iops

So the size of the range over which the random I/O happens seems to be
important, and the CPU usage just seems to reflect the iops. So I'm not
sure the problem is really the client rbd driver.

Mark, I hope you can run some tests next week.

Greets
Stefan

On 29.06.2012 23:18, Stefan Priebe wrote:
> On 29.06.2012 17:28, Sage Weil wrote:
>> On Fri, 29 Jun 2012, Stefan Priebe - Profihost AG wrote:
>>> On 29.06.2012 13:49, Mark Nelson wrote:
>>>> I'll try to replicate your findings in house. I've got some other
>>>> things I have to do today, but hopefully I can take a look next
>>>> week. If I recall correctly, in the other thread you said that
>>>> sequential writes are using much less CPU time on your systems?
>>>
>>> Random 4k writes: 10% idle
>>> Seq 4k writes: !! 99.7% !! idle
>>> Seq 4M writes: 90% idle
>>
>> I take it 'rbd cache = true'?
> Yes
>
>> It sounds like librbd (or the guest file system) is coalescing the
>> sequential writes into big writes. I'm a bit surprised that the 4k
>> ones have lower CPU utilization, but there is lots of opportunity for
>> noise there, so I wouldn't read too far into it yet.
> 90% vs. 99.7% is OK; the 9% goes to flush, kworker and xfs processes.
> It was the overall system load, not just ceph-osd.
>
>>>> Do you see better scaling in that case?
>>>
>>> 3 osd nodes:
>>> 1 VM:
>>> Rand 4k writes: 7000 iops
> <-- this one is WRONG! Sorry, it is 14100 iops.
>
>>> Seq 4k writes: 19900 iops
>>>
>>> 2 VMs:
>>> Rand 4k writes: 6000 iops each
>>> Seq 4k writes: 4000 iops VM 1
>>> Seq 4k writes: 18500 iops VM 2
>>>
>>>
>>> 4 osd nodes:
>>> 1 VM:
>>> Rand 4k writes: 14400 iops <------ ????
>>
>> Can you double-check this number?
> Triple-checked, BUT I see that the Rand 4k writes figure with 3 osd
> nodes was wrong. Sorry.
>
>>> Seq 4k writes: 19000 iops
>>>
>>> 2 VMs:
>>> Rand 4k writes: 7000 iops each
>>> Seq 4k writes: 18000 iops each
>>
>> With the exception of that one number above, it really sounds like the
>> bottleneck is in the client (VM or librbd+librados) and not in the
>> cluster. Performance won't improve when you add OSDs if the limiting
>> factor is the client's ability to dispatch/stream/sustain IOs. That
>> also seems consistent with the fact that limiting the # of CPUs on the
>> OSDs doesn't affect much.
> ACK
>
>> Above, with 2 VMs, for instance, your total iops for the cluster
>> doubled (36000 total). Can you try with 4 VMs and see if it continues
>> to scale in that dimension? At some point you will start to saturate
>> the OSDs, and at that point adding more OSDs should show aggregate
>> throughput going up.
> Where did you get that value from? It scales with the number of VMs in
> some cases, but it does not scale with OSDs.
>
> Stefan
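The figures in this thread can be cross-checked with a little arithmetic. Below is a minimal Python sketch, not part of the original mails: the numbers are copied from the messages above, the helper names (mib_per_s, iops_per_core, aggregate_iops) are hypothetical, and it assumes the kvm CPU figures are top-style percentages where 100% means one fully busy core. It converts the 4k iops results to MiB/s, normalises iops by CPU usage, and sums the per-VM results of the two-VM runs.

```python
# Figures copied from this thread (4k writes, rbd cache = true).
# Everything derived below is plain arithmetic, not a new measurement.

BLOCK_SIZE = 4096  # bytes per I/O for the 4k write tests

# Stefan's 1 July tests: (label, kvm CPU usage in %, measured iops)
single_vm_runs = [
    ("seq 4k, 200GB range", 300, 34712),
    ("rand 4k, 200GB range", 170, 5500),
    ("rand 4k, 100MB range", 450, 25059),
    ("rand 4k, 1GB range", 380, 14387),
]

# Two-VM runs from the 29 June mails: per-VM iops, one entry per VM.
multi_vm_runs = {
    ("3 OSD nodes", "rand 4k"): [6000, 6000],
    ("3 OSD nodes", "seq 4k"): [4000, 18500],
    ("4 OSD nodes", "rand 4k"): [7000, 7000],
    ("4 OSD nodes", "seq 4k"): [18000, 18000],
}


def mib_per_s(iops: int, block_size: int = BLOCK_SIZE) -> float:
    """Convert an iops figure at a fixed block size into MiB/s."""
    return iops * block_size / (1024 * 1024)


def iops_per_core(iops: int, cpu_percent: int) -> float:
    """iops per 100% of CPU (one fully busy core) used by the kvm process."""
    return iops / (cpu_percent / 100)


def aggregate_iops(per_vm: list) -> int:
    """Cluster-wide iops for one test: the sum of the per-VM results."""
    return sum(per_vm)


if __name__ == "__main__":
    for label, cpu, iops in single_vm_runs:
        print(f"{label:22s} {iops:6d} iops  {mib_per_s(iops):6.1f} MiB/s  "
              f"{iops_per_core(iops, cpu):6.0f} iops per core")
    for (osds, workload), per_vm in multi_vm_runs.items():
        print(f"{osds}, {workload}, {len(per_vm)} VMs: "
              f"{aggregate_iops(per_vm)} iops total")
```

Summing the two 18000 iops sequential runs on 4 OSD nodes gives 36000, which appears to be the total Sage refers to; the same sum for the random runs gives 14000, against 14400 iops for a single VM.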