From: Josh Durgin
Subject: Re: speedup ceph / scaling / find the bottleneck
Date: Mon, 02 Jul 2012 13:30:19 -0700
Message-ID: <4FF204DB.80709@inktank.com>
In-Reply-To: <4FF1F4F6.4030403@profihost.ag>
To: Stefan Priebe
Cc: Gregory Farnum, Alexandre DERUMIER, Sage Weil, ceph-devel@vger.kernel.org, Mark Nelson

On 07/02/2012 12:22 PM, Stefan Priebe wrote:
> On 02.07.2012 18:51, Gregory Farnum wrote:
>> On Sun, Jul 1, 2012 at 11:12 PM, Stefan Priebe - Profihost AG
>> wrote:
>>> @sage / mark
>>> How does the aggregation work? Does it work 4MB blockwise or target node
>>> based?
>> Aggregation is based on the 4MB blocks, and if you've got caching
>> enabled then it's also not going to flush them out to disk very often
>> if you're continuously updating the block. I don't remember all the
>> conditions, but essentially, you'll run into dirty limits and it will
>> asynchronously flush out the data based on a combination of how old it
>> is, and how long it's been since some version of it was stable on
>> disk.
> Is there any way to check if rbd caching works correctly? For me the I/O
> values do not change if I switch writeback on or off, and it also doesn't
> matter how large I set the cache size.
>
> ...

If you add admin_socket=/path/to/admin_socket for your client running
qemu (in that client's ceph.conf section or manually in the qemu command
line) you can check that caching is enabled:

    ceph --admin-daemon /path/to/admin_socket show config | grep rbd_cache

And see statistics it generates (look for cache) with:

    ceph --admin-daemon /path/to/admin_socket perfcounters_dump

Josh

>>> Ceph:
>>> 2 VMs:
>>> write: io=2234MB, bw=25405KB/s, iops=6351, runt= 90041msec
>>> read : io=4760MB, bw=54156KB/s, iops=13538, runt= 90007msec
>>> write: io=56372MB, bw=638402KB/s, iops=155, runt= 90421msec
>>> read : io=86572MB, bw=981225KB/s, iops=239, runt= 90346msec
>>>
>>> write: io=2222MB, bw=25275KB/s, iops=6318, runt= 90011msec
>>> read : io=4747MB, bw=54000KB/s, iops=13500, runt= 90008msec
>>> write: io=55300MB, bw=626733KB/s, iops=153, runt= 90353msec
>>> read : io=84992MB, bw=965283KB/s, iops=235, runt= 90162msec
>>
>> I can't quite tell what's going on here, can you describe the test in
>> more detail?
>
> I've network booted my VM and then run the following command:
> export DISK=/dev/vda; (fio --filename=$DISK --direct=1 --rw=randwrite
> --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting
> --name=file1;fio --filename=$DISK --direct=1 --rw=randread --bs=4k
> --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1;fio
> --filename=$DISK --direct=1 --rw=write --bs=4M --size=200G --numjobs=50
> --runtime=90 --group_reporting --name=file1;fio --filename=$DISK
> --direct=1 --rw=read --bs=4M --size=200G --numjobs=50 --runtime=90
> --group_reporting --name=file1 )|egrep " read| write"
>
> - write random 4k I/O
> - read random 4k I/O
> - write seq 4M I/O
> - read seq 4M I/O
>
> Stefan
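
One way to read the four result lines against the four tests listed above
is to check that the reported bandwidth is roughly iops times block size.
The sketch below does that arithmetic for the first block of results quoted
in this thread. The helper name "check" is made up for illustration, and the
mapping of each result line to a test assumes the results appear in the same
order as the fio invocations.

```shell
#!/bin/sh
# Sanity check for fio output: bw (KB/s) should be close to iops * bs (KB).
# "check" is a hypothetical helper, not part of fio or ceph.
check() {
    # $1 = label, $2 = reported bw in KB/s, $3 = reported iops, $4 = block size in KB
    awk -v label="$1" -v bw="$2" -v iops="$3" -v bs="$4" 'BEGIN {
        expected = iops * bs
        printf "%s: reported=%d KB/s, iops*bs=%d KB/s, ratio=%.2f\n",
               label, bw, expected, bw / expected
    }'
}

# Numbers copied from the first block of results quoted in this thread.
check "randwrite 4k" 25405  6351  4
check "randread 4k"  54156  13538 4
check "write 4M"     638402 155   4096
check "read 4M"      981225 239   4096
```

A ratio near 1 for every line suggests the order assumption holds: the
high-iops lines are the 4k random tests and the low-iops lines are the 4M
sequential ones.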