From: Stefan Priebe - Profihost AG
Subject: Re: poor OSD performance using kernel 3.4
Date: Tue, 29 May 2012 11:46:58 +0200
Message-ID: <4FC49B12.8020004@profihost.ag>
In-Reply-To: <5970d59f-9531-4f60-8600-3e1268824c83@mailpro>
To: Alexandre DERUMIER
Cc: ceph-devel@vger.kernel.org, Mark Nelson

It would be really nice if somebody from Inktank could comment on this whole situation.

Thanks!

Stefan

On 29.05.2012 05:54, Alexandre DERUMIER wrote:
>>> This happens with ext4 or btrfs too.
>
> Maybe this is related to the I/O scheduler?
>
> Have you compared the cfq, deadline and noop schedulers?
>
> noop should be fast with an SSD.
>
> Also, what is your SAS/SATA controller?
>
> ----- Original Message -----
>
> From: "Stefan Priebe"
> To: "Alexandre DERUMIER"
> Cc: ceph-devel@vger.kernel.org, "Mark Nelson"
> Sent: Monday, 28 May 2012 21:48:34
> Subject: Re: poor OSD performance using kernel 3.4
>
> On 28.05.2012 08:52, Alexandre DERUMIER wrote:
>>> I think filestore journal parallel works only with btrfs.
>>> Other filesystems use writeahead.
>>>> ... you might be right but I can't change Ceph's implementation.
>>
>> See my schema.
>> I think you see parallel writes because you see the flush write of the first wave to disk at the same time
>> as the second wave's write to the journal.
> Yes, I fully understand and agree, but this should still at least
> result in a constant bandwidth near the max of the underlying disk.
>
>>>> I totally agree with you, but this is just a test setup AND if you have
>>>> a big log file to copy, let's say 100GB, your journal will never be big
>>>> enough and the speed should never drop to 0MB/s. Also I see the correct
>>>> behaviour with 3.0.X, where the speed is maxed out at the underlying device.
>>>> So I still see no reason why with 3.4 the speed drops to 0MB/s and is
>>>> mostly 10-20MB/s instead of 130MB/s.
>>
>> Maybe something is wrong with 3.4 and your disk writes more slowly (XFS bug, SATA controller driver bug, ...).
>
> This happens with ext4 or btrfs too.
>
> Sequential write speed to the FS is exactly the same under 3.0 and 3.4 using
> oflag=direct.
>
> 3.4:
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 41,4899 s, 253 MB/s
>
> 3.0:
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 40,861 s, 257 MB/s
>
>> Maybe a local benchmark of your SSD with 3.4 can give some hints?
>
>>>> How many disks (7.2K rpm) do you have per OSD?
>>>>> One Intel 520 SSD per OSD.
>>
>> I have seen some benchmarks on the internet showing about 150-300MB/s (depending on the block size).
> OSD bench shows around 260MB/s.
>
> ceph osd tell X bench shows me a speed of 260MB/s under both kernels,
> which corresponds to the dd from above.
>
>> Something must be wrong. Doing a local benchmark can really help, I think.
>> You can use sysbench-tools:
>> https://github.com/tsuna/sysbench-tools
>> It makes benchmark comparisons with nice graphs.
> Thx, hopefully I'll find something.
>
> Stefan
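Alexandre's suggestion to compare the cfq, deadline and noop schedulers can be tried at runtime through sysfs. The following is a minimal Python sketch, assuming a Linux kernel that exposes /sys/block/<dev>/queue/scheduler (on the 3.x kernels discussed here the file lists the available schedulers with the active one in brackets, e.g. "noop deadline [cfq]") and root privileges for switching. The helper names parse_scheduler_line, read_scheduler and set_scheduler are hypothetical, not part of any Ceph or kernel tool.

```python
from __future__ import annotations  # allows tuple[...]/list[...] hints on older Python 3

from pathlib import Path

# Hypothetical helpers for comparing block-layer I/O schedulers.
# Assumes /sys/block/<dev>/queue/scheduler exists; writing it requires root.
SYSFS_BLOCK = Path("/sys/block")


def parse_scheduler_line(line: str) -> tuple[str | None, list[str]]:
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active scheduler, available schedulers). The kernel marks the
    active scheduler by wrapping its name in square brackets; if no name is
    bracketed, the active scheduler is reported as None.
    """
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_scheduler(dev: str) -> tuple[str | None, list[str]]:
    """Return (active, available) schedulers for a device name like 'sdb'."""
    text = (SYSFS_BLOCK / dev / "queue" / "scheduler").read_text()
    return parse_scheduler_line(text)


def set_scheduler(dev: str, name: str) -> None:
    """Switch the scheduler of one device (needs root); rejects names the kernel does not offer."""
    _, available = read_scheduler(dev)
    if name not in available:
        raise ValueError(f"{name!r} is not offered for {dev}: {available}")
    (SYSFS_BLOCK / dev / "queue" / "scheduler").write_text(name)
```

For example, a loop over ("noop", "deadline", "cfq") that calls set_scheduler("sdb", name), where sdb is a placeholder for the OSD's SSD, and then repeats the same dd oflag=direct run or ceph osd tell X bench would give per-scheduler numbers under each kernel. A scheduler set this way only lasts until the next reboot.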