From: Yann Dupont
Subject: Re: poor OSD performance using kernel 3.4 => problem found
Date: Thu, 31 May 2012 17:43:15 +0200
Message-ID: <4FC79193.1000604@univ-nantes.fr>
In-Reply-To: <4FC78EFF.1090206@inktank.com>
To: Mark Nelson
Cc: Stefan Priebe - Profihost AG, Yehuda Sadeh, Stefan Majer, ceph-devel@vger.kernel.org

On 31/05/2012 17:32, Mark Nelson wrote:
> ceph osd pool get pg_num

My setup is detailed in a previous mail, but as I changed some
parameters this morning, here we go:

root@chichibu:~# ceph osd pool get data pg_num
PG_NUM: 576
root@chichibu:~# ceph osd pool get rbd pg_num
PG_NUM: 576

The pg num is quite low because I started with small OSDs (9 OSDs with
200G each, on internal disks) when I formatted. I have since reduced to
8 OSDs (osd.4 is out), but with much larger (and faster) storage.

Each of the 8 OSDs now has 5T, and for the moment I try to keep the
OSDs similar. Replication is set to 2.
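As a rough sanity check on the numbers above, the per-OSD load can be worked out from pg_num, the replication level and the OSD count. This is only a sketch: it assumes a perfectly even spread across the 8 OSDs and that replication 2 applies to both pools listed; the function name is made up for illustration.

```python
# Rough estimate of PG copies per OSD, using the figures quoted in the mail.
# Assumption: PGs are spread evenly over all OSDs (real placement is only
# approximately even), and both pools use replication 2.

def pg_copies_per_osd(pg_num: int, replicas: int, osds: int) -> float:
    """Hypothetical helper: average number of PG copies each OSD holds."""
    return pg_num * replicas / osds

POOLS = {"data": 576, "rbd": 576}  # pg_num values reported above
REPLICAS = 2
OSDS = 8

per_pool = {name: pg_copies_per_osd(n, REPLICAS, OSDS) for name, n in POOLS.items()}
total = sum(per_pool.values())

for name, copies in per_pool.items():
    print(f"{name}: {copies:.0f} PG copies per OSD")
print(f"both pools: {total:.0f} PG copies per OSD")
```

Other pools on the cluster (not reported in the mail) would add to the total.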

The fs is btrfs, formatted with big metadata (-l 64k -n 64k) and
mounted with space_cache,compress=lzo,nobarrier,noatime.

The journal is on tmpfs:

osd journal = /dev/shm/journal
osd journal size = 6144

I know this is dangerous; remember, it's NOT a production system for
the moment.

No OSD is full; I don't have much data stored for the moment.

Concerning the crush map, I'm not using the default one.
The 8 nodes are in 3 different locations (some kilometers apart): 2 are
in one place, 2 in another, and the last 4 in the main site.

There is 10G between all the nodes and they are in the same VLAN, with
no router involved (but there is (negligible?) latency between nodes).

I try to group hosts together to avoid problems when I lose a location
(electrical problem, for example). I'm not sure I really customized the
crush map as I should have.

Here is the map:

# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 device4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8

# types
type 0 osd
type 1 host
type 2 rack
type 3 pool

# buckets
host karuizawa {
    id -5    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.2 weight 1.000
}
host hazelburn {
    id -6    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.3 weight 1.000
}
rack loire {
    id -3    # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item karuizawa weight 1.000
    item hazelburn weight 1.000
}
host carsebridge {
    id -8    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.5 weight 1.000
}
host cameronbridge {
    id -9    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.6 weight 1.000
}
rack chantrerie {
    id -7    # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item carsebridge weight 1.000
    item cameronbridge weight 1.000
}
host chichibu {
    id -2    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 1.000
}
host glenesk {
    id -4    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.1 weight 1.000
}
host braeval {
    id -10    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.7 weight 1.000
}
host hanyu {
    id -11    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.8 weight 1.000
}
rack lombarderie {
    id -12    # do not change unnecessarily
    # weight 4.000
    alg straw
    hash 0    # rjenkins1
    item chichibu weight 1.000
    item glenesk weight 1.000
    item braeval weight 1.000
    item hanyu weight 1.000
}
pool default {
    id -1    # do not change unnecessarily
    # weight 8.000
    alg straw
    hash 0    # rjenkins1
    item loire weight 2.000
    item chantrerie weight 2.000
    item lombarderie weight 4.000
}

# rules
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

Hope it helps,
cheers

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
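
A note on the rules in the map above: all three use "step chooseleaf firstn 0 type host", which places replicas on distinct hosts but does not by itself require distinct racks. The sketch below transcribes the rack/host hierarchy from the map and counts how many host pairs share a rack. It does not simulate CRUSH, so the counts are not placement probabilities; they only show that same-location pairs exist under a host-level rule. The helper name is made up for illustration.

```python
from itertools import combinations

# Rack -> hosts, transcribed from the crush map in the mail.
RACKS = {
    "loire": ["karuizawa", "hazelburn"],
    "chantrerie": ["carsebridge", "cameronbridge"],
    "lombarderie": ["chichibu", "glenesk", "braeval", "hanyu"],
}

def rack_of(host: str) -> str:
    """Hypothetical helper: return the rack (location) containing a host."""
    for rack, hosts in RACKS.items():
        if host in hosts:
            return rack
    raise KeyError(host)

all_hosts = [h for hosts in RACKS.values() for h in hosts]

# Every pair of distinct hosts is an allowed replica pair under "type host".
pairs = list(combinations(all_hosts, 2))
same_rack = [(a, b) for a, b in pairs if rack_of(a) == rack_of(b)]

print(f"host pairs: {len(pairs)}, pairs within one rack: {len(same_rack)}")
for a, b in same_rack:
    print(f"  {a} + {b} both in {rack_of(a)}")
```

If surviving the loss of a whole location is the goal, a rule that chooses across "type rack" instead of "type host" would be one way to express it. That is a suggestion, not something stated in the mail.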