From: Dave Spano
Subject: Re: OSD memory leaks?
Date: Wed, 9 Jan 2013 11:10:59 -0500 (EST)
Message-ID: <8366806.170.1357747859058.JavaMail.dspano@it1>
To: Sébastien Han
Cc: ceph-devel, Samuel Just

Yes, I'm using argonaut.

I've got 38 heap files from yesterday. Currently, the OSD in question is using 91.2% of memory according to top, and staying there. I initially thought it would go until the OOM killer started killing processes, but I don't see anything funny in the system logs that indicates that.

On the other hand, the ceph-osd process on osd.1 is using far less memory.

osd.0
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9151 root      20   0 20.4g  14g 2548 S    1 91.2 517:58.71 ceph-osd

osd.1
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10785 root      20   0  673m 310m 5164 S    3  1.9 107:04.39 ceph-osd

Here's what tcmalloc says when I run ceph osd tell 0 heap stats:

2013-01-09 11:09:36.778675 7f62aae23700 0 log [INF] : osd.0tcmalloc heap stats:------------------------------------------------
2013-01-09 11:09:36.779113 7f62aae23700 0 log [INF] : MALLOC:      210884768 (  201.1 MB) Bytes in use by application
2013-01-09 11:09:36.779348 7f62aae23700 0 log [INF] : MALLOC: +     89026560 (   84.9 MB) Bytes in page heap freelist
2013-01-09 11:09:36.779928 7f62aae23700 0 log [INF] : MALLOC: +      7926512 (    7.6 MB) Bytes in central cache freelist
2013-01-09 11:09:36.779951 7f62aae23700 0 log [INF] : MALLOC: +       144896 (    0.1 MB) Bytes in transfer cache freelist
2013-01-09 11:09:36.779972 7f62aae23700 0 log [INF] : MALLOC: +     11046512 (   10.5 MB) Bytes in thread cache freelists
2013-01-09 11:09:36.780013 7f62aae23700 0 log [INF] : MALLOC: +      5177344 (    4.9 MB) Bytes in malloc metadata
2013-01-09 11:09:36.780030 7f62aae23700 0 log [INF] : MALLOC:   ------------
2013-01-09 11:09:36.780056 7f62aae23700 0 log [INF] : MALLOC: =    324206592 (  309.2 MB) Actual memory used (physical + swap)
2013-01-09 11:09:36.780081 7f62aae23700 0 log [INF] : MALLOC: +    126177280 (  120.3 MB) Bytes released to OS (aka unmapped)
2013-01-09 11:09:36.780112 7f62aae23700 0 log [INF] : MALLOC:   ------------
2013-01-09 11:09:36.780127 7f62aae23700 0 log [INF] : MALLOC: =    450383872 (  429.5 MB) Virtual address space used
2013-01-09 11:09:36.780152 7f62aae23700 0 log [INF] : MALLOC:
2013-01-09 11:09:36.780168 7f62aae23700 0 log [INF] : MALLOC:          37492              Spans in use
2013-01-09 11:09:36.780330 7f62aae23700 0 log [INF] : MALLOC:             51              Thread heaps in use
2013-01-09 11:09:36.780359 7f62aae23700 0 log [INF] : MALLOC:           4096              Tcmalloc page size
2013-01-09 11:09:36.780384 7f62aae23700 0 log [INF] : ------------------------------------------------

Dave Spano
Optogenics
Systems Administrator

----- Original Message -----
From: "Sébastien Han"
To: "Samuel Just"
Cc: "Dave Spano", "ceph-devel"
Sent: Wednesday, January 9, 2013 10:20:43 AM
Subject: Re: OSD memory leaks?

I guess he runs Argonaut as well.

More suggestions about this problem?

Thanks!

--
Regards,
Sébastien Han.

On Mon, Jan 7, 2013 at 8:09 PM, Samuel Just wrote:
>
> Awesome!
> What version are you running (ceph-osd -v, include the hash)?
> -Sam
>
> On Mon, Jan 7, 2013 at 11:03 AM, Dave Spano wrote:
> > This failed the first time I sent it, so I'm resending in plain text.
> >
> > Dave Spano
> > Optogenics
> > Systems Administrator
> >
> > ----- Original Message -----
> >
> > From: "Dave Spano"
> > To: "Sébastien Han"
> > Cc: "ceph-devel", "Samuel Just"
> > Sent: Monday, January 7, 2013 12:40:06 PM
> > Subject: Re: OSD memory leaks?
> >
> > Sam,
> >
> > Attached are some heaps that I collected today. 001 and 003 are just after I started the profiler; 011 is the most recent. If you need more, or anything different, let me know. Already the OSD in question is at 38% memory usage. As mentioned by Sébastien, restarting ceph-osd keeps things going.
> >
> > Not sure if this is helpful information, but out of the two OSDs that I have running, the first one (osd.0) is the one that develops this problem the quickest. osd.1 does have the same issue, it just takes much longer. Do the monitors hit the first osd in the list first, when there's activity?
> >
> > Dave Spano
> > Optogenics
> > Systems Administrator
> >
> > ----- Original Message -----
> >
> > From: "Sébastien Han"
> > To: "Samuel Just"
> > Cc: "ceph-devel"
> > Sent: Friday, January 4, 2013 10:20:58 AM
> > Subject: Re: OSD memory leaks?
> >
> > Hi Sam,
> >
> > Thanks for your answer and sorry for the late reply.
> >
> > Unfortunately I can't get anything useful out of the profiler; actually I
> > do get output, but I guess it doesn't show what it is supposed to show... I will keep
> > on trying this. Anyway, yesterday I thought that the problem might
> > be due to overuse of some OSDs.
> > I was thinking that the
> > distribution of the primary OSD might be uneven, which could have
> > explained why the memory leaks are worse on some servers.
> > In the end, the distribution seems even, but while looking at the pg
> > dump I found something interesting in the scrub column: timestamps
> > from the last scrubbing operation matched the times shown on the
> > graph.
> >
> > After this, I made some calculations: I compared the total number of
> > scrubbing operations with the time range where memory leaks occurred.
> > First of all check my setup:
> >
> > root@c2-ceph-01 ~ # ceph osd tree
> > dumped osdmap tree epoch 859
> > # id    weight  type name               up/down reweight
> > -1      12      pool default
> > -3      12          rack lc2_rack33
> > -2      3               host c2-ceph-01
> > 0       1                   osd.0       up      1
> > 1       1                   osd.1       up      1
> > 2       1                   osd.2       up      1
> > -4      3               host c2-ceph-04
> > 10      1                   osd.10      up      1
> > 11      1                   osd.11      up      1
> > 9       1                   osd.9       up      1
> > -5      3               host c2-ceph-02
> > 3       1                   osd.3       up      1
> > 4       1                   osd.4       up      1
> > 5       1                   osd.5       up      1
> > -6      3               host c2-ceph-03
> > 6       1                   osd.6       up      1
> > 7       1                   osd.7       up      1
> > 8       1                   osd.8       up      1
> >
> >
> > And here are the results:
> >
> > * Ceph node 1, which has the largest memory leak, performed 1608
> > scrubs in total and 1059 during the time range where memory leaks occurred
> > * Ceph node 2, 1168 in total and 776 during the time range where
> > memory leaks occurred
> > * Ceph node 3, 940 in total and 94 during the time range where memory
> > leaks occurred
> > * Ceph node 4, 899 in total and 191 during the time range where
> > memory leaks occurred
> >
> > I'm still not entirely sure that the scrub operation causes the leak,
> > but it's the only relevant relation that I found...
> >
> > Could it be that the scrubbing process doesn't release memory?
> > Btw I
> > was wondering, how does ceph decide at what time it should run the
> > scrubbing operation? I know that it's once a day and controlled by the
> > following options
> >
> > OPTION(osd_scrub_min_interval, OPT_FLOAT, 300)
> > OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24)
> >
> > But how does ceph determine the time when the operation starts, during
> > cluster creation probably?
> >
> > I just checked the options that control OSD scrubbing and found that by default:
> >
> > OPTION(osd_max_scrubs, OPT_INT, 1)
> >
> > So that might explain why only one OSD uses a lot of memory.
> >
> > My dirty workaround at the moment is to perform a check of memory
> > use by every OSD and restart it if it uses more than 25% of the total
> > memory. Also note that on ceph 1, 3 and 4 it's always one OSD that
> > uses a lot of memory; for ceph 2 only, the mem usage is high but almost
> > the same for all the OSD processes.
> >
> > Thank you in advance.
> >
> > --
> > Regards,
> > Sébastien Han.
> >
> >
> > On Wed, Dec 19, 2012 at 10:43 PM, Samuel Just wrote:
> >>
> >> Sorry, it's been very busy. The next step would be to try to get a heap
> >> dump. You can start a heap profile on osd N by:
> >>
> >> ceph osd tell N heap start_profiler
> >>
> >> and you can get it to dump the collected profile using
> >>
> >> ceph osd tell N heap dump.
> >>
> >> The dumps should show up in the osd log directory.
> >>
> >> Assuming the heap profiler is working correctly, you can look at the
> >> dump using pprof in google-perftools.
> >>
> >> On Wed, Dec 19, 2012 at 8:37 AM, Sébastien Han wrote:
> >> > No more suggestions?
> >> > :(
> >> > --
> >> > Regards,
> >> > Sébastien Han.
> >> >
> >> >
> >> > On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han wrote:
> >> >> Nothing terrific...
> >> >>
> >> >> Kernel logs from my clients are full of "libceph: osd4
> >> >> 172.20.11.32:6801 socket closed"
> >> >>
> >> >> I saw this somewhere on the tracker.
> >> >>
> >> >> Is this harmful?
> >> >>
> >> >> Thanks.
> >> >>
> >> >> --
> >> >> Regards,
> >> >> Sébastien Han.
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just wrote:
> >> >>>
> >> >>> What is the workload like?
> >> >>> -Sam
> >> >>>
> >> >>> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > No, I don't see anything abnormal in the network stats. I don't see
> >> >>> > anything in the logs... :(
> >> >>> > The weird thing is that one node out of 4 seems to take way more memory
> >> >>> > than the others...
> >> >>> >
> >> >>> > --
> >> >>> > Regards,
> >> >>> > Sébastien Han.
> >> >>> >
> >> >>> >
> >> >>> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> No, I don't see anything abnormal in the network stats. I don't see anything in the logs... :(
> >> >>> >> The weird thing is that one node out of 4 seems to take way more memory than the others...
> >> >>> >>
> >> >>> >> --
> >> >>> >> Regards,
> >> >>> >> Sébastien Han.
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just wrote:
> >> >>> >>>
> >> >>> >>> Are you having network hiccups?
> >> >>> >>> There was a bug noticed recently that
> >> >>> >>> could cause a memory leak if nodes are being marked up and down.
> >> >>> >>> -Sam
> >> >>> >>>
> >> >>> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han wrote:
> >> >>> >>> > Hi guys,
> >> >>> >>> >
> >> >>> >>> > Today, looking at my graphs, I noticed that one out of 4 ceph nodes used a
> >> >>> >>> > lot of memory. It keeps growing and growing.
> >> >>> >>> > See the graph attached to this mail.
> >> >>> >>> > I run 0.48.2 on Ubuntu 12.04.
> >> >>> >>> >
> >> >>> >>> > The other nodes also grow, but more slowly than the first one.
> >> >>> >>> >
> >> >>> >>> > I'm not quite sure about the information that I have to provide. So
> >> >>> >>> > let me know. The only thing I can say is that the load hasn't
> >> >>> >>> > increased that much this week. It seems to be consuming and not giving
> >> >>> >>> > back the memory.
> >> >>> >>> >
> >> >>> >>> > Thank you in advance.
> >> >>> >>> >
> >> >>> >>> > --
> >> >>> >>> > Regards,
> >> >>> >>> > Sébastien Han.
> >> >>> >>
> >> >>> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
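
The tcmalloc figures for osd.0 quoted at the top of this message can be cross-checked with a little arithmetic. What follows is a minimal Python sketch, assuming the timestamp and thread prefix have already been stripped from each log line. The function names (parse_heap_stats, check_totals, tracked_fraction) are hypothetical helpers made up for illustration, and top's "14g" RES value is approximated as 14 GiB.

```python
import re

# Byte-count lines from the osd.0 heap stats above, log prefix stripped.
HEAP_STATS = """\
MALLOC:      210884768 (  201.1 MB) Bytes in use by application
MALLOC: +     89026560 (   84.9 MB) Bytes in page heap freelist
MALLOC: +      7926512 (    7.6 MB) Bytes in central cache freelist
MALLOC: +       144896 (    0.1 MB) Bytes in transfer cache freelist
MALLOC: +     11046512 (   10.5 MB) Bytes in thread cache freelists
MALLOC: +      5177344 (    4.9 MB) Bytes in malloc metadata
MALLOC: =    324206592 (  309.2 MB) Actual memory used (physical + swap)
MALLOC: +    126177280 (  120.3 MB) Bytes released to OS (aka unmapped)
MALLOC: =    450383872 (  429.5 MB) Virtual address space used
"""

# Optional sign ('+' or '='), byte count, "( x MB)" and a free-text label.
LINE_RE = re.compile(r"MALLOC:\s*([+=]?)\s*(\d+)\s*\(\s*[\d.]+ MB\)\s*(.+)")


def parse_heap_stats(text):
    """Return (sign, bytes, label) for every byte-count line in text."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            sign, nbytes, label = match.groups()
            rows.append((sign, int(nbytes), label.strip()))
    return rows


def check_totals(rows):
    """Re-add the components and compare them with each '=' total."""
    running = 0
    results = []
    for sign, nbytes, label in rows:
        if sign == "=":
            results.append((label, nbytes, running == nbytes))
        else:
            running += nbytes
    return results


def tracked_fraction(rows, resident_bytes):
    """Share of the resident size that tcmalloc reports as actually used."""
    for _sign, nbytes, label in rows:
        if label.startswith("Actual memory used"):
            return nbytes / resident_bytes
    raise ValueError("no 'Actual memory used' line found")


if __name__ == "__main__":
    rows = parse_heap_stats(HEAP_STATS)
    for label, total, ok in check_totals(rows):
        print(f"{label}: {total} bytes, components add up: {ok}")
    # Assumption: top's "14g" RES for osd.0 taken as 14 GiB.
    res_bytes = 14 * 1024**3
    print(f"tcmalloc-tracked share of RES: {tracked_fraction(rows, res_bytes):.1%}")
```

With these figures the components add up to both totals tcmalloc printed, and the total it reports as actually used (309.2 MB) is roughly 2% of the resident size top shows for osd.0.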