From: David McBride <dwm37@cam.ac.uk>
To: Ceph-devel <ceph-devel@vger.kernel.org>
Subject: Ceph daemon memory utilization: 'heap release' drops use by 50%
Date: Mon, 14 Apr 2014 13:28:55 +0100
Message-ID: <534BD487.5090507@cam.ac.uk>
Hello,
I'm currently experimenting with a Ceph deployment, and am noting that
some of my machines are having processes killed by the OOM killer,
despite provisioning 32GB for a 12 OSD machine.
(This tended to correlate with reshaping the cluster, which is not
surprising given that OSD memory utilization is documented to spike when
recovery operations are in progress.)
While the recently-added zRAM kernel facility appears to be helping
somewhat in stretching the available resources, I've been reviewing the
heap utilization statistics displayed via `ceph tell osd.$i heap stats`.
On a representative process, I see:
> osd.0tcmalloc heap stats:------------------------------------------------
> MALLOC: 593850280 ( 566.3 MiB) Bytes in use by application
> MALLOC: + 1621073920 ( 1546.0 MiB) Bytes in page heap freelist
> MALLOC: + 117159712 ( 111.7 MiB) Bytes in central cache freelist
> MALLOC: + 2987008 ( 2.8 MiB) Bytes in transfer cache freelist
> MALLOC: + 84780344 ( 80.9 MiB) Bytes in thread cache freelists
> MALLOC: + 13119640 ( 12.5 MiB) Bytes in malloc metadata
> MALLOC: ------------
> MALLOC: = 2432970904 ( 2320.3 MiB) Actual memory used (physical + swap)
> MALLOC: + 44449792 ( 42.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC: ------------
> MALLOC: = 2477420696 ( 2362.7 MiB) Virtual address space used
> MALLOC:
> MALLOC: 60887 Spans in use
> MALLOC: 775 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
> ------------------------------------------------
I noticed there's a huge amount of memory (about 1.5GB) on the page
heap freelist. As an experiment, I ran `ceph tell osd.$i heap release`,
and the amount of memory in use dropped substantially:
> osd.0tcmalloc heap stats:------------------------------------------------
> MALLOC: 581434648 ( 554.5 MiB) Bytes in use by application
> MALLOC: + 11509760 ( 11.0 MiB) Bytes in page heap freelist
> MALLOC: + 105904144 ( 101.0 MiB) Bytes in central cache freelist
> MALLOC: + 2070848 ( 2.0 MiB) Bytes in transfer cache freelist
> MALLOC: + 97882520 ( 93.3 MiB) Bytes in thread cache freelists
> MALLOC: + 13119640 ( 12.5 MiB) Bytes in malloc metadata
> MALLOC: ------------
> MALLOC: = 811921560 ( 774.3 MiB) Actual memory used (physical + swap)
> MALLOC: + 1665499136 ( 1588.3 MiB) Bytes released to OS (aka unmapped)
> MALLOC: ------------
> MALLOC: = 2477420696 ( 2362.7 MiB) Virtual address space used
> MALLOC:
> MALLOC: 60733 Spans in use
> MALLOC: 803 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
> ------------------------------------------------
This was consistent across all 12 OSDs; running this command on all the
OSDs on a machine dropped memory utilization by ~15GB, or ~50% of the
amount of RAM in my machine.
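For anyone wanting to repeat this across a host, here is a minimal Python
sketch that builds one `ceph tell osd.N heap release` invocation per OSD.
The OSD ids 0 to 11 and the helper names are assumptions for illustration
only; substitute the ids actually hosted on the machine. It defaults to a
dry run that just prints the commands.

```python
import subprocess


def heap_release_commands(osd_ids):
    """Return one argv list per OSD for `ceph tell osd.N heap release`."""
    return [["ceph", "tell", f"osd.{i}", "heap", "release"] for i in osd_ids]


def release_all(osd_ids, dry_run=True):
    """Print each command (dry run) or execute it via the ceph CLI."""
    for argv in heap_release_commands(osd_ids):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if ceph exits non-zero.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Assumed ids: 12 OSDs numbered 0..11 on this host.
    release_all(range(12))
```

Passing dry_run=False would actually run the commands, which requires the
ceph CLI and suitable credentials on the host.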
Is this expected behaviour? Would it be prudent to treat this as the
amount of memory the Ceph OSDs genuinely require at peak demand?
(If so, that indicates that I need to be looking to increase the spec of
my storage nodes...)
I see similar results on my MON nodes. Before a release:
> mon.ceph-sm000tcmalloc heap stats:------------------------------------------------
> MALLOC: 599497240 ( 571.7 MiB) Bytes in use by application
> MALLOC: + 806297600 ( 768.9 MiB) Bytes in page heap freelist
> MALLOC: + 32448368 ( 30.9 MiB) Bytes in central cache freelist
> MALLOC: + 1684080 ( 1.6 MiB) Bytes in transfer cache freelist
> MALLOC: + 23270408 ( 22.2 MiB) Bytes in thread cache freelists
> MALLOC: + 5091480 ( 4.9 MiB) Bytes in malloc metadata
> MALLOC: ------------
> MALLOC: = 1468289176 ( 1400.3 MiB) Actual memory used (physical + swap)
> MALLOC: + 30859264 ( 29.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC: ------------
> MALLOC: = 1499148440 ( 1429.7 MiB) Virtual address space used
> MALLOC:
> MALLOC: 18309 Spans in use
> MALLOC: 122 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
> ------------------------------------------------
After:
> mon.ceph-sm000tcmalloc heap stats:------------------------------------------------
> MALLOC: 600108520 ( 572.3 MiB) Bytes in use by application
> MALLOC: + 17342464 ( 16.5 MiB) Bytes in page heap freelist
> MALLOC: + 32392208 ( 30.9 MiB) Bytes in central cache freelist
> MALLOC: + 964240 ( 0.9 MiB) Bytes in transfer cache freelist
> MALLOC: + 23402360 ( 22.3 MiB) Bytes in thread cache freelists
> MALLOC: + 5091480 ( 4.9 MiB) Bytes in malloc metadata
> MALLOC: ------------
> MALLOC: = 679301272 ( 647.8 MiB) Actual memory used (physical + swap)
> MALLOC: + 819847168 ( 781.9 MiB) Bytes released to OS (aka unmapped)
> MALLOC: ------------
> MALLOC: = 1499148440 ( 1429.7 MiB) Virtual address space used
> MALLOC:
> MALLOC: 16396 Spans in use
> MALLOC: 122 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
> ------------------------------------------------
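The before/after comparison can be reduced to two numbers: the share of
"actual memory used" that sits on the page heap freelist, and the drop in
actual memory used after a release. The following is a rough Python sketch
that parses the byte-count lines of the heap stats output; the regular
expression and function names are my own, and the sample text is an
excerpt of the MON figures quoted above.

```python
import re

# Matches byte-count lines such as
# "MALLOC: + 806297600 ( 768.9 MiB) Bytes in page heap freelist"
LINE_RE = re.compile(r"^MALLOC:\s*[+=]?\s*(\d+)\s*\(\s*[\d.]+\s*MiB\)\s*(.+)$")

FREELIST = "Bytes in page heap freelist"
USED = "Actual memory used (physical + swap)"


def parse_heap_stats(text):
    """Map each byte-count label to its integer value in bytes."""
    stats = {}
    for line in text.splitlines():
        # Tolerate a leading "> " quote marker and surrounding whitespace.
        m = LINE_RE.match(line.lstrip("> ").strip())
        if m:
            stats[m.group(2).strip()] = int(m.group(1))
    return stats


def freelist_fraction(stats):
    """Fraction of actual memory used that is on the page heap freelist."""
    return stats[FREELIST] / stats[USED]


MON_BEFORE = """\
MALLOC: 599497240 ( 571.7 MiB) Bytes in use by application
MALLOC: + 806297600 ( 768.9 MiB) Bytes in page heap freelist
MALLOC: = 1468289176 ( 1400.3 MiB) Actual memory used (physical + swap)
"""

MON_AFTER = """\
MALLOC: 600108520 ( 572.3 MiB) Bytes in use by application
MALLOC: + 17342464 ( 16.5 MiB) Bytes in page heap freelist
MALLOC: = 679301272 ( 647.8 MiB) Actual memory used (physical + swap)
"""

before = parse_heap_stats(MON_BEFORE)
after = parse_heap_stats(MON_AFTER)
freed = before[USED] - after[USED]

print(f"page heap freelist share before release: {freelist_fraction(before):.0%}")
print(f"drop in actual memory used: {freed / 2**20:.1f} MiB")
```

For the MON above, a little over half of the memory in use before the
release was freelist pages rather than application data.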
The tcmalloc documentation suggests that memory should gradually be
returned to the operating system:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html#runtime
Given these OSDs were largely idle over the weekend prior to running
this experiment, it seems clear that this process is not operating as
designed.
I've looked through the environment of my running processes and the Ceph
source, and can see no reference to TCMALLOC_RELEASE_RATE or
SetMemoryReleaseRate().
I'm currently running an experiment whereby I define
"env TCMALLOC_RELEASE_RATE=10" in
/etc/init/ceph-{osd,mon}.conf.override; I'll see if this has any impact
on memory usage over time.
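To confirm that the override actually reaches a restarted daemon, the
process environment can be read back from /proc. This is a minimal Python
sketch, assuming a Linux host and sufficient privileges to read
/proc/<pid>/environ; the function names are hypothetical, and the file
reflects the environment the process was started with.

```python
def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def release_rate(pid):
    """Return TCMALLOC_RELEASE_RATE for a running process, or None if unset."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_environ(f.read()).get("TCMALLOC_RELEASE_RATE")


# Example with synthetic data rather than a live process:
sample = b"PATH=/usr/bin\0TCMALLOC_RELEASE_RATE=10\0"
print(parse_environ(sample).get("TCMALLOC_RELEASE_RATE"))
```

Calling release_rate() with the pid of a ceph-osd or ceph-mon process
would then show whether the variable was picked up after the restart.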
(I suspect that my current Ceph cluster placement-group count is
excessive; with 144 OSDs, I'm running with about a dozen pools, each
with ~8000 PGs. It's not clear how the guidelines for PG-sizing
should be adjusted for multiple-pool configurations; at some point I'll
see what effect wiping my cluster and using a much smaller per-pool PG
count has.)
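As a back-of-the-envelope check on the PG load, here is a small Python
sketch. It takes "about a dozen" as 12 pools, assumes 3 replicas per pool
(the replica count is not stated above), and reads the commonly cited
guideline of roughly 100 PGs per OSD as a cluster-wide total rather than a
per-pool figure; all three are assumptions.

```python
def pg_copies_per_osd(pools, pgs_per_pool, replicas, osds):
    """Average number of PG copies each OSD holds across all pools."""
    return pools * pgs_per_pool * replicas / osds


def guideline_total_pgs(osds, replicas, target_per_osd=100):
    """Cluster-wide PG count for a target number of PG copies per OSD."""
    return osds * target_per_osd / replicas


# Figures from this cluster, with the assumed replica count of 3.
current = pg_copies_per_osd(pools=12, pgs_per_pool=8000, replicas=3, osds=144)
suggested = guideline_total_pgs(osds=144, replicas=3)

print(f"PG copies per OSD now: {current:.0f}")
print(f"guideline total PGs across all pools: {suggested:.0f}")
```

Under those assumptions each OSD carries about 2000 PG copies, roughly
twenty times the guideline figure, which would be consistent with the
suspicion that the PG count is excessive.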
Cheers,
David
--
David McBride <dwm37@cam.ac.uk>
Unix Specialist, University Information Services