From: Matt Benjamin <mbenjamin@redhat.com>
To: Shinobu Kinjo <skinjo@redhat.com>
Cc: Alexandre DERUMIER <aderumier@odiso.com>,
	Stephen L Blinick <stephen.l.blinick@intel.com>,
	Somnath Roy <Somnath.Roy@sandisk.com>,
	Mark Nelson <mnelson@redhat.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Ceph Hackathon: More Memory Allocator Testing
Date: Thu, 20 Aug 2015 10:46:24 -0400 (EDT)
Message-ID: <1505100477.6819228.1440081984457.JavaMail.zimbra@redhat.com>
In-Reply-To: <1388436889.6755045.1440075299003.JavaMail.zimbra@redhat.com>

Jemalloc 4.0 seems to have some shiny new capabilities, at least.

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-761-4689
fax.  734-769-8938
cel.  734-216-5309

----- Original Message -----
> From: "Shinobu Kinjo" <skinjo@redhat.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson"
> <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 8:54:59 AM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Thank you for that result.
> So it might make sense to look at the differences between jemalloc 3.5 and
> jemalloc 4.0.
> 
>  Shinobu
> 
> ----- Original Message -----
> From: "Alexandre DERUMIER" <aderumier@odiso.com>
> To: "Shinobu Kinjo" <skinjo@redhat.com>
> Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 5:17:46 PM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Memory results for the OSD daemons under load:
> 
> jemalloc always uses more memory than tcmalloc.
> jemalloc 4.0 seems to reduce memory usage, but it still uses a little more
> than tcmalloc.
> 
> 
> 
> osd_op_threads=2, tcmalloc 2.1
> ------------------------------
> root      38066  2.3  0.7 1223088 505144 ?      Ssl  08:35   1:32 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root      38165  2.4  0.7 1247828 525356 ?      Ssl  08:35   1:34 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
> 
> 
> osd_op_threads=32, tcmalloc 2.1
> -------------------------------
> root      39002  102  0.7 1455928 488584 ?      Ssl  09:41   0:30 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root      39168  114  0.7 1483752 518368 ?      Ssl  09:41   0:30 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
> 
> 
> osd_op_threads=2, jemalloc 3.5
> ------------------------------
> root      18402 72.0  1.1 1642000 769000 ?      Ssl  09:43   0:17 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      18434 89.1  1.2 1677444 797508 ?      Ssl  09:43   0:21 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> 
> 
> osd_op_threads=32, jemalloc 3.5
> -------------------------------
> root      17204  3.7  1.2 2030616 816520 ?      Ssl  08:35   2:31 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      17228  4.6  1.2 2064928 830060 ?      Ssl  08:35   3:05 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> 
> 
> osd_op_threads=2, jemalloc 4.0
> ------------------------------
> root      19967  113  1.1 1432520 737988 ?      Ssl  10:04   0:31 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> root      19976 93.6  1.0 1409376 711192 ?      Ssl  10:04   0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> 
> 
> osd_op_threads=32, jemalloc 4.0
> -------------------------------
> root      20484  128  1.1 1689176 778508 ?      Ssl  10:06   0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      20502  170  1.2 1720524 810668 ?      Ssl  10:06   0:35 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
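
To make the comparison in the ps listings easier to read, here is a minimal
Python sketch that averages the RSS column for each allocator and thread
setting. The numbers are copied from the listings; the RSS column is treated
as KiB, as ps reports it on Linux. The function names are only illustrative,
and two OSDs per configuration is a very small sample, so the ratios are
indicative at best.

```python
from statistics import mean

# (allocator, osd_op_threads) -> RSS values in KiB, copied from the ps listings.
RSS_KIB = {
    ("tcmalloc 2.1", 2): [505144, 525356],
    ("tcmalloc 2.1", 32): [488584, 518368],
    ("jemalloc 3.5", 2): [769000, 797508],
    ("jemalloc 3.5", 32): [816520, 830060],
    ("jemalloc 4.0", 2): [737988, 711192],
    ("jemalloc 4.0", 32): [778508, 810668],
}


def mean_rss_mib(samples_kib):
    """Average RSS of the sampled OSDs, converted from KiB to MiB."""
    return mean(samples_kib) / 1024


def summarize(rss_kib=RSS_KIB):
    """Return {(allocator, threads): mean RSS in MiB}."""
    return {key: mean_rss_mib(values) for key, values in rss_kib.items()}


if __name__ == "__main__":
    summary = summarize()
    for (allocator, threads), mib in summary.items():
        # Compare against tcmalloc 2.1 at the same thread setting.
        base = summary[("tcmalloc 2.1", threads)]
        print(f"{allocator:13s} threads={threads:<2d} "
              f"{mib:7.1f} MiB ({mib / base:.2f}x tcmalloc 2.1)")
```

Comparing against tcmalloc 2.1 at the same osd_op_threads value keeps the
thread-count effect separate from the allocator effect.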
> 
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "Shinobu Kinjo" <skinjo@redhat.com>
> Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 07:29:22
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Hi,
> 
> jemalloc 4.0 was released 2 days ago:
> 
> https://github.com/jemalloc/jemalloc/releases
> 
> I'm curious to see the performance and memory usage improvements :)
> 
> 
> ----- Original Message -----
> From: "Shinobu Kinjo" <skinjo@redhat.com>
> To: "Stephen L Blinick" <stephen.l.blinick@intel.com>
> Cc: "aderumier" <aderumier@odiso.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 04:00:15
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> How about making a sheet of the test patterns?
> 
> Shinobu
> 
> ----- Original Message -----
> From: "Stephen L Blinick" <stephen.l.blinick@intel.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>
> Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 10:09:36 AM
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> Would it make more sense to try this comparison while changing the size of
> the worker thread pool? i.e. changing "osd_op_num_threads_per_shard" and
> "osd_op_num_shards" (default is currently 2 and 5 respectively, for a total
> of 10 worker threads).
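
The arithmetic behind the "10 worker threads" figure, and the kind of sweep
suggested here, can be sketched in a few lines of Python. The defaults (5
shards, 2 threads per shard) come from the message; the sweep values and the
function names are hypothetical, chosen only for illustration.

```python
from itertools import product


def total_worker_threads(num_shards: int, threads_per_shard: int) -> int:
    """Total op worker threads in one OSD: shards times threads per shard."""
    return num_shards * threads_per_shard


def sweep(shard_values, threads_per_shard_values):
    """Yield (num_shards, threads_per_shard, total) for every combination."""
    for shards, per_shard in product(shard_values, threads_per_shard_values):
        yield shards, per_shard, total_worker_threads(shards, per_shard)


if __name__ == "__main__":
    # Defaults quoted in the message: osd_op_num_shards=5,
    # osd_op_num_threads_per_shard=2.
    print("default total:", total_worker_threads(5, 2))

    # Hypothetical sweep values, not taken from the thread.
    for shards, per_shard, total in sweep([5, 10], [1, 2, 4]):
        print(f"osd_op_num_shards={shards} "
              f"osd_op_num_threads_per_shard={per_shard} total={total}")
```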
> 
> Thanks,
> 
> Stephen
> 
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 11:47 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> I have just done a small test with jemalloc: I changed the osd_op_threads
> value and checked the memory just after a daemon restart.
> 
> osd_op_threads = 2 (default)
> 
> 
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10246 6.0 0.3 1086656 245760 ? Ssl 20:36 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> 
> osd_op_threads = 32
> 
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10736 19.5 0.4 1474672 307412 ? Ssl 20:37 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
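
As a rough reading of these two samples, the following Python sketch
estimates how much resident memory each additional op thread costs right
after a restart. It assumes the growth is linear in the thread count, which
a single pair of measurements cannot confirm, and the function name is
illustrative only.

```python
def rss_per_extra_thread_kib(rss_low_kib, threads_low, rss_high_kib, threads_high):
    """Slope of RSS versus thread count between two samples, in KiB per thread."""
    return (rss_high_kib - rss_low_kib) / (threads_high - threads_low)


if __name__ == "__main__":
    # RSS values (KiB) from the two ps samples: osd_op_threads=2 and =32.
    per_thread = rss_per_extra_thread_kib(245760, 2, 307412, 32)
    print(f"{per_thread:.0f} KiB (~{per_thread / 1024:.1f} MiB) per extra thread")
```

Under that linear assumption the idle cost comes out at roughly 2 MiB per
extra thread; the figure under load may well differ.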
> 
> 
> 
> I'll try to compare with tcmalloc tomorrow, and under load.
> 
> 
> 
> ----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@sandisk.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Wednesday, August 19, 2015 19:29:56
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> Yes, it should be 1 per OSD...
> There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to
> the number of threads running.
> But I don't know whether the number of threads is a factor for jemalloc.
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> Sent: Wednesday, August 19, 2015 9:55 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> << I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and shares it between all processes.
> 
> >>I think it is per tcmalloc instance loaded, so at least num_osds *
> >>num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
> 
> What is num_tcmalloc_instance? I think one OSD process uses one defined
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size?
> 
> I'm saying that because I have exactly the same bug on the client side, with
> librbd + tcmalloc + qemu + iothreads.
> When I define too many iothreads, I hit the bug immediately (I can
> reproduce it 100% of the time).
> As if the thread cache size is divided by the number of threads?
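
The two ideas being discussed here can be written down as a small Python
sketch: the box-wide bound quoted above (num_osds * num_tcmalloc_instance *
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and the hypothesis that the per-process
budget is divided among the threads. Both are conjectures from this thread,
not confirmed tcmalloc behaviour, and the 32 MiB budget and the other input
values below are hypothetical.

```python
MIB = 1024 * 1024


def box_thread_cache_bound(num_osds, instances_per_osd, max_total_thread_cache_bytes):
    """Box-wide bound as quoted in the thread: OSDs * instances * per-instance budget."""
    return num_osds * instances_per_osd * max_total_thread_cache_bytes


def per_thread_share(max_total_thread_cache_bytes, num_threads):
    """Per-thread cache IF the budget were split evenly (a hypothesis, not a fact)."""
    return max_total_thread_cache_bytes // num_threads


if __name__ == "__main__":
    budget = 32 * MIB  # hypothetical TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value

    # Hypothetical box with 6 OSDs, one tcmalloc instance per OSD process.
    print(box_thread_cache_bound(6, 1, budget) // MIB, "MiB per box")

    # Under the even-split hypothesis, more threads means a smaller share each.
    for threads in (2, 32):
        print(threads, "threads ->", per_thread_share(budget, threads) // MIB, "MiB each")
```

If the even-split hypothesis holds, going from 2 to 32 threads would shrink
each thread's cache sixteenfold, which would be consistent with hitting the
problem only at high thread counts.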
> 
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@sandisk.com>
> To: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Wednesday, August 19, 2015 18:27:30
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> << I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and shares it between all processes.
> 
> I think it is per tcmalloc instance loaded, so at least num_osds *
> num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
> 
> Also, I think there is no point in increasing osd_op_threads, as it is not in
> the IO path anymore. Mark is using the default 5:2 for shards:threads per
> shard.
> 
> But yes, it could be related to the number of threads the OSDs are using; we
> need to understand how jemalloc works. Also, there may be some tuning to
> reduce memory usage (?).
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 9:06 AM
> To: Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> I was listening to today's meeting, and it seems that the blocker to making
> jemalloc the default is that it uses more memory per OSD (around 300MB?),
> and some people may have boxes with 60 disks.
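
To put the concern in numbers, here is a back-of-the-envelope Python sketch
of the extra memory on a dense box. It takes the "around 300MB" per OSD and
the 60 disks from the message above and assumes one OSD per disk, which is
not stated in the thread; the function name is illustrative.

```python
def extra_memory_per_box_mb(osds_per_box: int, extra_mb_per_osd: float) -> float:
    """Extra memory across all OSDs on one box, in MB."""
    return osds_per_box * extra_mb_per_osd


if __name__ == "__main__":
    # Assumption: one OSD per disk, so 60 disks means 60 OSD processes.
    extra_mb = extra_memory_per_box_mb(60, 300)
    print(f"{extra_mb:.0f} MB extra per box (~{extra_mb / 1024:.1f} GB)")
```

With those inputs the overhead is about 18000 MB, or roughly 17.6 GB per
box.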
> 
> 
> I just wonder whether the memory increase is related to the
> osd_op_num_shards/osd_op_threads values?
> 
> It seems that at the hackathon, the benchmark was run on very large boxes
> (36 cores/72 threads), http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
> with osd_op_threads = 32.
> 
> I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and shares it between all processes.
> 
> Maybe jemalloc allocates memory per thread.
> 
> 
> 
> (I think people with 60-disk boxes don't use SSDs, so they have low IOPS per
> OSD and don't need a lot of threads per OSD.)
> 
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Wednesday, August 19, 2015 16:01:28
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Thanks Mark,
> 
> The results match exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
> jemalloc.
> 
> And indeed tcmalloc performance, even with a bigger cache, seems to decrease
> over time.
> 
> 
> What is funny is that I see exactly the same behaviour on the client
> (librbd) side, with qemu and multiple iothreads.
> 
> 
> Switching both server and client to jemalloc currently gives me the best
> performance on small reads.
> 
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@redhat.com>
> To: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Wednesday, August 19, 2015 06:45:36
> Subject: Ceph Hackathon: More Memory Allocator Testing
> 
> Hi Everyone,
> 
> One of the goals at the Ceph Hackathon last week was to examine how to
> improve Ceph Small IO performance. Jian Zhang presented findings showing a
> dramatic improvement in small random IO performance when Ceph is used with
> jemalloc. His results build upon Sandisk's original findings that the
> default thread cache values are a major bottleneck in TCMalloc 2.1. To
> further verify these results, we sat down at the Hackathon and configured
> the new performance test cluster that Intel generously donated to the Ceph
> community laboratory to run through a variety of tests with different memory
> allocator configurations. I've since written the results of those tests up
> in pdf form for folks who are interested.
> 
> The results are located here:
> 
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
> 
> I want to be clear that many other folks have done the heavy lifting here.
> These results are simply a validation of the many tests that other folks
> have already done. Many thanks to Sandisk and others for figuring this out
> as it's a pretty big deal!
> 
> Side note: Very little tuning other than swapping the memory allocator and a
> couple of quick and dirty ceph tunables were set during these tests. It's
> quite possible that higher IOPS will be achieved as we really start digging
> into the cluster and learning what the bottlenecks are.
> 
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

Thread overview: 55+ messages
2015-08-19  4:45 Ceph Hackathon: More Memory Allocator Testing Mark Nelson
2015-08-19  5:13 ` Shinobu Kinjo
2015-08-19  5:36 ` Somnath Roy
2015-08-19  8:07   ` Haomai Wang
2015-08-19  9:06     ` Shinobu Kinjo
2015-08-19 12:17     ` Mark Nelson
2015-08-19 12:36       ` Dałek, Piotr
2015-08-19 12:44         ` Mark Nelson
2015-08-19 12:47           ` Dałek, Piotr
2015-08-19 12:10   ` Mark Nelson
2015-08-19  6:33 ` Stefan Priebe - Profihost AG
2015-08-19 12:20   ` Mark Nelson
2015-08-19 14:01 ` Alexandre DERUMIER
2015-08-19 16:05   ` Alexandre DERUMIER
2015-08-19 16:27     ` Somnath Roy
2015-08-19 16:55       ` Alexandre DERUMIER
2015-08-19 16:57         ` Blinick, Stephen L
2015-08-20  6:35           ` Dałek, Piotr
2015-08-20  7:08             ` Haomai Wang
2015-08-20  7:18               ` Dałek, Piotr
2015-08-19 17:29         ` Somnath Roy
2015-08-19 18:20           ` Allen Samuels
2015-08-19 18:36             ` Mark Nelson
2015-08-19 18:47               ` Łukasz Redynk
2015-08-20  6:25             ` Dałek, Piotr
2015-08-19 18:47           ` Alexandre DERUMIER
2015-08-20  1:09             ` Blinick, Stephen L
2015-08-20  2:00               ` Shinobu Kinjo
2015-08-20  5:29                 ` Alexandre DERUMIER
2015-08-20  8:17                   ` Alexandre DERUMIER
2015-08-20 12:54                     ` Shinobu Kinjo
2015-08-20 14:46                       ` Matt Benjamin [this message]
2015-08-19 20:16   ` Somnath Roy
2015-08-19 20:17     ` Stefan Priebe
2015-08-19 20:29       ` Somnath Roy
2015-08-19 20:31         ` Stefan Priebe
2015-08-19 20:34           ` Somnath Roy
2015-08-19 20:40             ` Stefan Priebe
2015-08-19 20:44               ` Somnath Roy
2015-08-21  3:45                 ` Shishir Gowda
2015-08-21  4:22                 ` Shishir Gowda
2015-08-21 14:26                   ` Milosz Tanski
2015-08-21 19:07                     ` Robert LeBlanc
2015-08-22 13:52                       ` Sage Weil
2015-08-22 13:55                     ` Sage Weil
2015-08-22 16:15                       ` Somnath Roy
2015-08-22 16:57                         ` Alexandre DERUMIER
2015-08-22 17:03                           ` Somnath Roy
2015-08-23 13:12                             ` Alexandre DERUMIER
2015-08-23 16:38                               ` Somnath Roy
2015-09-03  9:13                             ` Shinobu Kinjo
2015-09-03 13:06                               ` Daniel Gryniewicz
2015-09-03 13:12                                 ` Matt Benjamin
2015-08-24 17:01                       ` Robert LeBlanc
2015-08-19 20:50 ` Zhang, Jian
