* Memory Allocators and Ceph
@ 2015-05-27 17:40 Robert LeBlanc
From: Robert LeBlanc @ 2015-05-27 17:40 UTC (permalink / raw)
  To: ceph-users@lists.ceph.com, ceph-devel

[-- Attachment #1: Type: text/plain, Size: 3276 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

With all the talk of tcmalloc and jemalloc, I decided to do some
testing of the different memory allocating technologies between KVM
and Ceph. These tests were done on a pre-production system, so I've
tried to remove some of the variance with many runs and averages. The
details are as follows:

Ceph v0.94.1 (I backported a branch from master to get full jemalloc
support for part of the tests)
tcmalloc v2.4-3
jemalloc v3.6.0-1
QEMU v0.12.1.2-2 (as I understand it, the latest version for RH6/CentOS6)
OSDs are only spindles with SSD journals, no SSD tiering

The 11 Ceph nodes are:
CentOS 7.1
Linux 3.18.9
1 x Intel E5-2640
64 GB RAM
40 Gb Intel NIC bonded with LACP using jumbo frames
10 x Toshiba MG03ACA400 4 TB 7200 RPM drives
2 x Intel SSDSC2BB240G4 240GB SSD
1 x 32 GB SATADOM for OS

The KVM node is:
CentOS 6.6
Linux 3.12.39
QEMU v0.12.1.2-2 cache mode none

The VM is:
CentOS 6.6
Linux 2.6.32-504
fio v2.1.10
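
The averaging over many runs mentioned above can be sketched as follows. This is only an illustration: it assumes each run yields a single number per test (for example IOPS), and the sample figures are invented, not taken from the spreadsheet.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean and sample standard deviation of a list of per-run results.
sub mean_stddev {
    my @runs = @_;
    die "need at least two runs\n" if @runs < 2;
    my $mean = sum(@runs) / @runs;
    my $var  = sum( map { ( $_ - $mean )**2 } @runs ) / ( @runs - 1 );
    return ( $mean, sqrt($var) );
}

# Hypothetical IOPS figures from five runs of one test (illustrative only).
my @iops = ( 100, 110, 90, 105, 95 );
my ( $mean, $sd ) = mean_stddev(@iops);
printf "mean %.1f IOPS, stddev %.1f (%.1f%% of mean)\n",
    $mean, $sd, 100 * $sd / $mean;
```

Reporting the spread alongside the mean helps judge whether a difference between two allocator configurations is larger than the run-to-run noise.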

On average, preloading Ceph with either tcmalloc or jemalloc increased
performance by about 30%, with most of the gains at smaller I/O sizes.
Preloading QEMU with jemalloc provided about a 6% increase on a lightly
loaded server, but it made no noticeable difference, positive or
negative, when combined with Ceph using either tcmalloc or jemalloc.
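
The message does not say how the allocators were preloaded. On Linux the usual mechanism is the dynamic linker's LD_PRELOAD environment variable, so the sketch below assumes that. The helper names are made up for illustration, and the library path in the usage comment is hypothetical and varies by distribution.

```perl
use strict;
use warnings;

# Build an LD_PRELOAD value that puts $lib first and keeps existing entries.
sub preload_value {
    my ( $lib, $existing ) = @_;
    return ( defined($existing) && length($existing) )
        ? "$lib:$existing"
        : $lib;
}

# Run a command with the allocator preloaded. 'local' restores the
# previous environment when this sub returns, so only the child is affected.
sub run_preloaded {
    my ( $lib, @cmd ) = @_;
    local $ENV{LD_PRELOAD} = preload_value( $lib, $ENV{LD_PRELOAD} );
    return system(@cmd);
}

# Hypothetical usage (path and command are assumptions, not from the post):
#   run_preloaded( '/usr/lib64/libjemalloc.so.1', 'fio', '--version' );
```

Preloading swaps the allocator without rebuilding the binary, which is why it can be tried separately on the Ceph daemons and on QEMU.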

Compiling Ceph entirely with jemalloc had an overall negative
performance impact. This may be due to dynamically linking to RocksDB
instead of the default static linking.

Preloading QEMU with tcmalloc showed very negative results overall.
However, it also showed the largest improvement of any test in the 1MB
tests, up to almost 2.5x the performance of the baseline. If your
workload is guaranteed to be 1MB I/O (and possibly larger), then this
option may be useful.
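
The results above are quoted both as percentage gains ("about 30%") and as multiples of the baseline ("almost 2.5x"). A small sketch of how the two relate; the function names and sample numbers are illustrative, not measured values.

```perl
use strict;
use warnings;

# Percentage change of $new relative to $base (positive means faster).
sub percent_change {
    my ( $base, $new ) = @_;
    die "baseline must be non-zero\n" unless $base;
    return 100 * ( $new - $base ) / $base;
}

# The same comparison expressed as a multiple of the baseline.
sub speedup_factor {
    my ( $base, $new ) = @_;
    die "baseline must be non-zero\n" unless $base;
    return $new / $base;
}

# Hypothetical IOPS: baseline 1000, with a preloaded allocator 1300.
printf "%.1f%% change, %.2fx baseline\n",
    percent_change( 1000, 1300 ), speedup_factor( 1000, 1300 );
```

A 2.5x result corresponds to a 150% increase, so the two forms should not be compared directly.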

Based on the architecture of jemalloc, it is possible that loading it
on the QEMU host would provide more benefit on servers that are closer
to memory capacity, but I did not test this scenario.

Any feedback regarding this exercise is welcome.

Data: https://docs.google.com/a/leblancnet.us/spreadsheets/d/1n12IqAOuH2wH-A7Sq5boU8kSEYg_Pl20sPmM0idjj00/edit?usp=sharing
The test script, multitest, is attached. The real-world test is based on
the disk stats of about 100 of our servers, which have uptimes of many
months.
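
The real-world test in the attached script encodes that I/O mix in fio's --bssplit option, where each "size/percent" pair gives the share of requests at that block size and the comma separates reads from writes. The sketch below computes the weighted mean block size for each direction. It assumes fio's default of 1k = 1024 bytes and takes the "512" entry literally as 512 bytes, as written in the script's command.

```perl
use strict;
use warnings;

my %mult = ( '' => 1, k => 1024, m => 1024**2 );

# Convert a fio-style size such as "4k", "1m" or "512" to bytes.
sub size_to_bytes {
    my ($s) = @_;
    my ( $n, $unit ) = $s =~ /^(\d+)([km]?)$/i
        or die "bad size '$s'\n";
    return $n * $mult{ lc $unit };
}

# Weighted mean block size, in bytes, for one direction of a bssplit value.
sub mean_block_size {
    my ($split) = @_;
    my ( $total, $weight ) = ( 0, 0 );
    for my $part ( split /:/, $split ) {
        my ( $size, $pct ) = split m{/}, $part;
        $total  += size_to_bytes($size) * $pct;
        $weight += $pct;
    }
    return $total / $weight;
}

# The bssplit value used by the script's real-world test.
my $bssplit = '4k/85:32k/11:512/3:1m/1,4k/89:32k/10:512k/1';
my ( $read, $write ) = split /,/, $bssplit;
printf "read mean %.0f bytes, write mean %.0f bytes\n",
    mean_block_size($read), mean_block_size($write);
```

Although most requests are 4k, the rare large requests pull the mean well above that, which matters when converting between IOPS and throughput.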

- - ----------------
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVZgGRCRDmVDuy+mK58QAAM20QAJh0rR0NIQABCkMjiluP
f/mcIiy4MQfFd5RJ9/ZlMRDQ0KDwW7haRm58QE0S/l6ZZ3+z7MqsQOW8KHJE
Y75YjEdsl7zrLLcB4wNnUKJXZrPwzFReTtLbXsNB8h73tbzaLp3y9711gbNf
EQQujiSp5XDiOK+d+H0FVGp4AfVmFvlO5gjQMSUcUt58qN6BsnD8NbRLEvKf
S2WzvJjFO7g1HqWr5QssKGb+1rvze2Z2xByURU8yKVpdX59EIhfzPdgadp/n
AJGR2pXWGgW2CQ3ce7gN7cr32cjjWbmzpdr0djgVB5/Y1ERU8FvwNFIwFa6N
eFUKCohW5UjMw8CcO9CzUQtQxgKnqeHcyVe6Loamd2eZ+epIupFLI3lQF6NU
GSdBV/8Ale1SJuhShY6QnEJFav8nLTvNvlDF/NiBoSUMtnsl5fDTpLH3KA2w
o8sT2dcDEJEc9+kzUrugUBElinjOacFcINU3osYZJ0NNi4t1PDtPTUiWChvT
jZdpWVGVpxZ3w46csACJZxY0lP/Kd6JoSH+78q7wNivCHeHT7c3uy8KGbKA7
fecFaHBAsCYliX1tDN/abZFVhEvdb8AuTGqGkZ7xHj0PAUyddObYGjkStVUw
dGOH+nurnFZ5Qqct/gvcbxggbOTGunHLGwtALT5EAtTB1ThlfpVQImy5vKl0
aOER
=YTTi
-----END PGP SIGNATURE-----

[-- Attachment #2: multitest --]
[-- Type: application/octet-stream, Size: 13894 bytes --]

#!/usr/bin/perl
#########################################################
#  multitest, by Marcus Sorensen, BetterServers Inc     #
#  modified by David Collins and Robert LeBlanc, EIG    #
#  Licensed under the Open Software License version 3.0 #
#  http://opensource.org/licenses/OSL-3.0               #
#########################################################
use strict;
use Data::Dumper;

$| = 1;
my $colors = { red => "\e[1;31m", def => "\e[0m", green => "\e[1;32m", cyan => "\e[1;36m" };
my $restbetweentests = 15;
my $testtime = 300;   #seconds
my $testsize = "12500MB";
my $testjobs = 8;
my $testiodepth = 8;
my $testname = "multiiotester";
my %final_out;

unless ( `which fio 2>/dev/null`) {
  print "No executable 'fio' found in path, exiting\n";
  exit 1;
}

print <<EOF;
$colors->{red}
Multiple IO Tester$colors->{def}

  This application emulates a busy server in several states by launching multiple
threads that do various types of IO. This allows us to see what the consequences
are of running in a multitasking environment. This test uses direct IO and 
invalidates caches between tests, testing the disk, not the memory.

$colors->{red}NOTE:$colors->{def} You need at least 100GB of free space in your current working directory.

The following tests currently consist of:

  8 sequential readers
  8 sequential writers
  8 mixed sequential readers/writers (random choice per IO)
  8 random readers
  8 random writers
  8 mixed random readers/writers (random choice per IO)
  A real world simulation of varied read/write requests of various sizes weighted to smaller I/O and 72% read 28% write.

Feel free to modify the script to meet your needs. Enjoy!

The test should take less than 3 hours. Press <ENTER> to begin...
EOF
<STDIN>;

my $tests = { 'read-1024k'      => { 'order' => 1, 
                               'block' => '1024k', 
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'sequential read' }, 
              'write-1024k'     => { 'order' => 2, 
                               'block' => '1024k', 
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'sequential write' }, 
              'rw-1024k'        => { 'order' => 3, 
                               'block' => '1024k', 
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'seq read/seq write' },

              'read-256k'      => { 'order' => 1,
                               'block' => '256k',
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'sequential read' },
              'write-256k'     => { 'order' => 2,
                               'block' => '256k',
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'sequential write' },
              'rw-256k'        => { 'order' => 3,
                               'block' => '256k',
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'seq read/seq write' },


              'read-64k'      => { 'order' => 1,
                               'block' => '64k',
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'sequential read' },
              'write-64k'     => { 'order' => 2,
                               'block' => '64k',
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'sequential write' },
              'rw-64k'        => { 'order' => 3,
                               'block' => '64k',
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'seq read/seq write' },



              'read-16k'      => { 'order' => 1,
                               'block' => '16k',
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'sequential read' },
              'write-16k'     => { 'order' => 2,
                               'block' => '16k',
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'sequential write' },
              'rw-16k'        => { 'order' => 3,
                               'block' => '16k',
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'seq read/seq write' },



              'read-4k'      => { 'order' => 1,
                               'block' => '4k',
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'sequential read' },
              'write-4k'     => { 'order' => 2,
                               'block' => '4k',
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'sequential write' },
              'rw-4k'        => { 'order' => 3,
                               'block' => '4k',
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'seq read/seq write' },




              'randread-4k'  => { 'order' => 4, 
                               'block' => '4k', 
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'random read' }, 
              'randwrite-4k' => { 'order' => 5, 
                               'block' => '4k', 
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'random write' } , 
              'randrw-4k'    => { 'order' => 6, 
                               'block' => '4k', 
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'rand read/rand write' },


              'randread-16k'  => { 'order' => 4, 
                               'block' => '16k', 
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'random read' }, 
              'randwrite-16k' => { 'order' => 5, 
                               'block' => '16k', 
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'random write' } , 
              'randrw-16k'    => { 'order' => 6, 
                               'block' => '16k', 
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'rand read/rand write' },


              'randread-64k'  => { 'order' => 4, 
                               'block' => '64k', 
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'random read' }, 
              'randwrite-64k' => { 'order' => 5, 
                               'block' => '64k', 
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'random write' } , 
              'randrw-64k'    => { 'order' => 6, 
                               'block' => '64k', 
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'rand read/rand write' },



              'randread-256k'  => { 'order' => 4,
                               'block' => '256k',
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'random read' },
              'randwrite-256k' => { 'order' => 5,
                               'block' => '256k',
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'random write' } ,
              'randrw-256k'    => { 'order' => 6,
                               'block' => '256k',
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'rand read/rand write' },



              'randread-1024k'  => { 'order' => 4,
                               'block' => '1024k',
                               'output' => { 'multiiotester'=>'4', '2'=>'5', '3'=>'6' },
                               'name' => 'random read' },
              'randwrite-1024k' => { 'order' => 5,
                               'block' => '1024k',
                               'output' => { 'multiiotester'=>'20', '2'=>'25', '3'=>'47' },
                               'name' => 'random write' } ,
              'randrw-1024k'    => { 'order' => 6,
                               'block' => '1024k',
                               'output' => { 'multiiotester'=>'4,20', '2'=>'5,25', '3'=>'6,47' },
                               'name' => 'rand read/rand write' },



            };

mkdir('./multiiotester') if ! -d './multiiotester';
chdir('./multiiotester') or die "unable to chdir to test directory: $!";


foreach my $t ( sort { $tests->{$a}->{order} <=> $tests->{$b}->{order} } keys %{$tests} ) {
  print "$colors->{cyan} running IO \"$tests->{$t}->{name} ($t)\" test... $colors->{def}\n";


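  # Uncomment 'next' to run only a subset of the tests while debugging
  if ( $t !~ /^read\-\d{2}k/ ) {
    #next;
  }

  # The fio --rw mode is the part of the test name before the first '-'
  my $testtype = $t;
  $testtype =~ s/\-.+//;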

  my $cmd = "fio --direct=1 --invalidate=1 --ioengine=libaio --iodepth=$testiodepth --thread --time_based --runtime=$testtime --rw=$testtype --bs=$tests->{$t}->{block} --size=$testsize --numjobs=$testjobs --name=$testname --minimal | grep ';'";
  my @output = `$cmd`;
  $output[0] =~ /^(.*?);/;
  my $version = $1;
  my $data;
  my $iop_data;
  
  foreach my $d (@output){
    next unless $d =~ /;/;
    my $field = $tests->{$t}->{output}->{$version};
    my @items = split(";",$d);
    if ($field =~ /(\d+),(\d+)/) {
      $data .= "$items[$1];$items[$2]\n";
      $iop_data .= "$items[$1+1];$items[$2+1]\n";
    } else {
      $data .= "$items[$field]\n";
      $iop_data .= "$items[$field+1]\n";
    }
  }

  my @results = split(/;/,combinejobs($data));
  my @iops = split(/;/,combinejobs($iop_data));

  print "\tresult is $colors->{green}" . join("$colors->{def}/$colors->{green}", map { convert($_) } @results) . "$colors->{def} per second\n";
  print "\tequals $colors->{green}" . join("$colors->{def}/$colors->{green}", @iops) . "$colors->{def} IOs per second\n\n";
  $final_out{$t}{'iops'} = \@iops;
  $final_out{$t}{'rate'} = \@results;


  sleep $restbetweentests;
}

print "$colors->{cyan} running IO \"Real World Test (real)\" test... $colors->{def}\n";

my $cmd = "fio --name $testname --rw randrw --bssplit 4k/85:32k/11:512/3:1m/1,4k/89:32k/10:512k/1 --ioengine libaio --iodepth $testiodepth --numjobs $testjobs --direct 1 --rwmixread 72 --norandommap --minimal --size=$testsize --runtime=$testtime --time_based --thread | grep ';'";
my @output = `$cmd`;
$output[0] =~ /^(.*?);/;
my $version = $1;
my $data;
my $iop_data;

foreach my $d (@output){
  next unless $d =~ /;/;
  my $field = '6,47';
  my $iop_field = '7,48';
  my @items = split(";",$d);
  if ($field =~ /(\d+),(\d+)/) {
    $data .= "$items[$1];$items[$2]\n";
  } else {
    $data .= "$items[$field]\n";
  }
  if ($iop_field =~ /(\d+),(\d+)/) {
    $iop_data .= "$items[$1];$items[$2]\n";
  } else {
    $iop_data .= "$items[$iop_field]\n";
  }
}

my @results = split(/;/,combinejobs($data));
my @iops = split(/;/,combinejobs($iop_data));

print "\tresult is $colors->{green}" . join("$colors->{def}/$colors->{green}", map { convert($_) } @results) . "$colors->{def} per second\n";
print "\tequals $colors->{green}" . join("$colors->{def}/$colors->{green}", @iops) . "$colors->{def} IOs per second\n\n";
$final_out{'real'}{'iops'} = \@iops;
$final_out{'real'}{'rate'} = \@results;


#print "cleaning up files..\n";

#unlink glob "multiiotester*";
#chdir("..");
#rmdir("multiiotester") or print "unable to delete directory 'multiiotester'\n";

###########################
####### subroutines #######
###########################

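sub convert {
  my $val = shift;
  my @units = ('KB','MB','GB');
  my $i = 0;

  # Divide by 1024 until the integer part has at most three digits,
  # stopping at the largest unit so the index stays within @units.
  while ( $val >= 1000 && $i < $#units ) {
    $val = sprintf("%.2f",$val / 1024);
    $i++;
  }
  return $val . $units[$i];
}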

#sub toiops {
#   my $val = shift;
#   my $blocksize = shift;
# 
#   $blocksize =~ s/k//;
#   my $io = sprintf("%.1f",$val/$blocksize);
#  
#   return $io;
# }

sub combinejobs {
  my $input = shift;
  
  my @lines = split(/\n/,$input);
  my @output = ();

  foreach my $l (0..$#lines) {
    my @temp = split(/;/,$lines[$l]);
    foreach my $t (0..$#temp){
      $output[$t] += $temp[$t];
    }
  }

  return join(";",@output);
}


print "\n\n\n#########################################\n\n";
my $header;
my $csv;
for my $test ( sort keys %final_out ) {
	if ( scalar @{$final_out{$test}{'rate'}} == 1 ) {
		$header .= "$test rate,$test IOPs,";
		$csv .= "@{$final_out{$test}{'rate'}}[0],@{$final_out{$test}{'iops'}}[0],"
	} else {
		$header .= "$test read rate,$test read IOPs,$test write rate,$test write IOPs,";
		$csv .= "@{$final_out{$test}{'rate'}}[0],@{$final_out{$test}{'iops'}}[0],@{$final_out{$test}{'rate'}}[1],@{$final_out{$test}{'iops'}}[1],";
	}
}
chop($header);
chop($csv);
print "$header\n";
print "$csv\n";

print "\n\n#########################################\n\n";
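
The two comma-separated lines that multitest prints at the end (a header line and a value line) can be read back into a lookup table for comparison between runs. A minimal sketch follows; the sub name is hypothetical and the sample input is made up in the script's output format, not measured data.

```perl
use strict;
use warnings;

# Pair up the header line and the value line printed at the end of multitest.
sub parse_summary {
    my ( $header, $csv ) = @_;
    my @names  = split /,/, $header;
    my @values = split /,/, $csv;
    die "column count mismatch\n" unless @names == @values;
    my %row;
    @row{@names} = @values;
    return \%row;
}

# Invented sample: one single-direction test and one mixed test.
my $row = parse_summary(
    'randread-4k rate,randread-4k IOPs,randrw-4k read rate,randrw-4k read IOPs,randrw-4k write rate,randrw-4k write IOPs',
    '4000,1000,2000,500,1200,300',
);
print "$_ = $row->{$_}\n" for sort keys %$row;
```

The rate columns hold the raw summed bandwidth values as fio reports them, since the script only applies its unit conversion to the on-screen output.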
