From: Jeff Wu <cpwu@tnsoft.com.cn>
To: Gregory Farnum <gregf@hq.newdream.net>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
Andrew Lv <mllv@tnsoft.com.cn>
Subject: Re: Performence test on ceph v0.23 + EXT4 and Btrfs
Date: Wed, 1 Dec 2010 14:59:45 +0800
Message-ID: <1291186785.1809.91.camel@cephhost>
In-Reply-To: <1291167344.1809.60.camel@cephhost>
On Wed, 2010-12-01 at 09:35 +0800, Jeff Wu wrote:
>
> On Wed, 2010-12-01 at 01:07 +0800, Gregory Farnum wrote:
> > On Mon, Nov 29, 2010 at 10:19 PM, Jeff Wu <cpwu@tnsoft.com.cn> wrote:
> > > Is "40-50MB/s" the speed of running the bench on the local btrfs
> > > disk, not the speed of running the bench from the client to the OSD
> > > server? With this speed, will running the bench from the client to
> > > the OSD server get about 20~25MB/s (40~50MB/s / 2)?
> > Data on Ceph is replicated across 2 OSDs (by default; this is
> > configurable). So while figuring out potential performance involves a
> > lot of variables, in a simple case like this where you aren't bounded
> > by network bandwidth you'll find that your read/write performance
> > simply tracks the slower disk. I'd expect your Ceph tests (at least
> > the streaming ones) to run at 40-50MB/s.
>
> Hi Greg, thank you very much for your quick reply.
> >
> > Given that everything else is okay, I cannot stress enough that
> > running without a journal is going to cause significant performance
> > degradations. I have a hard time believing that it's responsible for
> > 13-second latencies, but it's possible. So how about you set up a
> > journal (it can just be a file or new partition on the drives you're
> > already using) and report back your results after you do that. :)
>
> I will add a journal to ceph.conf and try it.
>
>
Hi Greg,
Following your suggestion, I added the journal config:
"
osd data = /opt/ceph/data/osd$id
osd journal = /home/transoft/data/osd$id/journal
filestore journal writeahead = true
osd journal size = 10000
"
to ceph.conf. The full ceph.conf is attached below.
Then I ran the command "$ sudo ceph osd tell 0/1 bench" six times
(three times per OSD) and got these results:
$ sudo ceph -w
osd0 172.16.10.42:6800/17347 1 : [INF] bench: wrote 1024 MB in blocks of
4096 KB in 29.818194 sec at 28201 KB/sec
osd0 172.16.10.42:6800/17347 2 : [INF] bench: wrote 1024 MB in blocks of
4096 KB in 30.013058 sec at 34801 KB/sec
osd0 172.16.10.42:6800/17347 3 : [INF] bench: wrote 1024 MB in blocks of
4096 KB in 30.463511 sec at 30274 KB/sec
osd1 172.16.10.65:6800/4845 1 : [INF] bench: wrote 1024 MB in blocks of
4096 KB in 165.067603 sec at 6329 KB/sec
osd1 172.16.10.65:6800/4845 2 : [INF] bench: wrote 1024 MB in blocks of
4096 KB in 181.034333 sec at 5782 KB/sec
osd1 172.16.10.65:6800/4845 3 : [INF] bench: wrote 1024 MB in blocks of
4096 KB in 196.055812 sec at 5334 KB/sec
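As a rough cross-check of these numbers, here is a minimal Python sketch that averages the reported KB/sec per OSD. The bench lines are copied from the output above with the wrapped lines rejoined; the names BENCH_LOG and mean_rate_per_osd are my own and purely illustrative.

```python
import re
from statistics import mean

# Bench lines from the "ceph -w" output above, with wrapped lines rejoined.
BENCH_LOG = """\
osd0 172.16.10.42:6800/17347 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 29.818194 sec at 28201 KB/sec
osd0 172.16.10.42:6800/17347 2 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 30.013058 sec at 34801 KB/sec
osd0 172.16.10.42:6800/17347 3 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 30.463511 sec at 30274 KB/sec
osd1 172.16.10.65:6800/4845 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 165.067603 sec at 6329 KB/sec
osd1 172.16.10.65:6800/4845 2 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 181.034333 sec at 5782 KB/sec
osd1 172.16.10.65:6800/4845 3 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 196.055812 sec at 5334 KB/sec
"""

# Capture the OSD name and the KB/sec figure that the OSD itself reported.
BENCH_RE = re.compile(r"(osd\d+) .*bench: wrote .* at (\d+) KB/sec")


def mean_rate_per_osd(log):
    """Return {osd_name: mean reported KB/sec} over all bench lines in log."""
    rates = {}
    for line in log.splitlines():
        match = BENCH_RE.search(line)
        if match:
            rates.setdefault(match.group(1), []).append(int(match.group(2)))
    return {osd: mean(values) for osd, values in rates.items()}


if __name__ == "__main__":
    for osd, rate in sorted(mean_rate_per_osd(BENCH_LOG).items()):
        print(f"{osd}: mean {rate} KB/sec")
```

On these six lines the mean is 31092 KB/sec for osd0 and 5815 KB/sec for osd1, so osd1 is roughly five times slower in this bench.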
I also used "dd" to test the underlying drives and got these logs:
1. OSD0, /opt formatted with mkfs.btrfs
$ sudo dd if=/dev/zero of=/opt/dd.img bs=2M count=1024
1024+0 records in
1024+0 records out
2147483648 bytes transfered in 21.4497 secs(100 MB/sec)
2. OSD1, /opt formatted with mkfs.btrfs
$ sudo dd if=/dev/zero of=/opt/dd.img bs=2M count=1024
1024+0 records in
1024+0 records out
2147483648 bytes transfered in 48.2037 secs(44.6 MB/sec)
Based on these logs, the OSD1 disk speed is probably limiting the test performance.
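To make the comparison concrete, the sketch below recomputes the dd throughput from the byte counts and times above and picks the slower disk. It assumes 1 MB = 1,000,000 bytes (which matches the figures dd printed) and takes Greg's point that, with data replicated across both OSDs, a streaming test tracks the slower disk. This is a simplified model, not a prediction, and the function and variable names are illustrative.

```python
def dd_rate_mb_s(total_bytes, seconds):
    """Throughput in MB/s, assuming 1 MB = 1,000,000 bytes."""
    return total_bytes / seconds / 1_000_000


# (bytes written, elapsed seconds) taken from the two dd runs above.
DD_RUNS = {
    "osd0": (2147483648, 21.4497),
    "osd1": (2147483648, 48.2037),
}


def slowest_disk(runs):
    """Return (name, MB/s) of the disk with the lowest dd throughput."""
    rates = {name: dd_rate_mb_s(nbytes, secs) for name, (nbytes, secs) in runs.items()}
    name = min(rates, key=rates.get)
    return name, rates[name]


if __name__ == "__main__":
    name, rate = slowest_disk(DD_RUNS)
    print(f"{name} is the slower disk at about {rate:.1f} MB/s")
```

Under this model the replicated streaming rate would be bounded by the OSD1 disk, which is still well above the roughly 5-6 MB/sec that the osd1 bench reported.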
I also found an issue. Take the following steps:
$ mkcephfs -c ceph.conf -v --mkbtrfs -a
$ init-ceph -c ceph.conf --btrfs -v -a start
Then execute:
$ init-ceph -c ceph.conf --btrfs -v -a stop
This command fails to stop the cosd processes on OSD0 and OSD1:
OSD0:
/usr/local/bin/cosd -i 0 -c ceph.conf
OSD1:
/usr/local/bin/cosd -i 1 -c ceph.conf
Then I created the directory "/var/run/ceph" manually on the OSD0 and
OSD1 hosts and executed:
$ init-ceph -c ceph.conf --btrfs -v -a stop
This time the command stops the cosd processes on OSD0 and OSD1:
/usr/local/bin/cosd -i 0 -c ceph.conf
/usr/local/bin/cosd -i 1 -c ceph.conf
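I have not checked why the directory matters; my assumption is that the stop script expects runtime files (such as pid files) under /var/run/ceph. Under that assumption, a small pre-flight check like the Python sketch below could be run on each OSD host before starting the daemons. The helper name ensure_run_dir is hypothetical, and creating /var/run/ceph itself needs root.

```python
import os


def ensure_run_dir(path="/var/run/ceph"):
    """Create path if it is missing; return True if it had to be created."""
    if os.path.isdir(path):
        return False
    # exist_ok avoids an error if something else creates it in the meantime.
    os.makedirs(path, mode=0o755, exist_ok=True)
    return True


if __name__ == "__main__":
    created = ensure_run_dir()
    print("created run directory" if created else "run directory already present")
```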
Thanks,
Jeff.Wu
>
> > Adding a journal to the OSDs lets them turn all their random writes
> > into streaming ones.
> > -Greg
>
=========================================================
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 0 bench
2010-12-01 10:45:13.670910 mon <- [osd,tell,0,bench]
2010-12-01 10:45:13.671180 mon1 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 0 bench
2010-12-01 10:45:29.350198 mon <- [osd,tell,0,bench]
2010-12-01 10:45:29.350457 mon1 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 0 bench
2010-12-01 10:45:31.000281 mon <- [osd,tell,0,bench]
2010-12-01 10:45:31.000560 mon0 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 1 bench
2010-12-01 10:45:34.860782 mon <- [osd,tell,1,bench]
2010-12-01 10:45:34.861020 mon1 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 1 bench
2010-12-01 10:45:36.760811 mon <- [osd,tell,1,bench]
2010-12-01 10:45:36.761161 mon2 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 1 bench
2010-12-01 10:45:37.530714 mon <- [osd,tell,1,bench]
2010-12-01 10:45:37.530968 mon2 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph -w
2010-12-01 10:44:59.450653 pg v13: 528 pgs: 528 active+clean; 12 KB
data, 5304 KB used, 219 GB / 219 GB avail
2010-12-01 10:44:59.451365 mds e5: 1/1/1 up {0=up:active}, 1
up:standby
2010-12-01 10:44:59.451387 osd e6: 2 osds: 2 up, 2 in
2010-12-01 10:44:59.451412 log 2010-12-01 10:43:43.044865 mon0
172.16.10.171:6789/0 7 : [INF] mds0 172.16.10.171:6801/2482 up:active
2010-12-01 10:44:59.451440 mon e1: 3 mons at
{0=172.16.10.171:6789/0,1=172.16.10.171:6790/0,2=172.16.10.171:6791/0}
2010-12-01 10:46:45.000262 log 2010-12-01 10:45:15.599526 osd0
172.16.10.42:6800/17347 1 : [INF] bench: wrote 1024 MB in blocks of 4096
KB in 29.818194 sec at 28201 KB/sec
2010-12-01 10:46:45.000262 log 2010-12-01 10:45:46.062142 osd0
172.16.10.42:6800/17347 2 : [INF] bench: wrote 1024 MB in blocks of 4096
KB in 30.013058 sec at 34801 KB/sec
2010-12-01 10:46:45.000262 log 2010-12-01 10:46:16.836607 osd0
172.16.10.42:6800/17347 3 : [INF] bench: wrote 1024 MB in blocks of 4096
KB in 30.463511 sec at 30274 KB/sec
2010-12-01 10:48:20.042152 pg v14: 528 pgs: 528 active+clean; 32780
KB data, 888 MB used, 218 GB / 219 GB avail
2010-12-01 10:50:50.038298 pg v15: 528 pgs: 528 active+clean; 73740
KB data, 54928 KB used, 219 GB / 219 GB avail
2010-12-01 10:52:15.074470 pg v16: 528 pgs: 528 active+clean; 73740
KB data, 79440 KB used, 219 GB / 219 GB avail
2010-12-01 10:54:55.546098 log 2010-12-01 11:52:34.244851 osd1
172.16.10.65:6800/4845 1 : [INF] bench: wrote 1024 MB in blocks of 4096
KB in 165.067603 sec at 6329 KB/sec
2010-12-01 10:54:55.546098 log 2010-12-01 11:55:52.010739 osd1
172.16.10.65:6800/4845 2 : [INF] bench: wrote 1024 MB in blocks of 4096
KB in 181.034333 sec at 5782 KB/sec
2010-12-01 10:54:55.546098 log 2010-12-01 11:59:09.560115 osd1
172.16.10.65:6800/4845 3 : [INF] bench: wrote 1024 MB in blocks of 4096
KB in 196.055812 sec at 5334 KB/sec
2010-12-01 10:55:01.001357 pg v17: 528 pgs: 528 active+clean; 73741
KB data, 1106 MB used, 218 GB / 219 GB avail
============ceph.conf====================
;
; Sample ceph.conf file.
;
; This file defines cluster membership, the various locations
; that Ceph stores data, and any other runtime options.
; If a 'host' is defined for a daemon, the start/stop script will
; verify that it matches the hostname (or else ignore it). If it is
; not defined, it is assumed that the daemon is intended to start on
; the current host (e.g., in a setup with a startup.conf on each
; node).
; global
[global]
; enable secure authentication
; auth supported = cephx
keyring = /etc/ceph/keyring.bin
; monitors
; You need at least one. You need at least three if you want to
; tolerate any node failures. Always create an odd number.
[mon]
mon data = /opt/ceph/data/mon$id
;mon data = /home/transoft/data/mon$id
; logging, for debugging monitor crashes, in order of
; their likelihood of being helpful :)
;debug ms = 20
;debug mon = 20
;debug paxos = 20
;debug auth = 20
[mon0]
host = ubuntu-mon0
mon addr = 172.16.10.171:6789
[mon1]
host = ubuntu-mon0
mon addr = 172.16.10.171:6790
[mon2]
host = ubuntu-mon0
mon addr = 172.16.10.171:6791
; mds
; You need at least one. Define two to get a standby.
[mds]
; where the mds keeps its secret encryption keys
keyring = /etc/ceph/keyring.$name
; mds logging to debug issues.
;debug ms = 20
;debug mds = 20
[mds.0]
host = ubuntu-mon0
[mds.1]
host = ubuntu-mon0
; osd
; You need at least one. Two if you want data to be replicated.
; Define as many as you like.
[osd]
; This is where the btrfs volume will be mounted.
;osd data = /opt/ceph/data/osd$id
osd class tmp = /var/lib/ceph/tmp
; Ideally, make this a separate disk or partition. A few
; hundred MB should be enough; more if you have fast or many
; disks. You can use a file under the osd data dir if need be
; (e.g. /data/osd$id/journal), but it will be slower than a
; separate disk or partition.
; This is an example of a file-based journal.
;osd journal = /home/transoft/data/osd$id/journal
;filestore journal writeahead = true
; journal size, in megabytes
;osd journal size = 1000
keyring = /etc/ceph/keyring.$name
; osd logging to debug osd issues, in order of likelihood of being
; helpful
;debug ms = 20
;debug osd = 20
;debug filestore = 20
;debug journal = 20
[osd0]
host = ubuntu-osd0
osd data = /opt/ceph/data/osd$id
osd journal = /home/transoft/data/osd$id/journal
filestore journal writeahead = true
osd journal size = 10000
; if 'btrfs devs' is not specified, you're responsible for
; setting up the 'osd data' dir. if it is not btrfs, things
; will behave up until you try to recover from a crash (which is
; usually fine for basic testing).
; btrfs devs = /dev/sdx
[osd1]
host = ubuntu-osd1
osd data = /opt/ceph/data/osd$id
osd journal = /home/transoft/data/osd$id/journal
filestore journal writeahead = true
osd journal size = 10000
;btrfs devs = /dev/sdy
;[osd2]
;host = zeta
;btrfs devs = /dev/sdx
;[osd3]
;host = eta
;btrfs devs = /dev/sdy