From: Dieter Kasper <d.kasper@kabelmail.de>
To: Alexandre DERUMIER <aderumier@odiso.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
Andreas Bluemle <andreas.bluemle@itxperts.de>
Subject: Re: RBD performance - tuning hints
Date: Thu, 30 Aug 2012 18:48:59 +0200
Message-ID: <20120830164859.GE32184@oder.kd-bie.de>
In-Reply-To: <5831c8ed-f98d-45f9-8837-178cee74fa9f@mailpro>
On Thu, Aug 30, 2012 at 06:12:11PM +0200, Alexandre DERUMIER wrote:
> >>well, you have to compare
> >>- a pure SSD (via PCIe or SAS-6G) vs.
> >>- Ceph-Journal, which goes 2x over 10GbE with IP
> >> Client -> primary-copy -> 2nd-copy
> >> (= redundancy over Ethernet distance)
>
> Sure, but the first OSD acks to the client before replicating to the other OSDs.
no
>
> Client -> primary-copy -> 2nd-copy
> <-ack
> primary-copy -> 2nd-copy
> -> 3rd-copy
>
> Or am I wrong?
yes,
please have a look at the attached file: ceph-replication-acks.png
The client will usually continue on 'ACK' and not wait for the 'commit'.
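As a rough illustration of why the acknowledgement order matters for small-write IOPS, here is a toy latency model in Python. It assumes the primary forwards each write to the other replicas in parallel and acknowledges the client only once they have acknowledged. The function names and the 50 microsecond hop times are made up for illustration; they are not taken from Ceph or from the measurements in this thread.

```python
def write_ack_latency_us(client_hop_us, replica_hop_us, n_replicas):
    """Toy model: time until the client sees the ACK for one write.

    Assumption: the primary fans the write out to all other replicas in
    parallel and acks the client only after every replica has acked.
    Journal and CPU time are ignored (journals in RAM).
    """
    # client -> primary, and the ACK travelling back primary -> client
    client_round_trip = 2 * client_hop_us
    # primary -> replica and back; parallel fan-out, so one round trip counts
    replica_round_trip = 2 * replica_hop_us if n_replicas > 1 else 0
    return client_round_trip + replica_round_trip


def estimated_iops(latency_us, queue_depth):
    """Little's law estimate: outstanding requests divided by latency."""
    return queue_depth * 1_000_000 / latency_us


if __name__ == "__main__":
    latency = write_ack_latency_us(client_hop_us=50, replica_hop_us=50, n_replicas=2)
    print(latency, estimated_iops(latency, queue_depth=1))
```

With these made-up numbers a replicated write takes 200 microseconds, i.e. about 5000 writes/s per outstanding request; a larger fio --iodepth scales that estimate until some other limit is reached.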
BTW, all my journals are in RAM (/dev/ramX):
32x 2GB journals = 64GB, i.e. 32GB of data with 2x replication.
If "filestore min sync interval" and "filestore max sync interval" are set to 99999999,
data should 'never' be flushed from the journal to the OSD
('never' at least during the tests, as long as the written data is < 32GB).
In such a configuration only the Ceph code and the interconnect (10GbE/IP) would be the bottleneck.
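The journal sizing above can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic: the helper names are hypothetical, and the 800 MB/s rate is an illustrative figure in the range of the large-block write results quoted further down, not a new measurement.

```python
def journal_data_capacity_gb(n_journals, journal_size_gb, replica_count):
    """Unique client data that fits in the journals before a flush is needed.

    Every write is journaled once per replica, so the raw journal space
    is divided by the replica count.
    """
    return n_journals * journal_size_gb / replica_count


def seconds_until_full(capacity_gb, write_mb_per_s):
    """How long a sustained client write stream can run before the journals fill."""
    return capacity_gb * 1024 / write_mb_per_s


if __name__ == "__main__":
    capacity = journal_data_capacity_gb(n_journals=32, journal_size_gb=2, replica_count=2)
    print(capacity)                           # unique data in GB
    print(seconds_until_full(capacity, 800))  # seconds at an assumed 800 MB/s
```

For 32 journals of 2GB with 2 replicas this gives 32GB of unique data, which an assumed 800 MB/s write stream would fill in roughly 41 seconds.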
Cheers,
-Dieter
>
>
> ----- Original Message -----
>
> From: "Dieter Kasper" <d.kasper@kabelmail.de>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: ceph-devel@vger.kernel.org, "Andreas Bluemle" <andreas.bluemle@itxperts.de>
> Sent: Thursday, 30 August 2012 18:02:05
> Subject: Re: RBD performance - tuning hints
>
> On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote:
> > Thanks
> >
> > >> 8x SSD, 200GB each
> >
> > 20000 IOPS seems pretty low, no?
> well, you have to compare
> - a pure SSD (via PCIe or SAS-6G) vs.
> - Ceph-Journal, which goes 2x over 10GbE with IP
> Client -> primary-copy -> 2nd-copy
> (= redundancy over Ethernet distance)
>
> I'm curious about the answer from Inktank,
>
> -Dieter
>
> >
> >
> > For @inktank:
> >
> > Is there a bottleneck somewhere in Ceph?
> Maybe "SimpleMessenger dispatching: cause of performance problems?"
> from Thu, 16 Aug 2012 18:08:39 +0200
> by <andreas.bluemle@itxperts.de>
> can be an answer.
> Especially if a small number of OSDs is used.
>
> >
> > I ask because I would like to know whether it scales when adding new nodes.
> >
> > Has Inktank already done any random IOPS benchmarks? (I always see sequential throughput benchmarks on the mailing list.)
> >
> >
> > ----- Original Message -----
> >
> > From: "Dieter Kasper" <d.kasper@kabelmail.de>
> > To: "Alexandre DERUMIER" <aderumier@odiso.com>
> > Cc: ceph-devel@vger.kernel.org
> > Sent: Thursday, 30 August 2012 17:33:42
> > Subject: Re: RBD performance - tuning hints
> >
> > On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote:
> > > Thanks for the report !
> > >
> > > Compared with your first benchmark, is this with RBD 4M or 64K?
> > with 4MB (see attached config info)
> >
> > Cheers,
> > -Dieter
> >
> > >
> > > (How many SSDs per node?)
> > 8x SSD, 200GB each
> >
> > >
> > >
> > >
> > > ----- Original Message -----
> > >
> > > From: "Dieter Kasper" <d.kasper@kabelmail.de>
> > > To: "Alexandre DERUMIER" <aderumier@odiso.com>
> > > Cc: ceph-devel@vger.kernel.org
> > > Sent: Thursday, 30 August 2012 16:56:34
> > > Subject: Re: RBD performance - tuning hints
> > >
> > > Hi Alexandre,
> > >
> > > with the 4 filestore parameters below, some fio values could be increased:
> > > filestore max sync interval = 30
> > > filestore min sync interval = 29
> > > filestore flusher = false
> > > filestore queue max ops = 10000
> > >
> > > ###### IOPS
> > > fio_read_4k_64: 9373
> > > fio_read_4k_128: 9939
> > > fio_randwrite_8k_16: 12376
> > > fio_randwrite_4k_16: 13315
> > > fio_randwrite_512_32: 13660
> > > fio_randwrite_8k_32: 17318
> > > fio_randwrite_4k_32: 18057
> > > fio_randwrite_8k_64: 19693
> > > fio_randwrite_512_64: 20015 <<<
> > > fio_randwrite_4k_64: 20024 <<<
> > > fio_randwrite_8k_128: 20547 <<<
> > > fio_randwrite_4k_128: 20839 <<<
> > > fio_randwrite_512_128: 21417 <<<
> > > fio_randread_8k_128: 48872
> > > fio_randread_4k_128: 50002
> > > fio_randread_512_128: 51202
> > >
> > > ###### MB/s
> > > fio_randread_2m_32: 628
> > > fio_read_4m_64: 630
> > > fio_randread_8m_32: 633
> > > fio_read_2m_32: 637
> > > fio_read_4m_16: 640
> > > fio_randread_4m_16: 652
> > > fio_write_2m_32: 660
> > > fio_randread_4m_32: 677
> > > fio_read_4m_32: 678
> > > (...)
> > > fio_write_4m_64: 771
> > > fio_randwrite_2m_64: 789
> > > fio_write_8m_128: 796
> > > fio_write_4m_32: 802
> > > fio_randwrite_4m_128: 807 <<<
> > > fio_randwrite_2m_32: 811 <<<
> > > fio_write_2m_128: 833 <<<
> > > fio_write_8m_64: 901 <<<
> > >
> > > Best Regards,
> > > -Dieter
> > >
> > >
> > > On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote:
> > > > Nice results!
> > > > (Can you run the same benchmark from a qemu-kvm guest with the virtio driver?
> > > > I ran some benchmarks a few months ago with Stephan Priebe, and we were never able to get more than 20000 IOPS with a full-SSD 3-node cluster.)
> > > >
> > > > >>How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y %-full)
> > > > I think you can try tuning these values:
> > > >
> > > > filestore max sync interval = 30
> > > > filestore min sync interval = 29
> > > > filestore flusher = false
> > > > filestore queue max ops = 10000
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > >
> > > > From: "Dieter Kasper" <d.kasper@kabelmail.de>
> > > > To: ceph-devel@vger.kernel.org
> > > > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
> > > > Sent: Tuesday, 28 August 2012 19:48:42
> > > > Subject: RBD performance - tuning hints
> > > >
> > > > Hi,
> > > >
> > > > on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
> > > > I can observe a pretty nice rados bench performance
> > > > (see bench-rados.txt for details):
> > > >
> > > > Bandwidth (MB/sec): 961.710
> > > > Max bandwidth (MB/sec): 1040
> > > > Min bandwidth (MB/sec): 772
> > > >
> > > >
> > > > Also the bandwidth performance generated with
> > > > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
> > > >
> > > > .... is acceptable, e.g.
> > > > fio_write_4m_16 795 MB/s
> > > > fio_randwrite_8m_128 717 MB/s
> > > > fio_randwrite_8m_16 714 MB/s
> > > > fio_randwrite_2m_32 692 MB/s
> > > >
> > > >
> > > > But the write IOPS seem to be limited to around 19k ...
> > > > Test                     RBD 4M   RBD 64k (= optimal_io_size)
> > > > fio_randread_512_128      53286     55925
> > > > fio_randread_4k_128       51110     44382
> > > > fio_randread_8k_128       30854     29938
> > > > fio_randwrite_512_128     18888      2386
> > > > fio_randwrite_512_64      18844      2582
> > > > fio_randwrite_8k_64       17350      2445
> > > > (...)
> > > > fio_read_4k_128           10073     53151
> > > > fio_read_4k_64             9500     39757
> > > > fio_read_4k_32             9220     23650
> > > > (...)
> > > > fio_read_4k_16             9122     14322
> > > > fio_write_4k_128           2190     14306
> > > > fio_read_8k_32              706     13894
> > > > fio_write_4k_64            2197     12297
> > > > fio_write_8k_64            3563     11705
> > > > fio_write_8k_128           3444     11219
> > > >
> > > >
> > > > Any hints for tuning the IOPS (read and/or write) would be appreciated.
> > > >
> > > > How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y %-full)
> > > >
> > > >
> > > > Kind Regards,
> > > > -Dieter
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > --
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Alexandre Derumier
> > > >
> > > > Ingénieur Systèmes et Réseaux
> > > >
> > > >
> > > > Fixe : 03 20 68 88 85
> > > >
> > > > Fax : 03 20 68 90 88
> > > >
> > > >
> > > > 45 Bvd du Général Leclerc 59100 Roubaix
> > > > 12 rue Marivaux 75002 Paris
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> >
>
--
Principal Consultant, Data Center Storage Architecture and Technology
FTS CTO
FUJITSU TECHNOLOGY SOLUTIONS GMBH
Mies-van-der-Rohe-Straße 8 / 4F
80807 München
Germany
Telephone: +49 89 62060 1898
Telefax: +49 89 62060 329 1898
Mobile: +49 170 8563173
Email: dieter.kasper@ts.fujitsu.com
Internet: http://ts.fujitsu.com
Company Details: http://ts.fujitsu.com/imprint.html
[-- Attachment #2: ceph-replication-acks.png --]
[-- Type: image/png, Size: 18144 bytes --]