From: Dieter Kasper <d.kasper@kabelmail.de>
To: Samuel Just <sam.just@inktank.com>
Cc: Josh Durgin <josh.durgin@inktank.com>,
	Alexandre DERUMIER <aderumier@odiso.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: RBD performance - tuning hints / parameter doc
Date: Thu, 30 Aug 2012 17:08:38 +0200
Message-ID: <20120830150838.GB32184@oder.kd-bie.de>
In-Reply-To: <CA+4uBUZNBaQLfvTtVHC9b_Rz3rkQXPb0ggANHHX3iSpc8VJfTQ@mail.gmail.com>

Samuel,

thank you very much for this explicit description!

As far as I understand, the journal acts as a ring buffer in front of the OSD.
Using time as the parameter to trigger a sync might not be the best choice for
a dynamic storage subsystem. Under a high workload, e.g., 10/20 for min/max
might be optimal for 4 nodes with 10 OSDs each,
but not after adding 4 additional nodes.

Are there parameters to trigger the syncs to the OSD
in relation to the fill level of the journal ?
e.g.
filestore [min|max] sync percent:

Do not sync before min-% full; sync after max-% full
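
To make the idea concrete, here is a minimal Python sketch of such a
fill-level trigger. Everything in it is an assumption for illustration:
'filestore [min|max] sync percent' is only the option proposed above, not
an existing Ceph parameter, and the function name sync_decision and the
25/75 defaults are hypothetical.

```python
def sync_decision(journal_used_bytes, journal_size_bytes,
                  min_pct=25.0, max_pct=75.0):
    """Decide whether a journal sync should run, based only on fill level.

    Returns one of:
      "hold"    - journal is below min_pct full, do not sync yet
      "allowed" - between min_pct and max_pct, a sync may run
      "force"   - journal is at or above max_pct full, sync now

    min_pct/max_pct model the hypothetical
    'filestore [min|max] sync percent' option; they are not real Ceph settings.
    """
    if journal_size_bytes <= 0:
        raise ValueError("journal size must be positive")
    if not 0.0 <= min_pct <= max_pct <= 100.0:
        raise ValueError("need 0 <= min_pct <= max_pct <= 100")

    fill_pct = 100.0 * journal_used_bytes / journal_size_bytes
    if fill_pct >= max_pct:
        return "force"
    if fill_pct < min_pct:
        return "hold"
    return "allowed"


if __name__ == "__main__":
    # Walk a 1 GiB journal through a few fill levels.
    size = 1 << 30
    for used_pct in (10, 50, 90):
        used = size * used_pct // 100
        print(used_pct, sync_decision(used, size))
```

The point of a percentage-based trigger is that it would keep the same
meaning when nodes or OSDs are added, whereas a fixed time interval
corresponds to a different amount of journal data on every cluster size.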

What would happen if I set "filestore [min|max] sync interval" to 999999 ?
Will the journal sync start at 100% full or at X% ?
What is 'X' by default ?
How can I set 'X' ?

Best Regards,
-Dieter


On Thu, Aug 30, 2012 at 12:34:43AM +0200, Samuel Just wrote:
> filestore [min|max] sync interval:
> 
> Periodically, the filestore needs to quiesce writes and do a syncfs in
> order to create a consistent commit point up to which it can free journal
> entries.  Syncing more frequently tends to reduce the time required to do
> the sync, and reduces the amount of data that needs to remain in the
> journal.  Less frequent syncs would allow the backing filesystem to better
> coalesce small writes and metadata updates, hopefully resulting in more
> efficient syncs.  'filestore max sync interval' defines the maximum time
> period between syncs, 'filestore min sync interval' defines the minimum
> time period between syncs.
> 
> filestore flusher:
> 
> The filestore flusher forces data from large writes to be written out
> using sync_file_range before the sync in order to (hopefully) reduce the
> cost of the eventual sync.  In practice, disabling 'filestore flusher'
> seems to improve performance in some cases.
> 
> filestore queue max ops:
> 
> 'filestore queue max ops' defines the number of in-progress ops the
> filestore will accept before blocking on queueing new ones.  This mostly
> shouldn't have much of an effect on performance and should probably be
> ignored.
> 
> filestore op threads:
> 
> 'filestore op threads' defines the number of threads used to submit
> filesystem operations in parallel.
> 
> journal dio:
> 
> 'journal dio' enables using O_DIRECT for writing to the journal.  This
> should usually be enabled.  If possible, 'journal aio' should also be
> enabled to allow use of libaio to do asynchronous writes.
> 
> osd op threads:
> 
> 'osd op threads' defines the size of the thread pool used to service OSD
> operations such as client requests.  Increasing this may increase the
> rate of request processing.
> 
> osd disk threads:
> 
> 'osd disk threads' defines the number of threads used to perform
> background disk-intensive OSD operations such as scrubbing and snap
> trimming.
> 
> On Wed, Aug 29, 2012 at 12:29 PM, Dieter Kasper <d.kasper@kabelmail.de> wrote:
> > Hi Josh,
> >
> > thanks for the hint.
> > Can you please spend a few words on the meaning of these parameters ?
> > - filestore min/max sync interval =     int/float ?     seconds ? of what ?
> > - filestore flusher = false
> > - filestore queue max ops = 10000
> >         what is 'one op' ?      queue in front of what ?
> > - filestore op threads =
> >         what are useful values here ?
> >
> > - journal dio = true/false
> > - osd op threads =
> > - osd disk threads =
> >
> >
> > Kind Regards,
> > -Dieter
> >
> >
> > On Wed, Aug 29, 2012 at 07:37:36PM +0200, Josh Durgin wrote:
> >> On 08/29/2012 01:50 AM, Alexandre DERUMIER wrote:
> >> > Nice results !
> >> > (Can you run the same benchmark from a qemu-kvm guest with the virtio driver ?
> >> > I ran some benchmarks a few months ago with Stephan Priebe, and we were never able to get more than 20000 IOPS with a full-SSD 3-node cluster.)
> >> >
> >> >>> How can I set the variables that control when the journal data has to go to the OSD ? (after X seconds and/or when Y %-full)
> >> > I think you can try to tune these values
> >> >
> >> > filestore max sync interval = 30
> >> > filestore min sync interval = 29
> >> > filestore flusher = false
> >> > filestore queue max ops = 10000
> >>
> >> Increasing filestore_op_threads might help as well.
> >>
> >> > ----- Original Message -----
> >> >
> >> > From: "Dieter Kasper" <d.kasper@kabelmail.de>
> >> > To: ceph-devel@vger.kernel.org
> >> > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
> >> > Sent: Tuesday, 28 August 2012 19:48:42
> >> > Subject: RBD performance - tuning hints
> >> >
> >> > Hi,
> >> >
> >> > on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
> >> > I can observe a pretty nice rados bench performance
> >> > (see bench-rados.txt for details):
> >> >
> >> > Bandwidth (MB/sec): 961.710
> >> > Max bandwidth (MB/sec): 1040
> >> > Min bandwidth (MB/sec): 772
> >> >
> >> >
> >> > Also the bandwidth performance generated with
> >> > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
> >> >
> >> > .... is acceptable, e.g.
> >> > fio_write_4m_16 795 MB/s
> >> > fio_randwrite_8m_128 717 MB/s
> >> > fio_randwrite_8m_16 714 MB/s
> >> > fio_randwrite_2m_32 692 MB/s
> >> >
> >> >
> >> > But the write IOPS seem to be limited to around 19k ...
> >> > test                     RBD 4M   RBD 64k (= optimal_io_size)
> >> > fio_randread_512_128      53286     55925
> >> > fio_randread_4k_128       51110     44382
> >> > fio_randread_8k_128       30854     29938
> >> > fio_randwrite_512_128     18888      2386
> >> > fio_randwrite_512_64      18844      2582
> >> > fio_randwrite_8k_64       17350      2445
> >> > (...)
> >> > fio_read_4k_128           10073     53151
> >> > fio_read_4k_64             9500     39757
> >> > fio_read_4k_32             9220     23650
> >> > (...)
> >> > fio_read_4k_16             9122     14322
> >> > fio_write_4k_128           2190     14306
> >> > fio_read_8k_32              706     13894
> >> > fio_write_4k_64            2197     12297
> >> > fio_write_8k_64            3563     11705
> >> > fio_write_8k_128           3444     11219
> >> >
> >> >
> >> > Any hints for tuning the IOPS (read and/or write) would be appreciated.
> >> >
> >> > How can I set the variables that control when the journal data has to go to the OSD ? (after X seconds and/or when Y %-full)
> >> >
> >> >
> >> > Kind Regards,
> >> > -Dieter
> >> >
> >> >
> >> >
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >

