From: Josh Durgin <josh.durgin@inktank.com>
To: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Ceph RBD performance - random writes
Date: Wed, 08 Aug 2012 11:46:12 -0700
Message-ID: <5022B3F4.5050109@inktank.com>
In-Reply-To: <5021F6D1.7000004@catalyst.net.nz>

On 08/07/2012 10:19 PM, Mark Kirkwood wrote:
> I've been looking at using Ceph RBD as a block store for database use.
> As part of this I'm looking at how (particularly random) IO of smallish
> (4K, 8K) block sizes performs.
>
> I've set up Ceph with a single osd and mon spread over two SSDs (Intel
> 520): a 2G journal on one and the osd data on the other (xfs filesystem).
> The Intels are pretty fast and (despite being shackled by a crappy
> Nvidia SATA controller) fly for random IO.
>
> However I am not seeing that reflected in the RBD case. I have the
> device mounted on the local machine where the osd and mon are running
> (so network performance should not be a factor here).
>
> Here is what I did:
>
> Create an rbd device of 10G and mount it on /mnt/vol0:
>
> $ rbd create --size 10240 vol0
> $ rbd map vol0
> $ mkfs.xfs /dev/rbd0
> $ mount /dev/rbd0 /mnt/vol0
>
> Make a file:
>
> $ dd if=/dev/zero of=/mnt/vol0/dump/file bs=4k count=300000 conv=fsync
> 1228800000 bytes (1.2 GB) copied, 13.4361 s, 91.5 MB/s
>
> Performance ok if file size < journal (2G).
>
> $ dd if=/dev/zero of=/mnt/vol0/dump/file bs=4096k count=200 conv=fsync
> 838860800 bytes (839 MB) copied, 9.47086 s, 88.6 MB/s
>
> Not so good if file size > journal.
>
> $ dd if=/dev/zero of=/mnt/vol0/dump/file bs=4096k count=1000 conv=fsync
> 4194304000 bytes (4.2 GB) copied, 279.891 s, 15.0 MB/s
>
> Random writes (see attached file), synced with sync_file_range, are ok
> if the block size is big:
>
> $ ./writetest /mnt/vol0/dump/file 4194304 0 1
> random writes: 292 of: 4194304 bytes elapsed: 9.8397s io rate: 30/s
> (118.70 MB/s)
>
> $ ./writetest /mnt/vol0/dump/file 1048576 0 1
> random writes: 1171 of: 1048576 bytes elapsed: 10.6042s io rate: 110/s
> (110.43 MB/s)
>
> $ ./writetest /mnt/vol0/dump/file 131072 0 1
> random writes: 9375 of: 131072 bytes elapsed: 15.8075s io rate: 593/s
> (74.13 MB/s)
>
>
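
The attached writetest program is not reproduced in this archive. The
write counts above are consistent with one write per block of the ~1.2 GB
file made earlier (1228800000 bytes / block size, rounded down: 292, 1171,
9375). A minimal sketch of that kind of loop, assuming Linux and glibc,
with a hypothetical function name, and without modelling writetest's last
two arguments ("0 1"), whose meaning isn't stated:

```c
#define _GNU_SOURCE           /* needed for sync_file_range() on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical writetest-style loop: issue file_size / block_size writes
 * of block_size bytes, each at a random block-aligned offset, and push
 * each one out with sync_file_range() before issuing the next.
 * Returns the number of writes completed, or -1 on setup failure.
 */
long random_sync_writes(const char *path, size_t block_size,
                        size_t file_size, unsigned int seed)
{
    size_t nblocks = file_size / block_size;
    if (nblocks == 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    char *buf = malloc(block_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', block_size);

    long done = 0;
    for (size_t i = 0; i < nblocks; i++) {
        /* Pick a random block; rand_r() keeps the sequence reproducible. */
        off_t off = (off_t)(rand_r(&seed) % nblocks) * (off_t)block_size;
        if (pwrite(fd, buf, block_size, off) != (ssize_t)block_size)
            break;
        /* Start writeback of just this range and wait for it to finish. */
        if (sync_file_range(fd, off, (off_t)block_size,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            break;
        done++;
    }

    free(buf);
    close(fd);
    return done;
}
```

sync_file_range() only writes back and waits on the file's data pages; it
does not write out file metadata or flush the drive's volatile write cache,
so it is a weaker guarantee than fdatasync(). A hypothetical call matching
the 8K run would be random_sync_writes("/mnt/vol0/dump/file", 8192,
1228800000, 1).
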
> However, smallish block sizes are suicide (they trigger the suicide assert
> after a while); I see 100 IOPS or less on the actual devices, all at 100% util:
>
> $ ./writetest /mnt/vol0/dump/file 8192 0 1
>
> I am running into http://tracker.newdream.net/issues/2784 here I think.

This can be a sign of a bug in the underlying filesystem or hardware -
maybe your controller? That assert is hit when a single operation to
the filesystem beneath the osd takes longer than 180 seconds (by
default).
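
The idea behind that assert can be sketched as a per-operation watchdog.
This is not Ceph's code: the names and the separate warning threshold are
hypothetical, and only the 180-second default comes from the reply above.

```c
#include <time.h>

/* Hypothetical outcome of checking one in-flight filesystem operation. */
enum op_health { OP_OK, OP_SLOW, OP_SUICIDE };

/*
 * Warn once an op has been running longer than warn_s seconds, and give
 * up (the "suicide" case) once it has run longer than suicide_s seconds.
 */
enum op_health check_op(time_t started, time_t now,
                        double warn_s, double suicide_s)
{
    double elapsed = difftime(now, started);
    if (elapsed > suicide_s)
        return OP_SUICIDE;
    if (elapsed > warn_s)
        return OP_SLOW;
    return OP_OK;
}
```

With suicide_s = 180, a single write stuck in the filesystem or controller
for just over three minutes is enough to abort the osd, regardless of how
the other operations are doing.
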

> Note that the actual SSD are very fast for this when accessed directly:
>
> $ ./writetest /data1/ceph/1/file 8192 0 1
> random writes: 1000000 of: 8192 bytes elapsed: 125.7907s io rate: 7950/s
> (62.11 MB/s)
>
>
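
Working the numbers back, the writetest figures are consistent with "MB"
meaning 2^20 bytes, whereas dd reports 10^6-byte MB, so the two sets of
MB/s are not directly comparable: 292 x 4 MiB in 9.8397 s is 118.70 MiB/s,
but about 124.5 MB/s in dd's units. A small helper (hypothetical names)
that reproduces the reported rates:

```c
#include <stddef.h>

/* Throughput figures for one writetest-style run. */
struct io_rate {
    double ops_per_sec;   /* completed writes per second */
    double mib_per_sec;   /* 2^20-byte units, matching writetest's "MB/s" */
    double mb_per_sec;    /* 10^6-byte units, as printed by dd */
};

struct io_rate io_rate_of(long writes, size_t block_size, double elapsed_s)
{
    double bytes = (double)writes * (double)block_size;
    struct io_rate r = {
        .ops_per_sec = writes / elapsed_s,
        .mib_per_sec = bytes / (1024.0 * 1024.0) / elapsed_s,
        .mb_per_sec  = bytes / 1e6 / elapsed_s,
    };
    return r;
}
```

For example, io_rate_of(1000000, 8192, 125.7907) gives about 7950 ops/s
and 62.11 MiB/s, matching the direct SSD run, against 100 IOPS or less
through RBD at the same 8K size.
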
> Thanks for your patience in reading so far - some actual questions now :-)
>
> 1/ Why is the appending write from dd so slow when the file size > journal,
> despite reasonably capable storage devices?

It's possible you need to use more threads to have more operations in
flight into the filestore (the main storage for the osd). Try
something like this in your ceph configuration for the osds:

     osd op threads = 24
     osd disk threads = 24
     filestore op threads = 6
     filestore queue max ops = 24

(from http://www.spinics.net/lists/ceph-devel/msg07128.html)
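
One general way to see why more operations in flight could help is
Little's law: average ops in flight = throughput x average latency. This
is a generic queueing argument, not a Ceph measurement, and the 10 ms
latency below is an assumed figure chosen to match the ~100 IOPS seen on
the devices. At 10 ms per op, one outstanding op caps out near 100 IOPS,
while 24 in flight could in principle reach ~2400 IOPS if latency did not
rise with load.

```c
/* Best-case throughput for a given concurrency and per-op latency. */
double max_iops(double ops_in_flight, double latency_s)
{
    return ops_in_flight / latency_s;
}

/* Concurrency needed to sustain a target rate at a given latency. */
double ops_in_flight_needed(double target_iops, double latency_s)
{
    return target_iops * latency_s;
}
```

In practice latency grows as the device saturates, so these are upper
bounds rather than predictions.
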

> 2/ Is the sudden dramatic drop in random write performance a
> manifestation of the "small requests are slow" issue? Or is this
> something else?

It's probably that. Sam's actively looking into it, and once he has
something it will be interesting to see how well it works on your
hardware.

Josh

> Thanks
>
> Mark
>
>

