From: Mark Nelson <mnelson@redhat.com>
To: Casier David <casierdavid@gmail.com>, ceph-devel@vger.kernel.org
Subject: Re: Perfomance CPU and IOPS
Date: Sun, 24 May 2015 07:09:08 -0500
Message-ID: <5561BF64.5080509@redhat.com>
In-Reply-To: <5561B444.6060108@gmail.com>
On 05/24/2015 06:21 AM, Casier David wrote:
> Hello everybody,
> I have some suggestions to improve Ceph performance when using the
> RADOS Block Device.
>
> On FileStore :
> - Remove all metadata from the HDD and use omap on SSD. This reduces IOPS
> and increases throughput.
This may matter more now that we've been removing other bottlenecks. In
the past, when we've tested this with filestore on spinning disks, it
hasn't made a huge difference. Having said that, Sam recently committed
a change that should make it safer to use leveldb/rocksdb on alternate
disks:
https://github.com/ceph/ceph/pull/4718
> - Remove journal, thread "sync_entry", and write directly in
> queue_transaction.
>
> To compensate for the journal, you could use a Ceph cache tier.
Cache tiering isn't really a good solution for general Ceph workloads
right now. The overhead of object promotions (4MB by default) into the
cache tier is just too heavy when the hot/cold distribution is not
highly skewed.
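To put a rough number on that overhead, here is a back-of-the-envelope
sketch. It assumes 4KB client IOs and that every cache miss promotes one
whole 4MB object; both are simplifying assumptions for illustration, not
measurements, and the function names are made up for this example.

```python
# Toy model of cache-tier promotion cost. The 4KB IO size and the
# "every miss promotes a whole object" rule are assumptions.
OBJECT_SIZE = 4 * 1024 * 1024  # promoted object size (4MB by default)
IO_SIZE = 4 * 1024             # assumed client IO size


def promotion_amplification(object_size: int, io_size: int) -> float:
    """Bytes promoted into the cache tier per byte requested on a miss."""
    return object_size / io_size


def avg_bytes_moved_per_io(hit_ratio: float,
                           object_size: int = OBJECT_SIZE,
                           io_size: int = IO_SIZE) -> float:
    """Average bytes moved per client IO: the IO itself, plus one
    whole-object promotion for each miss."""
    return io_size + (1.0 - hit_ratio) * object_size


if __name__ == "__main__":
    print(promotion_amplification(OBJECT_SIZE, IO_SIZE))
    for hit_ratio in (0.5, 0.9, 0.99):
        print(hit_ratio, avg_bytes_moved_per_io(hit_ratio))
```

Under these assumptions the cost of a miss dwarfs the IO itself, so the
cache only pays off when nearly all IOs land on objects already promoted.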
Instead I'd suggest reading up on Sage's newstore project:
http://www.spinics.net/lists/ceph-devel/msg22712.html
https://wiki.ceph.com/Planning/Blueprints/Infernalis/NewStore_%28new_osd_backend%29
Specifically note that the WAL is used during overwrites but for full
object writes we can simply write out the new object using libaio and
remove the old one.
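As a toy illustration of that distinction (this is not newstore code; the
names below are hypothetical, and cases such as appends are deliberately
not modelled), the choice between the two paths could be sketched as:

```python
from enum import Enum


class WritePath(Enum):
    WAL = "wal"                # log the change first, apply it in place later
    NEW_OBJECT = "new_object"  # write a fresh object, then remove the old one


def choose_write_path(offset: int, length: int, existing_size: int) -> WritePath:
    """Pick a path for a write of `length` bytes at `offset` into an object
    that currently holds `existing_size` bytes.

    Only the two cases named above are distinguished: a write that replaces
    the whole object, and everything else (treated as an overwrite).
    """
    replaces_whole_object = offset == 0 and length >= existing_size
    return WritePath.NEW_OBJECT if replaces_whole_object else WritePath.WAL


if __name__ == "__main__":
    four_mb = 4 * 1024 * 1024
    print(choose_write_path(0, four_mb, four_mb))    # full object write
    print(choose_write_path(65536, 4096, four_mb))   # partial overwrite
```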
Here are our latest performance results vs filestore. The biggest area we
need to improve is large object partial overwrites on fast
devices (i.e. SSDs):
http://nhm.ceph.com/newstore/8c8c5903_rbd_rados_tests.pdf
We may be able to split objects up into ~512k fragments to help deal
with large partial object overwrites. It may also be that we could help
the rocksdb folks change the way the WAL works (dedicated portion of the
disk like Ceph journals rather than log files that get created/deleted
on the disk).
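One possible reading of why fragments might help, sketched below: a large
partial overwrite can cover some fragments completely, and those could in
principle be handled like full writes, leaving only the edge fragments as
true partial overwrites. This is an assumption for illustration, not a
description of any existing newstore code, and the helper names are
hypothetical.

```python
FRAGMENT_SIZE = 512 * 1024  # the ~512k fragment size floated above


def touched_fragments(offset: int, length: int,
                      fragment_size: int = FRAGMENT_SIZE) -> range:
    """Indices of all fragments overlapped by a write."""
    if length <= 0:
        return range(0)
    first = offset // fragment_size
    last = (offset + length - 1) // fragment_size
    return range(first, last + 1)


def fully_covered_fragments(offset: int, length: int,
                            fragment_size: int = FRAGMENT_SIZE) -> range:
    """Indices of fragments that the write replaces entirely."""
    first = -(-offset // fragment_size)       # ceiling division
    end = (offset + length) // fragment_size  # exclusive upper bound
    return range(first, max(first, end))


if __name__ == "__main__":
    # A 1MB overwrite starting 256KB into the object.
    offset, length = 256 * 1024, 1024 * 1024
    print(list(touched_fragments(offset, length)))
    print(list(fully_covered_fragments(offset, length)))
```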
>
> ceph-osd should use fewer threads and locks.
> With 1 OSD per HDD, I think locks are necessary only for scrub, recovery
> or other background jobs.
> And only one thread, with the use of libaio.
Which locks do you think could be safely removed?
>
> I think Ceph-OSD should be very light.
> Potentially with direct writing after having transmitted the data to the
> other OSDs from the map.
> In this case, a lot of ceph-osd could work on the same server.
>
> Currently, I am working in the repository https://www.github.com/dcasier/ceph.
> You can see the initial work on FileStore.*
> But it is potentially not safe.
I'd highly suggest getting feedback from Sam/Sage before going too far
down this rabbit hole. :)
>
> David.