From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Ceph/Bluestore SPDK based backend?
Date: Tue, 07 Feb 2017 18:54:53 +0000
Message-ID: <1486493691.22338.1.camel@intel.com>
In-Reply-To: 9ee92422-bd3f-b9dc-924d-7576abb4e052@gmail.com

On Tue, 2017-02-07 at 19:20 +0100, Tobias Oberstein wrote:
> Hi Nate,
> 
> On 07.02.2017 at 14:03, Marushak, Nathan wrote:
> > Hi Tobias,
> > 
> > There has been some work done in Bluestore for this. If you search
> > "SPDK Bluestore" or something similar you'll see some links.
> 
> I tried to find conclusive info on the net before, but with no
> definite result, e.g. after reading this (by a colleague of yours):
> 
> Accelerate Ceph via SPDK
> 
> http://7xweck.com1.z0.glb.clouddn.com/cephdaybeijing201608/04-SPDK%E5%8A%A0%E9%80%9FCeph-XSKY%20Bluestore%E6%A1%88%E4%BE%8B%E5%88%86%E4%BA%AB-%E6%89%AC%E5%AD%90%E5%A4%9C-%E7%8E%8B%E8%B1%AA%E8%BF%88.pdf
> 
> My understanding is:
> 
> Bluestore seems to introduce a proper block device abstraction within
> the Ceph OSD implementation.
> 
> And this new OSD-internal block device abstraction is implemented, for
> one, over regular Linux block devices (already a step forward from being
> forced to shuffle everything through a filesystem).

Correct - Bluestore is a highly simplified user space filesystem.

> 
> But what I couldn't find in the above or on the net: is there an
> SPDK-backed implementation of this new Bluestore OSD block device
> abstraction?
> 
> Do you have a link for me? I really tried to find it ...

Here is a link to the actual code:

https://github.com/ceph/ceph/blob/master/src/os/bluestore/NVMEDevice.cc

This was not implemented by the SPDK team and I don't know what state
it is in, but it is definitely there.

> 
> > The impact on Ceph performance was somewhat limited, however.
> > There are bottlenecks in the Ceph OSD.
> 
> OK =( Is there any publicly available info on that?

I don't have the actual numbers on hand, but it was only a small
improvement. I'm speculating, but I can think of a number of problems in
the above implementation that will limit performance. First, and the
biggest problem, is that Ceph still relies on buffered I/O in a number of
cases, but the SPDK implementation doesn't do any caching. Caching is of
course the single most important aspect of storage performance. Second,
the above implementation copies memory for every read and write into
DMA-able buffers, because Ceph doesn't allocate buffers from DMA-able
memory by default. To fix that, Ceph would need to either make its memory
manager pluggable as well, or just use SPDK/DPDK throughout for all data
buffer allocations. Third, Ceph still does some blocking I/O in certain
cases, and blocking I/O with SPDK, given there is no caching, is probably
slower than the kernel.
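
To illustrate the buffer-copy point above, here is a minimal Python sketch.
It is not taken from NVMEDevice.cc or from SPDK: DmaPool and FakeDevice are
hypothetical names, and the "DMA" is simulated with plain bytearrays. It
only shows why a write from an ordinary buffer costs an extra copy, while a
write from a buffer that was allocated from the pool does not.

```python
class DmaPool:
    """Stand-in for a pool of DMA-able buffers (hypothetical, not an SPDK API)."""

    def __init__(self, buf_size, count):
        self.buf_size = buf_size
        self._free = [bytearray(buf_size) for _ in range(count)]
        self._owned = set()  # ids of buffers currently handed out

    def alloc(self):
        buf = self._free.pop()
        self._owned.add(id(buf))
        return buf

    def free(self, buf):
        self._owned.discard(id(buf))
        self._free.append(buf)

    def owns(self, buf):
        return id(buf) in self._owned


class FakeDevice:
    """Simulated device that accepts only pool-owned buffers."""

    def __init__(self, pool):
        self.pool = pool
        self.bytes_copied = 0  # extra copy traffic caused by bounce buffers
        self.blocks = {}

    def submit_write(self, lba, buf):
        if not self.pool.owns(buf):
            raise ValueError("buffer is not DMA-able")
        self.blocks[lba] = bytes(buf)  # stands in for the device reading via DMA

    def write(self, lba, data):
        """Write from any buffer, bouncing through the pool when needed."""
        if self.pool.owns(data):
            self.submit_write(lba, data)  # zero-copy path
            return
        if len(data) > self.pool.buf_size:
            raise ValueError("data larger than one pool buffer")
        bounce = self.pool.alloc()
        try:
            bounce[:len(data)] = data  # the extra copy on every I/O
            self.bytes_copied += len(data)
            self.submit_write(lba, bounce)
        finally:
            self.pool.free(bounce)
```

If the caller allocates its data buffers from the pool in the first place,
write() takes the zero-copy path and bytes_copied stays unchanged; that is
the effect a pluggable memory manager would be after.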

> 
> In general: having an SPDK+DPDK-backed implementation of the Ceph OSD
> seems highly desirable, with potentially big impact, no?

I think there is room to make it far faster than it is today using
SPDK/DPDK, but it would take a much more dramatic set of changes to the
structure of the OSD to actually realize the benefit. The whole OSD
would probably need to be rewritten to do one thread per core with
message passing and entirely asynchronous network and storage stacks.
That's effectively a brand new OSD.
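
As a rough sketch of the one-thread-per-core, message-passing structure
described above: each thread owns a shard of the data, and other threads
reach that data only by posting a message to the owner's queue. This is
purely illustrative. Reactor, ShardedStore, put and get are made-up names,
and Python threads only model the structure, not the performance.

```python
import queue
import threading
import zlib


class Reactor(threading.Thread):
    """One event loop per core; only this thread touches its shard."""

    def __init__(self, index):
        super().__init__(daemon=True)
        self.index = index
        self.inbox = queue.Queue()
        self.store = {}  # shard-private state, so no locks are needed

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown sentinel
                break
            fn, args, done = msg
            done(fn(self, *args))  # completion callback instead of blocking

    def send(self, fn, *args, done=lambda result: None):
        """Post a message; the caller never waits for the result here."""
        self.inbox.put((fn, args, done))


def put(reactor, key, value):
    reactor.store[key] = value
    return reactor.index


def get(reactor, key):
    return reactor.store.get(key)


class ShardedStore:
    """Routes each object name to the single reactor that owns it."""

    def __init__(self, n_reactors):
        self.reactors = [Reactor(i) for i in range(n_reactors)]
        for r in self.reactors:
            r.start()

    def owner(self, key):
        return self.reactors[zlib.crc32(key.encode()) % len(self.reactors)]

    def shutdown(self):
        for r in self.reactors:
            r.inbox.put(None)
        for r in self.reactors:
            r.join()
```

Because each reactor drains its inbox in order, a put followed by a get for
the same key is seen in that order by the owning thread, without any lock
around the shard.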

> 
> Thanks for your reply!
> Cheers,
> /Tobias
> 
> > 
> > Thanks,
> > Nate
> > 
> > On Feb 7, 2017, at 5:20 AM, Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com> wrote:
> > 
> > Not that I know of, and likely because it belongs to Ceph, not
> > SPDK. SPDK's goal is to enable applications to utilize NVMe flash
> > more efficiently, not to provide a backend for each and every
> > application out there.
> > 
> > Regards,
> > Andrey
> > 
> > On Feb 7, 2017 14:03, "Tobias Oberstein" <tobias.oberstein(a)gmail.com> wrote:
> > Hi,
> > 
> > the 16.2 release added a Ceph RBD block device as a backend for
> > SPDK applications. I am wondering about the inverse?
> > 
> > As in: having Ceph RBD OSDs use SPDK to use NVMe flash as
> > underlying block storage.
> > 
> > There seem to be efforts with Ceph/Bluestore
> > 
> > http://www.slideshare.net/sageweil1/bluestore-a-new-faster-storage-backend-for-ceph
> > 
> > to allow OSDs to use raw block devices as underlying storage (instead
> > of Filestore, which shuffles everything through a filesystem).
> > 
> > So put differently: is there a Ceph/Bluestore block device
> > implementation using SPDK?
> > 
> > Cheers,
> > /Tobias
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
