From: "peng.hse" <peng.hse@xtaotech.com>
To: Sage Weil <sweil@redhat.com>, Javen Wu <javen.wu@xtaotech.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Is BlueFS an alternative of BlueStore?
Date: Thu, 7 Jan 2016 22:37:15 +0800
Message-ID: <568E781B.4030803@xtaotech.com>
In-Reply-To: <alpine.DEB.2.11.1601070813000.26051@cpach.fuggernut.com>
Hi Sage,

Thanks for your quick response. Javen and I were once ZFS developers, and we
are currently focusing on how to leverage some ZFS ideas to improve Ceph
backend performance in userspace.

Based on your encouraging reply, we have come up with two schemes for our
future work:

1. Scheme one: use an entirely new FS to replace rocksdb+bluefs. The FS itself
   handles the mapping of oid -> fs-object (a kind of ZFS dnode) and the
   corresponding attrs used by Ceph. There is the implementation challenge
   you mentioned about in-order enumeration of objects during backfill,
   scrub, etc. (we confronted the same situation in ZFS, where the ZAP
   feature helped us a lot). Still, from a performance and architecture
   point of view, this scheme looks clearer and cleaner. Would you suggest
   we give it a try?

2. Scheme two: as you suggested at the end of your reply, for now we
   implement only a simple version of the FS, which leverages libzpool ideas
   and plugs in underneath rocksdb as your bluefs does.

We appreciate your insightful reply.

Thanks
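
To make the in-order enumeration requirement in scheme one concrete, here is a
minimal sketch of what a collection listing needs from the underlying store:
keys kept in sorted order, plus the ability to resume a listing from a cursor.
The class name, the (collection, oid) key layout, and the `after`/`limit`
parameters are all assumptions made for illustration; this is not Ceph, ZFS,
or ZAP code.

```python
import bisect


class OrderedObjectIndex:
    """Toy stand-in for a sorted key-value store (hypothetical, not Ceph code)."""

    def __init__(self):
        self._keys = []   # sorted list of (collection, oid) tuples
        self._meta = {}   # (collection, oid) -> metadata dict

    def put(self, collection, oid, meta):
        key = (collection, oid)
        if key not in self._meta:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._meta[key] = meta

    def list_collection(self, collection, after=None, limit=2):
        """Return up to `limit` oids of `collection` in sorted order.

        `after` is a resume cursor: only oids strictly greater than it
        are returned, so a scrub or backfill pass can continue where
        the previous batch stopped.
        """
        if after is None:
            i = bisect.bisect_left(self._keys, (collection, ""))
        else:
            i = bisect.bisect_right(self._keys, (collection, after))
        out = []
        while i < len(self._keys) and len(out) < limit:
            coll, oid = self._keys[i]
            if coll != collection:
                break  # ran past the end of this collection
            out.append(oid)
            i += 1
        return out


idx = OrderedObjectIndex()
for name in ("obj_c", "obj_a", "obj_b"):
    idx.put("pg1", name, {"size": 0})
idx.put("pg2", "obj_x", {"size": 0})

first_batch = idx.list_collection("pg1")
next_batch = idx.list_collection("pg1", after=first_batch[-1])
```

Any replacement for rocksdb in scheme one would need to offer an equivalent
sorted, resumable scan, whatever on-disk structure provides it.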
On 2016-01-07 21:19, Sage Weil wrote:
> On Thu, 7 Jan 2016, Javen Wu wrote:
>> Hi Sage,
>>
>> Sorry to bother you. I am not sure if it is appropriate to send email to you
>> directly, but I cannot find any useful information to address my confusion
>> from Internet. Hope you can help me.
>>
>> I happened to hear that you are going to start BlueFS to eliminate the
>> redundancy between the XFS journal and the RocksDB WAL. I am a little
>> confused. Is BlueFS only meant to host RocksDB for BlueStore, or is it an
>> alternative to BlueStore?
>>
>> I am a newcomer to Ceph, and I am not sure my understanding of BlueStore
>> is correct. BlueStore in my mind is as below.
>>
>> BlueStore
>> =========
>> RocksDB
>> +-----------+ +-----------+
>> | onode | | |
>> | WAL | | |
>> | omap | | |
>> +-----------+ | bdev |
>> | | | |
>> | XFS | | |
>> | | | |
>> +-----------+ +-----------+
> This is the picture before BlueFS enters the picture.
>
>> I am curious: if BlueFS is able to host RocksDB, it is actually already a
>> "filesystem" which has to maintain blockmap-like metadata on its own
>> WITHOUT the help of RocksDB.
> Right. BlueFS is a really simple "file system" that is *just* complicated
> enough to implement the rocksdb::Env interface, which is what rocksdb
> needs to store its log and sst files. The after picture looks like
>
> +--------------------+
> | bluestore |
> +----------+ |
> | rocksdb | |
> +----------+ |
> | bluefs | |
> +----------+---------+
> | block device |
> +--------------------+
>
>> The reason we care about the intent and design target of BlueFS is that I
>> had a discussion with my partner Peng.Hse about an idea to introduce a new
>> ObjectStore using the ZFS library. I know Ceph already supports ZFS as a
>> FileStore backend, but we had a different, immature idea: use libzpool to
>> implement a new ObjectStore for Ceph entirely in userspace, without the SPL
>> and ZOL kernel modules, so that we can align the Ceph transaction with the
>> ZFS transaction and avoid the double write for the Ceph journal.
>> The ZFS core part, libzpool (DMU, metaslab, etc.), offers a dnode object
>> store and is platform independent (kernel or user). Another benefit of the
>> idea is that we can extend our metadata without bothering any DBStore.
>>
>> Frankly, we are not sure if our idea is realistic so far, but when I heard
>> of BlueFS, I thought we needed to know the BlueFS design goal.
> I think it makes a lot of sense, but there are a few challenges. One
> reason we use rocksdb (or a similar kv store) is that we need in-order
> enumeration of objects in order to do collection listing (needed for
> backfill, scrub, and omap). You'll need something similar on top of zfs.
>
> I suspect the simplest path would be to also implement the rocksdb::Env
> interface on top of the zfs libraries. See BlueRocksEnv.{cc,h} to see the
> interface that has to be implemented...
>
> sage
>
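
As a rough illustration of the point above that BlueFS only has to be
complicated enough to hold rocksdb's log and sst files, the sketch below shows
how small such a file layer can be in principle. It is a toy written for this
note: the class and method names are hypothetical and are not the
rocksdb::Env signatures, and the real interface is larger (see
BlueRocksEnv.{cc,h}, as suggested above).

```python
class TinyFileLayer:
    """In-memory toy of a flat, append-oriented file store (hypothetical)."""

    def __init__(self):
        self._files = {}  # file name -> bytearray of contents

    def append(self, name, data):
        # Logs and sst files are written sequentially, so append suffices here.
        self._files.setdefault(name, bytearray()).extend(data)

    def read(self, name, offset, length):
        # Random-access read of an existing file.
        return bytes(self._files[name][offset:offset + length])

    def rename(self, src, dst):
        # Used to publish a finished file under its final name.
        self._files[dst] = self._files.pop(src)

    def delete(self, name):
        del self._files[name]

    def list_files(self):
        return sorted(self._files)


fs = TinyFileLayer()
fs.append("000001.log", b"put k1")
fs.append("000001.log", b" v1")
fs.append("tmp.sst", b"sorted-table-bytes")
fs.rename("tmp.sst", "000002.sst")
```

A real implementation also has to persist its own block allocation metadata
and support syncing data to the device, which this sketch leaves out entirely.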