From: Eric Sandeen <esandeen@redhat.com>
To: Ric Wheeler <rwheeler@redhat.com>, Sage Weil <sage@newdream.net>,
David Casier <david.casier@aevoo.fr>
Cc: Ceph Development <ceph-devel@vger.kernel.org>,
Dave Chinner <dchinner@redhat.com>,
Brian Foster <bfoster@redhat.com>
Subject: Re: Fwd: Fwd: [newstore (again)] how disable double write WAL
Date: Fri, 4 Dec 2015 14:20:14 -0600
Message-ID: <5661F57E.80709@redhat.com>
In-Reply-To: <5661F3A9.8070703@redhat.com>
On 12/4/15 2:12 PM, Ric Wheeler wrote:
> On 12/01/2015 05:02 PM, Sage Weil wrote:
>> Hi David,
>>
>> On Tue, 1 Dec 2015, David Casier wrote:
>>> Hi Sage,
>>> With a standard disk (4 to 6 TB) and a small flash drive, it's easy
>>> to create an ext4 FS with its metadata on flash.
>>>
>>> Example with sdg1 on flash and sdb on HDD:
>>>
>>> size_of() {
>>>     # size in 512-byte sectors, the unit dmsetup tables use
>>>     blockdev --getsz "$1"
>>> }
>>>
>>> mkdmsetup() {
>>>     _ssd=/dev/$1
>>>     _hdd=/dev/$2
>>>     _size_of_ssd=$(size_of "$_ssd")
>>>     # linear table: the SSD first (sectors 0 .. SSD size), then the HDD
>>>     printf '0 %s linear %s 0\n%s %s linear %s 0\n' \
>>>         "$_size_of_ssd" "$_ssd" \
>>>         "$_size_of_ssd" "$(size_of "$_hdd")" "$_hdd" |
>>>         dmsetup create "dm-${1}-${2}"
>>> }
>>>
>>> mkdmsetup sdg1 sdb
>>>
>>> mkfs.ext4 -O ^has_journal,flex_bg,^uninit_bg,^sparse_super,sparse_super2,^extra_isize,^dir_nlink,^resize_inode \
>>>     -E packed_meta_blocks=1,lazy_itable_init=0 -G 32768 -I 128 \
>>>     -i $((1024*512)) /dev/mapper/dm-sdg1-sdb
>>>
>>> With that, all metadata blocks are on the SSD.
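Whether all of that metadata really fits on the SSD depends on the SSD's size. A back-of-the-envelope sketch for the `mkfs.ext4` options above, assuming 4 KiB blocks and the default 32768 blocks per group (`-G 32768` sets how many groups form one flex_bg, not the group size): each group then holds 256 inodes, so it needs one block-bitmap block, one inode-bitmap block and an 8-block inode table. The helper name `estimate_packed_meta_bytes` is hypothetical, and superblocks, group descriptors and directory blocks are ignored, so treat the result as an estimate.

```shell
# Hypothetical helper: estimate the bytes that packed_meta_blocks=1 places at
# the start of the device. Assumes 4 KiB blocks and 32768 blocks per group;
# counts only bitmaps and inode tables.
estimate_packed_meta_bytes() {
    fs_bytes=$1
    block_size=4096          # assumed 4 KiB blocks
    blocks_per_group=32768   # assumed default: 8 * block_size
    bytes_per_inode=524288   # from -i $((1024*512))
    inode_size=128           # from -I 128
    fs_blocks=$((fs_bytes / block_size))
    groups=$(( (fs_blocks + blocks_per_group - 1) / blocks_per_group ))
    inodes_per_group=$((blocks_per_group * block_size / bytes_per_inode))
    itable_blocks=$(( (inodes_per_group * inode_size + block_size - 1) / block_size ))
    # Per group: 1 block bitmap + 1 inode bitmap + the inode table.
    echo $((groups * (2 + itable_blocks) * block_size))
}

# A 6 TB (6 * 10^12 byte) device:
estimate_packed_meta_bytes 6000000000000
```

For 6 TB this prints 1831075840, roughly 1.8 GB, so under these assumptions the SSD segment has to be at least that large (plus headroom) for the packed metadata to stay entirely on flash.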
>>>
>>> If the omaps are on SSD too, there is almost no metadata on the HDD.
>>>
>>> Consequence: Ceph performance (with a hack on FileStore to run without
>>> the journal and with direct I/O) is almost the same as that of the raw HDD.
>>>
>>> With a cache tier, it's very cool!
>> Cool! I know XFS lets you do that with the journal, but I'm not sure if
>> you can push the fs metadata onto a different device too. I'm guessing
>> not?
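For reference, the XFS feature mentioned here is the external log: the journal goes to a second device while the data and all other metadata stay on the main one. A dry-run sketch with hypothetical device names and mount point: `run` only prints each command, and redefining it as `run() { "$@"; }` would execute them (as root, destroying whatever is on those devices).

```shell
# Dry run: print each command instead of executing it.
run() { printf '%s\n' "$*"; }

# Hypothetical helper: XFS on data_dev with its journal on log_dev.
xfs_with_external_log() {
    data_dev=$1 log_dev=$2 mnt=$3
    # Only the journal goes to log_dev; data and other metadata stay on data_dev.
    run mkfs.xfs -l logdev="$log_dev" "$data_dev"
    # An external log device must be given again on every mount.
    run mount -o logdev="$log_dev" "$data_dev" "$mnt"
}

xfs_with_external_log /dev/sdb /dev/sdg1 /mnt/osd
```

This moves only the log, not the rest of the metadata, which is why it does not reproduce the ext4 layout above.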
>>
>>> That is why we are working on a hybrid HDD/flash approach on ARM or Intel.
>>>
>>> With newstore, it's much more difficult to control the I/O profile,
>>> because RocksDB embeds its own intelligence.
>> This is coincidentally what I've been working on today. So far I've just
>> added the ability to put the rocksdb WAL on a second device, but it's
>> super easy to push rocksdb data there as well (and have it spill over onto
>> the larger, slower device if it fills up). Or to put the rocksdb WAL on a
>> third device (e.g., expensive NVMe or NVRAM).
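The spill-over behaviour described above can be pictured as a first-fit rule over devices ordered fastest first. This is a toy model with a hypothetical `pick_device` helper and made-up sizes, not how RocksDB or Ceph actually allocate; it ignores the separate WAL device and shows only the spill-over decision.

```shell
# Toy model (hypothetical): place a write on the fastest device that still
# has room, spilling over to the next, slower one once it is full.
# Usage: pick_device <bytes_needed> <name>:<free_bytes> ...   (fastest first)
pick_device() {
    need=$1
    shift
    for dev in "$@"; do
        name=${dev%%:*}   # text before the first ':'
        free=${dev#*:}    # text after the first ':'
        if [ "$free" -ge "$need" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1              # no device has enough free space
}

pick_device 4096 nvme:1048576 ssd:0 hdd:1000000000   # prints nvme
pick_device 4096 ssd:1024 hdd:1000000000             # prints hdd (ssd is full)
```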
>>
>> See this ticket for the ceph-disk tooling that's needed:
>>
>> http://tracker.ceph.com/issues/13942
>>
>> I expect this will be more flexible and perform better than the ext4
>> metadata option, but we'll need to test on your hardware to confirm!
>>
>> sage
>
> I think that XFS "realtime" subvolumes are the thing that does this - the second volume contains only the data (no metadata).
>
> I seem to recall that it was historically popular with video appliances, etc., but it is not commonly used.
>
> Some of the XFS crew cc'ed above would have more information on this.
The realtime subvolume puts all data on a separate volume, and uses a different
allocator; it is more for streaming-type applications, in general. And it's
not enabled in RHEL - and not heavily tested at this point, I think.
-Eric
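A dry-run sketch of the realtime-subvolume layout described above, with hypothetical device names: the data section (flash) holds the log and all metadata, while file data of realtime files goes to the realtime device (HDD), roughly the split built above with ext4 and device-mapper. `-d rtinherit=1` makes the root directory pass the realtime flag on to new files; without it, only files explicitly flagged realtime use the second device. Mounting needs a kernel built with XFS realtime support (`CONFIG_XFS_RT`).

```shell
# Dry run: print each command instead of executing it.
run() { printf '%s\n' "$*"; }

# Hypothetical helper: metadata (and the log) on meta_dev, file data on rt_dev.
xfs_with_realtime() {
    meta_dev=$1 rt_dev=$2 mnt=$3
    # -r rtdev= creates the realtime (data-only) subvolume on rt_dev;
    # -d rtinherit=1 sets the realtime flag on the inodes mkfs creates, so the
    # root directory passes it on to files created under it.
    run mkfs.xfs -d rtinherit=1 -r rtdev="$rt_dev" "$meta_dev"
    # The realtime device must be named on every mount.
    run mount -o rtdev="$rt_dev" "$meta_dev" "$mnt"
}

xfs_with_realtime /dev/sdg1 /dev/sdb /mnt/osd
```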