From: Ferry Toth <ftoth@exalondelft.nl>
To: linux-btrfs@vger.kernel.org
Subject: Re: Hot data tracking / hybrid storage
Date: Wed, 18 May 2016 22:44:55 +0000 (UTC)
Message-ID: <nhir96$8n4$1@ger.gmane.org>
In-Reply-To: 20160517203335.5ff99a05@jupiter.sol.kaishome.de
On Tue, 17 May 2016 20:33:35 +0200, Kai Krakow wrote:
> On Tue, 17 May 2016 07:32:11 -0400, "Austin S. Hemmelgarn"
> <ahferroin7@gmail.com> wrote:
>
>> On 2016-05-17 02:27, Ferry Toth wrote:
>> > On Mon, 16 May 2016 01:05:24 +0200, Kai Krakow wrote:
>> >
>> >> On Sun, 15 May 2016 21:11:11 +0000 (UTC),
>> >> Duncan <1i5t5.duncan@cox.net> wrote:
>> >>
>> [...]
>> > <snip>
>> >>
>> >> You can go there with only one additional HDD as temporary storage.
>> >> Just connect it, format as bcache, then do a "btrfs dev replace".
>> >> Now wipe that "free" HDD (use wipefs), format as bcache,
>> >> then... well, you get the point. At the last step, remove the
>> >> remaining HDD. Now add your SSDs, format as caching device, and
>> >> attach each individual HDD backing bcache to each SSD caching
>> >> bcache.
>> >>
>> >> Devices don't need to be formatted and created at the same time. I'd
>> >> also recommend to add all SSDs only in the last step to not wear
>> >> them early with writes during device replacement.
>> >>
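The rolling replacement described above can be sketched as a dry-run
planner. This is only an illustration under stated assumptions: the
function name plan_migration, the device paths, and the idea that new
bcache devices show up as /dev/bcache0, /dev/bcache1, ... in creation
order are hypothetical and not taken from the thread. The function only
builds command strings and executes nothing.

```python
def plan_migration(hdds, spare, mountpoint):
    """Build the commands for a rolling HDD -> bcache migration (dry run).

    hdds:       existing btrfs member devices, e.g. ["/dev/sda", "/dev/sdb"]
    spare:      the one additional, empty HDD used as temporary storage
    mountpoint: where the btrfs filesystem is mounted
    Returns (commands, leftover), where leftover is the disk that ends up
    free and can be removed at the last step.
    """
    commands = []
    free = spare
    for index, old in enumerate(hdds):
        # Assumption: the Nth backing device registers as /dev/bcacheN.
        bcache_dev = f"/dev/bcache{index}"
        # Format the currently free disk as a bcache backing device.
        commands.append(f"make-bcache -B {free}")
        # Move the data online; -B keeps the replace in the foreground.
        commands.append(
            f"btrfs replace start -B {old} {bcache_dev} {mountpoint}"
        )
        # The old disk is now unused: wipe its signatures and reuse it.
        commands.append(f"wipefs -a {old}")
        free = old
    return commands, free


if __name__ == "__main__":
    plan, leftover = plan_migration(["/dev/sda", "/dev/sdb"], "/dev/sdc", "/mnt")
    for line in plan:
        print(line)
    print("leftover disk:", leftover)
```

The SSDs are left out on purpose, matching the advice to add them only
at the end: they would be formatted with make-bcache -C and attached to
each backing device through its sysfs attach file afterwards.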
>> >> If you want, you can add one additional step to get the temporary
>> >> hard disk back. But why not simply replace the oldest hard disk with
>> >> the newest. Take a look at smartctl to see which is the best
>> >> candidate.
>> >>
>> >> I went a similar route but without one extra HDD. I had three HDDs
>> >> in mraid1/draid0 and enough spare space. I just removed one HDD,
>> >> prepared it for bcache, then added it back and removed the next.
>> >>
>> > That's what I mean, a lot of work. And it's still a cache, with
>> > unnecessary copying from the ssd to the hdd.
>> On the other hand, it's actually possible to do this all online with
>> BTRFS because of the reshaping and device replacement tools.
>>
>> In fact, I've done even more complex reprovisioning online before (for
>> example, my home server system has 2 SSD's and 4 HDD's, running BTRFS
>> on top of LVM, I've at least twice completely recreated the LVM layer
>> online without any data loss and minimal performance degradation).
>> >
>> > And what happens when either a hdd or ssd starts failing?
>> I have absolutely no idea how bcache handles this, but I doubt it's any
>> better than BTRFS.
>
> Bcache should in theory fall back to write-through as soon as an error
> counter exceeds a threshold. This is adjustable with sysfs
> io_error_halflife and io_error_limit. Though I never tried what actually
> happens when either the HDD (in bcache writeback-mode) or the SSD fails.
> Actually, btrfs should be able to handle this (tho, according to list
> reports, it doesn't handle errors very well at this point).
>
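To see what the error thresholds are currently set to, something like
the following minimal sketch could be used. It assumes the tunables live
in the cache set's sysfs directory (/sys/fs/bcache/<cset-uuid>/) under
the names io_error_limit and io_error_halflife, as the kernel's bcache
documentation lists them; the helper name read_error_tunables is
hypothetical. Values are returned as raw strings, without interpreting
their units.

```python
from pathlib import Path


def read_error_tunables(cset_dir):
    """Return bcache's error-handling tunables from a cache-set directory.

    cset_dir is expected to look like /sys/fs/bcache/<cset-uuid>; the
    values are returned exactly as the files present them (stripped).
    """
    base = Path(cset_dir)
    values = {}
    for name in ("io_error_limit", "io_error_halflife"):
        values[name] = (base / name).read_text().strip()
    return values
```

Taking the directory as a parameter keeps the sketch independent of any
particular cache-set UUID and lets it be tried against a scratch
directory first.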
> BTW: Unnecessary copying from SSD to HDD doesn't take place in bcache
> default mode: It only copies from SSD to HDD in writeback mode (data is
> written to the cache first, then persisted to HDD in the background).
> You can also use "write through" (data is written to SSD and persisted
> to HDD at the same time, reporting persistence to the application only
> when both copies were written) and "write around" mode (data is written
> to HDD only, and only reads are written to the SSD cache device).
>
> If you want bcache to behave as a huge IO scheduler for writes, use
> writeback mode. If you have write-intensive applications, you may want
> to choose write-around to not wear out the SSDs early. If you want
> writes to be cached for later reads, you can choose write-through mode.
> The latter two modes will ensure written data is always persisted to HDD
> with the same guarantees you had without bcache. The last mode is
> default and should not change behavior of btrfs if the HDD fails, and if
> the SSD fails bcache would simply turn off and fall back to HDD.
>
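The three modes as described above can be summarized in a small model.
This is a sketch of the description in this thread, not of bcache's
implementation: the WritePath type and the helper names are
hypothetical. The one real interface assumed is the cache_mode sysfs
file of a bcache device (for example /sys/block/bcache0/bcache/cache_mode),
which lists all modes and marks the active one in square brackets.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePath:
    to_ssd: bool             # the write is placed in the SSD cache
    on_hdd_before_ack: bool  # the HDD has the data before the app is told "done"


# Write behaviour per mode, following the description in the thread.
MODES = {
    "writeback": WritePath(to_ssd=True, on_hdd_before_ack=False),
    "writethrough": WritePath(to_ssd=True, on_hdd_before_ack=True),
    "writearound": WritePath(to_ssd=False, on_hdd_before_ack=True),
}


def survives_ssd_loss(mode):
    """True if acknowledged writes are already on the HDD in this mode."""
    return MODES[mode].on_hdd_before_ack


def active_mode(sysfs_text):
    """Extract the bracketed entry from the contents of cache_mode."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    current = active_mode("writethrough [writeback] writearound none")
    print(current, "-> safe if SSD dies:", survives_ssd_loss(current))
```

Only writeback acknowledges a write before the HDD has it, which is why
it is the one mode where losing the SSD can lose data.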
Hello Kai,

Yeah, lots of modes. So does that mean none of them works well for all
cases?

Our server holds lots of old files: SMB shares (various sizes), IMAP
(tens of thousands of small messages, thousands of large ones), a
PostgreSQL server, VirtualBox images (large), and 50 or so snapshots.
Running Synaptic for system upgrades is painfully slow.

We suspect the slowness is caused by fsyncs, which appear to be much
worse on a raid10 with snapshots. Presumably the whole thing would be
fast enough on SSDs alone, but that would not be very cost-efficient.

All the overhead of the cache layer could be avoided if btrfs would
simply prefer to write small, hot files to the SSD in the first place
and clean up while balancing. A combination of 2 SSDs and 4 HDDs would
be very nice (the motherboard has 6 SATA ports, which is pretty common).

Moreover, increasing the SSD capacity in the future would then be as
simple as replacing a disk with a larger one.

I think many would sign up for such a low-maintenance, efficient setup
that doesn't require a PhD in IT to think out and configure.

Even at home, I would just throw in a low-cost SSD next to the HDD if it
were as simple as a device add. But I wouldn't want to store my
photo/video collection on SSD alone; that is too expensive.
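
The placement rule wished for above ("small, hot files go to the SSD")
could look roughly like this toy policy. It is purely illustrative:
btrfs has no such allocator hook in this discussion, and the function
name choose_tier and both thresholds (1 MiB, 8 recent accesses) are
made-up assumptions.

```python
def choose_tier(size_bytes, recent_accesses,
                small_limit=1 << 20, hot_limit=8):
    """Pick a device class for a file in a hypothetical hybrid setup.

    A file goes to the SSD only if it is both small (<= small_limit
    bytes) and hot (>= hot_limit recent accesses); everything else,
    such as a large photo/video collection, stays on the HDD.
    """
    if size_bytes <= small_limit and recent_accesses >= hot_limit:
        return "ssd"
    return "hdd"


if __name__ == "__main__":
    print(choose_tier(4096, 50))         # small and hot
    print(choose_tier(4 * 2**30, 50))    # hot but large
    print(choose_tier(4096, 0))          # small but cold
```

Files that cool down would be moved back to the HDD during a balance,
which is the "clean up" step mentioned above.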
> Regards,
> Kai
>
> Replies to list-only preferred.