From: Ferry Toth <ftoth@exalondelft.nl>
To: linux-btrfs@vger.kernel.org
Subject: Re: Hot data tracking / hybrid storage
Date: Fri, 20 May 2016 17:02:00 +0000 (UTC)
Message-ID: <nhnfu8$igq$1@ger.gmane.org>
In-Reply-To: <1c1358af-4549-618a-c408-b93832d33225@gmail.com>

On Fri, 20 May 2016 08:03:12 -0400, Austin S. Hemmelgarn wrote:

> On 2016-05-19 19:23, Henk Slager wrote:
>> On Thu, May 19, 2016 at 8:51 PM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>> On 2016-05-19 14:09, Kai Krakow wrote:
>>>>
>>>> On Wed, 18 May 2016 22:44:55 +0000 (UTC),
>>>> Ferry Toth <ftoth@exalondelft.nl> wrote:
>>>>
>>>>> On Tue, 17 May 2016 20:33:35 +0200, Kai Krakow wrote:
>>>> Bcache is actually low maintenance, no knobs to turn. Converting to
>>>> bcache protective superblocks is a one-time procedure which can be
>>>> done online. The bcache devices act as normal HDD if not attached to
>>>> a caching SSD. It's really less pain than you may think. And it's a
>>>> solution available now. Converting back later is easy: Just detach
>>>> the HDDs from the SSDs and use them for some other purpose if you
>>>> feel so later. Having the bcache protective superblock still in place
>>>> doesn't hurt then. Bcache is a no-op without caching device attached.
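
For context, the attach/detach cycle Kai describes comes down to a few
sysfs writes. A rough sketch -- /dev/sdX, /dev/sdY and the cset UUID
below are placeholders, and I haven't run these exact lines:

  # format a fresh backing device (the normal, non-in-place route;
  # this writes a bcache superblock at the start of the device)
  make-bcache -B /dev/sdX
  # create a cache device and note its cset UUID
  make-bcache -C /dev/sdY
  bcache-super-show /dev/sdY | grep cset.uuid
  # attach the cache to the running bcache device
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach
  # detach again with a single write; bcache0 keeps working, uncached
  echo 1 > /sys/block/bcache0/bcache/detach

With no cache attached, bcache0 just passes through to the HDD.
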
>>>
>>> No, bcache is _almost_ a no-op without a caching device.  From a
>>> userspace perspective, it does nothing, but it is still another layer
>>> of indirection in the kernel, which does have a small impact on
>>> performance.  The same is true of using LVM with a single volume
>>> taking up the entire partition: it looks almost no different from just
>>> using the partition, but it will perform worse than using the
>>> partition directly.  I've actually done profiling of both to figure
>>> out base values for the overhead, and while bcache with no cache
>>> device is not as bad as the LVM example, it can still be a roughly
>>> 0.5-2% slowdown (it gets more noticeable the faster your backing
>>> storage is).
>>>
>>> You also lose the ability to mount that filesystem directly on a
>>> kernel without bcache support (this may or may not be an issue for
>>> you).
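
If that ever bites, there is a workaround I believe works: the backing
filesystem sits at a fixed 8KiB offset behind the bcache superblock, so
a kernel without bcache can still reach it through a loop device.
Untested sketch, device name is a placeholder:

  # bcache's default data offset is 16 sectors (8KiB); expose the
  # filesystem behind it and mount read-only to be safe
  losetup --find --show --read-only --offset 8192 /dev/sdX1
  # mount whichever loop device the previous command printed
  mount -o ro /dev/loop0 /mnt
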
>>
>> The bcache (protective) superblock is in an 8KiB block in front of the
>> filesystem device. In case the current, non-bcached HDDs use modern
>> partitioning, you can do a 5-minute remove or add of bcache, without
>> moving/copying filesystem data. So if you have a bcache-formatted
>> HDD with just one primary partition (512-byte logical sectors), the
>> partition start is at sector 2048 and the filesystem start is at 2064.
>> Hard removing bcache (i.e. making sure the module is not
>> needed/loaded/used on the next boot) can be done by changing the
>> start-sector of the partition from 2048 to 2064. In gdisk one has to
>> change the alignment to 16 first, otherwise it refuses. And of
>> course, also first flush+stop+de-register bcache for the HDD.
>>
>> The other way around is also possible, i.e. changing the start-sector
>> from 2048 to 2032. That makes adding bcache to an existing
>> filesystem a 5-minute action rather than a GB- or TB-scale copy. It
>> is not online of course, but just one reboot is needed (or just
>> umount, gdisk, partprobe, add bcache, etc.).
>> For RAID setups, one could just do 1 HDD first.
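
If I follow that, the removal case comes down to recreating the
partition entry 16 sectors further in. With sgdisk (the scriptable
sibling of gdisk) it would look roughly like this -- a sketch using the
numbers from Henk's example, where <end> stands for the partition's
current end sector, not tested here:

  # first flush, stop and de-register bcache so nothing holds the disk
  echo 1 > /sys/block/bcache0/bcache/stop
  # look up the current end sector of partition 1
  sgdisk --info=1 /dev/sdX
  # recreate partition 1 starting at 2064 instead of 2048, same end;
  # the alignment must be relaxed to 16 sectors first or it refuses
  sgdisk --set-alignment=16 --delete=1 --new=1:2064:<end> /dev/sdX
  partprobe /dev/sdX

Adding bcache would be the mirror image: move the start from 2048 to
2032, leaving 16 free sectors in front of the filesystem for the bcache
superblock.
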
> My argument about the overhead was not about the superblock, it was
> about the bcache layer itself.  It isn't practical to just access the
> data directly if you plan on adding a cache device later, because you
> can't attach one online unless you're already going through bcache.
> This extra
> layer of indirection in the kernel does add overhead, regardless of the
> on-disk format.
> 
> Secondly, having an HDD with just one partition is not a typical use
> case, and the argument about the slack space resulting from the 1M
> alignment only holds true if you're using an MBR instead of a GPT
> layout (or, for that matter, almost any other partition table format),
> and you're not booting from that disk (because GRUB embeds itself
> there).  It's also entirely possible to have an MBR-formatted disk
> which doesn't have any spare space there either (which is how most
> flash drives get formatted).

We have four 1TB drives in MBR, 1MB free at the beginning, GRUB on all
four, then 8GB swap, then all the rest btrfs (no LVM used). The four
btrfs partitions are in the same pool, which is in btrfs RAID10 format.
/boot is in subvolume @boot.

In this configuration nothing would beat btrfs if I could just add 2
SSDs to the pool that would be clever enough to be paired in RAID1 and
be preferred for small (<1GB) file writes. Balance should then be able
to migrate infrequently used files to the HDDs.
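
To be clear, adding the SSDs themselves is trivial today; what is
missing is the placement policy. Untested sketch, device names are
placeholders:

  # the SSDs simply join the pool as ordinary devices; there is no
  # knob to steer small or hot writes onto them
  btrfs device add /dev/sdY1 /dev/sdZ1 /mnt
  btrfs balance start /mnt
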

None of the methods mentioned here sound easy or quick to do, or even 
well tested.

> This also doesn't change the fact that without careful initial
> formatting (on some filesystems it is possible to embed the bcache SB
> at the beginning of the FS itself; many of them reserve some space at
> the beginning of the partition for bootloaders, and this space
> doesn't have to exist when mounting the FS) or manual alteration of
> the partition, it's not possible to mount the FS on a system without
> bcache support.
>>
>> There is also a tool that does the conversion in-place (I haven't
>> used it myself, my Python install had trouble; I could do the
>> partition-table edit much faster/easier):
>> https://github.com/g2p/blocks#bcache-conversion
>>
> I actually hadn't known about this tool, thanks for mentioning it.
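
Neither had I. If I read its README right, the conversion is a single
command (untested, and note Henk's remark above about its Python
dependencies being troublesome):

  # in-place conversion per the g2p/blocks README
  blocks to-bcache /dev/sdX1
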


