linux-btrfs.vger.kernel.org archive mirror
From: Henk Slager <eye1tm@gmail.com>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Hot data tracking / hybrid storage
Date: Fri, 20 May 2016 23:31:43 +0200
Message-ID: <CAPmG0ja4UVHH0VkjAM3iobjz69Gi5DbxkwzzTzbKkH6Ddf4uUQ@mail.gmail.com>
In-Reply-To: <9bde1aa8-fac5-34c4-dffc-0bf15d86c008@gmail.com>

On Fri, May 20, 2016 at 7:59 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-05-20 13:02, Ferry Toth wrote:
>>
>> We have 4 1TB drives in MBR, 1MB free at the beginning, grub on all 4,
>> then 8GB swap, then all the rest btrfs (no LVM used). The 4 btrfs
>> partitions are in the same pool, which is in btrfs RAID10 format. /boot
>> is in subvolume @boot.
>
> If you have GRUB installed on all 4, then you don't actually have the full
> 2047 sectors between the MBR and the partition free, as GRUB is embedded in
> that space.  I forget exactly how much space it takes up, but I know it's
> not the whole 1023.5K.  I would not suggest risking use of the final 8k
> there, though.  You could, however, convert to raid1 temporarily, and then
> for each device, delete it, reformat it for bcache, then re-add it to the
> FS.  This may take a while, but should be safe (of course, it's only an
> option if you're already using a kernel with bcache support).

There is more than enough space in that 2047-sector area for
inserting a bcache superblock, but initially I also found it risky and
was not so sure. In any case, I don't want GRUB in the MBR, but in the
filesystem/OS partition it is meant to boot; otherwise multi-OS setups
on the same SSD or HDD get into trouble.

For the described system, assuming a few minutes of offline or
'maintenance' mode is acceptable, I personally would just shrink the
swap by 8KiB (lower its end sector by 16), also lower the start
sector of the btrfs partition by 16, and then add bcache; the freed
8KiB directly in front of the existing btrfs data is where the bcache
superblock then goes. The location of GRUB should not actually matter.
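
A minimal sketch of that, assuming the swap partition is /dev/sda2
and the btrfs partition /dev/sda3 (names hypothetical), and relying
on make-bcache's default data offset of 16 sectors (8KiB), so the
existing btrfs data lands exactly at the bcache data start:

  swapoff -a
  # in fdisk/parted: move the end of sda2 down by 16 sectors and
  # the start of sda3 down by 16 sectors, then:
  mkswap /dev/sda2                 # recreate the shrunk swap
  make-bcache -B /dev/sda3         # SB goes into the freed 8KiB
  echo /dev/sda3 > /sys/fs/bcache/register   # if udev didn't already
  mount /dev/bcache0 /mnt
  # repeat for each of the four drives, then swapon -a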

>> In this configuration nothing would beat btrfs if I could just add 2
>> SSDs to the pool that would be clever enough to be paired in RAID1 and
>> would be preferred for small (<1GB) file writes. Then balance should be
>> able to move infrequently used files to the HDDs.
>>
>> None of the methods mentioned here sound easy or quick to do, or even
>> well tested.

I agree that all the methods are actually quite complicated,
especially compared to ZFS and its tools. There, adding an L2ARC
cache device is as simple and easy as you describe.
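
For comparison, on ZFS that is a single command per cache device
(pool and device names are hypothetical):

  zpool add tank cache /dev/nvme0n1p1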

The point I wanted to make is that bcache can be added underneath a
(btrfs) filesystem without touching the FS itself, provided that one
can allow some offline time for the FS.

> It really depends on what you're used to.  I would consider most of the
> options easy, but one of the areas I'm strongest with is storage management,
> and I've repaired damaged filesystems and partition tables by hand with a
> hex editor before, so I'm not necessarily a typical user.  If I were going
> to suggest something specifically, it would be dm-cache, because it requires
> no modification of the backing store at all.  That does, however, require
> running on LVM if you want it to be easy to set up (it's possible without
> LVM, but you then need something to call dmsetup before mounting the
> filesystem, which is not easy to configure correctly), and if you're on an
> enterprise distro, it may not be supported.
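>
> With LVM in place, lvmcache(7) reduces the dm-cache setup to roughly
> the following (VG/LV names and the cache size are only examples):
>
>   pvcreate /dev/sdd1                 # the SSD
>   vgextend vg0 /dev/sdd1
>   lvcreate --type cache-pool -L 100G -n cpool vg0 /dev/sdd1
>   lvconvert --type cache --cachepool vg0/cpool vg0/data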
>
> If you wanted to, it's possible, and not all that difficult, to convert a
> BTRFS system to BTRFS on top of LVM online, but you would probably have to
> split out the boot subvolume to a separate partition (depending on which
> distro you're on; some have working LVM support in GRUB, some don't).  If
> you're on a distro which does have LVM support in GRUB, the procedure would
> be:
> 1. Convert the BTRFS array to raid1. This lets you run with only 3 disks
> instead of 4.
> 2. Delete one of the disks from the array.
> 3. Convert the disk you deleted from the array to an LVM PV and add it to a
> VG.
> 4. Create a new logical volume occupying almost all of the PV you just added
> (having a little slack space is usually a good thing).
> 5. Use btrfs replace to add the LV to the BTRFS array while removing one of
> the others.
> 6. Repeat steps 3-5 for each disk, but stop after step 4 when exactly one
> disk is left that isn't on LVM (so for four disks, stop after step 4 when
> you have 2 with BTRFS on LVM, one with just an empty LVM logical volume,
> and one with just BTRFS).
> 7. Reinstall GRUB (it should pull in LVM support now).
> 8. Use BTRFS replace to move the final BTRFS disk to the empty LVM volume.
> 9. Convert the now-empty final disk to LVM using steps 3-4.
> 10. Add the LV to the BTRFS array and rebalance to raid10.
> 11. Reinstall GRUB again (just to be certain).
>
> I've done essentially the same thing on numerous occasions when
> reprovisioning for various reasons, and it's actually one of the things
> outside of the xfstests that I check with my regression testing (including
> simulating a couple of the common failure modes).  It takes a while
> (especially for big arrays with lots of data), but it works, and is
> relatively safe (you are guaranteed to be able to rebuild a raid1 array of 3
> disks from just 2, so losing the disk in the process of copying it will not
> result in data loss unless you hit a kernel bug).
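
In commands, the replace cycle at the heart of that procedure (steps
3-5 above) would look roughly like this; device, VG and LV names are
placeholders:

  pvcreate /dev/sdb2
  vgextend vg0 /dev/sdb2        # or vgcreate vg0 /dev/sdb2 the first time
  lvcreate -l 95%FREE -n btrfs_b vg0 /dev/sdb2
  btrfs replace start /dev/sdc2 /dev/vg0/btrfs_b /mnt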


Thread overview: 26+ messages
2016-05-15 12:12 Hot data tracking / hybrid storage Ferry Toth
2016-05-15 21:11 ` Duncan
2016-05-15 23:05   ` Kai Krakow
2016-05-17  6:27     ` Ferry Toth
2016-05-17 11:32       ` Austin S. Hemmelgarn
2016-05-17 18:33         ` Kai Krakow
2016-05-18 22:44           ` Ferry Toth
2016-05-19 18:09             ` Kai Krakow
2016-05-19 18:51               ` Austin S. Hemmelgarn
2016-05-19 21:01                 ` Kai Krakow
2016-05-20 11:46                   ` Austin S. Hemmelgarn
2016-05-19 23:23                 ` Henk Slager
2016-05-20 12:03                   ` Austin S. Hemmelgarn
2016-05-20 17:02                     ` Ferry Toth
2016-05-20 17:59                       ` Austin S. Hemmelgarn
2016-05-20 21:31                         ` Henk Slager [this message]
2016-05-29  6:23                         ` Andrei Borzenkov
2016-05-29 17:53                           ` Chris Murphy
2016-05-29 18:03                             ` Holger Hoffstätte
2016-05-29 18:33                               ` Chris Murphy
2016-05-29 20:45                                 ` Ferry Toth
2016-05-31 12:21                                   ` Austin S. Hemmelgarn
2016-06-01 10:45                                   ` Dmitry Katsubo
2016-05-20 22:26                     ` Henk Slager
2016-05-23 11:32                       ` Austin S. Hemmelgarn
2016-05-16 11:25 ` Austin S. Hemmelgarn
