From: Kai Krakow <hurikhan77+btrfs@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [RFC 0/5] BTRFS hot relocation support
Date: Thu, 16 May 2013 09:12:53 +0200
Message-ID: <mgde6a-75n.ln1@hurikhan.ath.cx>
In-Reply-To: CAEH94LgEAcOHJKpuY+fY4cscxJ+QbynC0H9WbSGiV0fKp+-Ajw@mail.gmail.com
Hi!

I think a solution built into the filesystem could do much better than
something outside of it (like bcache). But I'm not sure: what makes data
hot? I think the biggest benefit comes from detecting random read access
and marking only that data as hot. Writes should also go to the SSD first
and then be spooled to the hard disks in the background. Bcache does a lot
in this regard.

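As a rough illustration of the "only random reads make data hot" idea, here
is a minimal Python sketch. It is not how the patchset or bcache decides
hotness; the class name `ReadTracker`, the `hot_threshold` parameter and its
value are assumptions made up for this example.

```python
from collections import defaultdict


class ReadTracker:
    """Toy heuristic: only reads that would need a seek add 'heat'."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold    # hypothetical cut-off
        self.next_offset = {}                 # file -> offset a sequential read would use next
        self.random_reads = defaultdict(int)  # file -> count of non-sequential reads

    def record_read(self, name, offset, length):
        expected = self.next_offset.get(name)
        # The first read of a file is not counted; later reads count only
        # if they do not continue where the previous read ended.
        if expected is not None and offset != expected:
            self.random_reads[name] += 1
        self.next_offset[name] = offset + length

    def is_hot(self, name):
        return self.random_reads[name] >= self.hot_threshold


tracker = ReadTracker()
for off in (0, 4096, 8192, 12288):        # streaming read, stays cold
    tracker.record_read("big.iso", off, 4096)
for off in (0, 40960, 8192, 81920):       # scattered reads, becomes hot
    tracker.record_read("db.sqlite", off, 4096)
```

With these made-up access patterns, only the file with scattered reads
crosses the threshold; the large sequential read never does.
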
Since this is within the filesystem, users could even mark files as always
"hot" with some attribute or ioctl. A boot readahead and preload
implementation could use this to automatically mark as hot the files used
during booting, or the files preloaded when I start an application.

On the other hand, hot relocation should reduce writes to the SSD as much
as possible. For example: do not defragment files during autodefrag, it
makes no sense. Also, write data in bursts of erase-block size, etc.

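The "bursts of erase block size" point could look roughly like the Python
sketch below. The class `BurstWriter`, the `flush` callback and the 512 KiB
default are assumptions for illustration only; real erase block sizes vary
by device and are often not exposed at all.

```python
class BurstWriter:
    """Buffer small writes and hand them on only in erase-block-sized chunks."""

    def __init__(self, flush, erase_block=512 * 1024):
        self.flush = flush              # callable that receives one full chunk
        self.erase_block = erase_block  # assumed erase block size in bytes
        self.buf = bytearray()

    def write(self, data):
        self.buf += data
        # Emit as many full erase blocks as the buffer currently holds.
        while len(self.buf) >= self.erase_block:
            self.flush(bytes(self.buf[:self.erase_block]))
            del self.buf[:self.erase_block]


chunks = []
writer = BurstWriter(chunks.append, erase_block=8)  # tiny size for the demo
writer.write(b"12345")          # too small, nothing is flushed yet
writer.write(b"6789abcdefgh")   # 17 bytes buffered, two full chunks go out
```

Whatever is left in the buffer would still need a timed or explicit flush,
which the sketch leaves out.
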
Also important: what if the SSD dies from wear? Will it gracefully fall
back to the hard disk? What does "relocation" mean? Files (hot data) should
only be cached as a copy on the SSD, not moved there. It should be possible
for btrfs to just drop a failing SSD from the filesystem without data loss;
otherwise one would have to use two SSDs in raid-1 mode to get safe cache
storage.

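The difference between caching a copy and moving the data can be shown with
a toy model in Python. This is only a sketch of the idea, not of the
patchset's behaviour; `CopyCache` and its methods are hypothetical names,
and it only models the read cache, not writes that land on the SSD first.

```python
class CopyCache:
    """Hot blocks get an extra copy on the SSD; the HDD always keeps its own."""

    def __init__(self, hdd_blocks):
        self.hdd = dict(hdd_blocks)  # authoritative copy of every block
        self.ssd = {}                # additional copies of hot blocks
        self.ssd_alive = True

    def promote(self, block):
        if self.ssd_alive:
            self.ssd[block] = self.hdd[block]  # copy, never delete from the HDD

    def read(self, block):
        if self.ssd_alive and block in self.ssd:
            return self.ssd[block], "ssd"
        return self.hdd[block], "hdd"

    def fail_ssd(self):
        # Dropping the SSD loses only the cached copies.
        self.ssd_alive = False
        self.ssd.clear()


cache = CopyCache({1: b"boot", 2: b"data"})
cache.promote(1)
before = cache.read(1)   # served from the SSD copy
cache.fail_ssd()
after = cache.read(1)    # same data, now served from the HDD
```

If `promote` deleted the block from `hdd` instead, the same `fail_ssd` call
would lose it, which is the case that would need two SSDs in raid-1.
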
Altogether, I think a spinning-media btrfs raid can outperform a single SSD
on sequential reads, so hot relocation should probably be used to reduce
head movements, because this is where an SSD really excels. So everything
that involves heavy head movement should go to the SSD first and then be
written back to the hard disk. And I think there's a lot of potential to
optimize, because a COW filesystem like btrfs naturally has a lot of head
movement.

What do you think?

BTW: I have not tried either one yet because I'm still deciding which way
to go. Your patches are more welcome because I would not need to migrate my
storage to bcache-provided block devices. OTOH, the bcache implementation
looks a lot more mature (with regard to performance and safety) at this
point because it provides many of the features mentioned above, most
importantly graceful handling of failing SSDs.

Regarding btrfs raid outperforming an SSD: during boot, my spinning-media
3-device btrfs raid reads boot files at up to 600 MB/s (from an
LZ-compressed fs). Boot takes about 7 seconds until the display manager
starts (which takes another 30 seconds, but that's another story), and the
system is pretty crowded with services I wouldn't actually need if I
optimized for boot performance. But I think systemd's read-ahead
implementation has a lot of influence on this fast booting: it defragments
and relocates boot files on btrfs during boot so the hard disks can read all
this stuff sequentially. I think it also compresses boot files if
compression is enabled, because booting is IO bound, not CPU bound.
Benchmarks showed that my btrfs raid could technically read up to 450 MB/s,
so I think the 600 MB/s figure refers to decompressed data. A single SSD
could not do that. For the same reason I created a small script to
defragment and compress files used by the preload daemon. Without
benchmarking it, this felt like another small performance boost. So I'm
eager to see what could come next with some sort of SSD cache, because the
only problem left seems to be heavy head movement, which slows down the
system.

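A quick back-of-the-envelope check of the two figures above, written out in
Python. It assumes the disks were delivering their full benchmarked rate
during boot and that everything read was compressed, which is a
simplification rather than something that was measured.

```python
raw_mb_s = 450       # benchmarked raw read rate of the raid (figure from above)
observed_mb_s = 600  # read rate seen during boot (figure from above)
devices = 3          # number of disks in the raid

# If 600 MB/s of decompressed data came out of 450 MB/s of disk reads,
# the data must have shrunk by this factor on disk.
ratio = observed_mb_s / raw_mb_s
per_disk = raw_mb_s / devices

print(f"implied compression ratio: {ratio:.2f}")
print(f"raw rate per disk: {per_disk:.0f} MB/s")
```

Under those assumptions the compression only needs to save about a quarter
of the on-disk size to explain the difference.
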
Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> Hi,
>
> Do you think its design approach is heading in the right direction? Do
> you have any comments or better design ideas for BTRFS hot relocation
> support? Any comments are appreciated, thanks.
>
>
> On Mon, May 6, 2013 at 4:53 PM, <zwu.kernel@gmail.com> wrote:
>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>
>> The patchset is sent out as an RFC mainly to see whether it is heading
>> in the right development direction.
>>
>> The patchset tries to introduce hot relocation support for BTRFS.
>> In a hybrid storage environment, when data on the HDD gets hot, BTRFS
>> hot relocation can move it to the SSD automatically. Also, if the SSD
>> usage ratio exceeds its upper threshold, data that has gone cold is
>> first looked up and relocated to the HDD to free space on the SSD,
>> and then the data that has become hot is relocated to the SSD
>> automatically.
>>
>> BTRFS hot relocation first reserves block space on the SSD, loads the
>> hot data from the HDD into the page cache, allocates block space on
>> the SSD, and finally writes the data to the SSD.
>>
>> If you'd like to play with it, please pull the patchset from
>> my git tree on github:
>> https://github.com/wuzhy/kernel.git hot_reloc
>>
>> For how to use it, please refer to the example below:
>>
>> root@debian-i386:~# echo 0 > /sys/block/vdc/queue/rotational
>> ^^^ Above command will hack /dev/vdc to be one SSD disk
>> root@debian-i386:~# echo 999999 > /proc/sys/fs/hot-age-interval
>> root@debian-i386:~# echo 10 > /proc/sys/fs/hot-update-interval
>> root@debian-i386:~# echo 10 > /proc/sys/fs/hot-reloc-interval
>> root@debian-i386:~# mkfs.btrfs -d single -m single -h /dev/vdb /dev/vdc -f
>>
>> WARNING! - Btrfs v0.20-rc1-254-gb0136aa-dirty IS EXPERIMENTAL
>> WARNING! - see http://btrfs.wiki.kernel.org before using
>>
>> [ 140.279011] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 1 transid 16 /dev/vdb
>> [ 140.283650] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
>> [ 140.517089] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
>> [ 140.550759] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
>> [ 140.552473] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
>> adding device /dev/vdc id 2
>> [ 140.636215] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 2 transid 3 /dev/vdc
>> fs created label (null) on /dev/vdb
>> nodesize 4096 leafsize 4096 sectorsize 4096 size 14.65GB
>> Btrfs v0.20-rc1-254-gb0136aa-dirty
>> root@debian-i386:~# mount -o hot_move /dev/vdb /data2
>> [ 144.855471] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 6 /dev/vdb
>> [ 144.870444] btrfs: disk space caching is enabled
>> [ 144.904214] VFS: Turning on hot data tracking
>> root@debian-i386:~# dd if=/dev/zero of=/data2/test1 bs=1M count=2048
>> 2048+0 records in
>> 2048+0 records out
>> 2147483648 bytes (2.1 GB) copied, 23.4948 s, 91.4 MB/s
>> root@debian-i386:~# df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/vda1        16G   13G  2.2G  86% /
>> tmpfs           4.8G     0  4.8G   0% /lib/init/rw
>> udev             10M  176K  9.9M   2% /dev
>> tmpfs           4.8G     0  4.8G   0% /dev/shm
>> /dev/vdb         15G  2.0G   13G  14% /data2
>> root@debian-i386:~# btrfs fi df /data2
>> Data: total=3.01GB, used=2.00GB
>> System: total=4.00MB, used=4.00KB
>> Metadata: total=8.00MB, used=2.19MB
>> Data_SSD: total=8.00MB, used=0.00
>> root@debian-i386:~# echo 108 > /proc/sys/fs/hot-reloc-threshold
>> ^^^ Above command will start HOT RELOCATE, because the data temperature
>> is currently 109
>> root@debian-i386:~# df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/vda1        16G   13G  2.2G  86% /
>> tmpfs           4.8G     0  4.8G   0% /lib/init/rw
>> udev             10M  176K  9.9M   2% /dev
>> tmpfs           4.8G     0  4.8G   0% /dev/shm
>> /dev/vdb         15G  2.1G   13G  14% /data2
>> root@debian-i386:~# btrfs fi df /data2
>> Data: total=3.01GB, used=6.25MB
>> System: total=4.00MB, used=4.00KB
>> Metadata: total=8.00MB, used=2.26MB
>> Data_SSD: total=2.01GB, used=2.00GB
>> root@debian-i386:~#
>>
>> Zhi Yong Wu (5):
>> vfs: add one list_head field
>> btrfs: add one new block group
>> btrfs: add one hot relocation kthread
>> procfs: add three proc interfaces
>> btrfs: add hot relocation support
>>
>>  fs/btrfs/Makefile            |   3 +-
>>  fs/btrfs/ctree.h             |  26 +-
>>  fs/btrfs/extent-tree.c       | 107 +++++-
>>  fs/btrfs/extent_io.c         |  31 +-
>>  fs/btrfs/extent_io.h         |   4 +
>>  fs/btrfs/file.c              |  36 +-
>>  fs/btrfs/hot_relocate.c      | 802 +++++++++++++++++++++++++++++++++++++++++++
>>  fs/btrfs/hot_relocate.h      |  48 +++
>>  fs/btrfs/inode-map.c         |  13 +-
>>  fs/btrfs/inode.c             |  92 ++++-
>>  fs/btrfs/ioctl.c             |  23 +-
>>  fs/btrfs/relocation.c        |  14 +-
>>  fs/btrfs/super.c             |  30 +-
>>  fs/btrfs/volumes.c           |  28 +-
>>  fs/hot_tracking.c            |   1 +
>>  include/linux/btrfs.h        |   4 +
>>  include/linux/hot_tracking.h |   1 +
>>  kernel/sysctl.c              |  22 ++
>>  18 files changed, 1234 insertions(+), 51 deletions(-)
>> create mode 100644 fs/btrfs/hot_relocate.c
>> create mode 100644 fs/btrfs/hot_relocate.h
>>
>> --
>> 1.7.11.7
>>