linux-btrfs.vger.kernel.org archive mirror
* BTRFS hot relocation not merged
From: Max Schettler @ 2015-02-19 11:49 UTC
  To: Zhi Yong Wu; +Cc: linux-btrfs

Hi,

I recently was looking for the status of hot relocation on btrfs.
There seemed to be some activity on the mailing list around May 2013
regarding patches that should provide the functionality.
However, they have not been merged yet and there hasn't been
further discussion about them (to my knowledge).
What is the status of hot relocation?

Max

* Re: BTRFS hot relocation not merged
From: Duncan @ 2015-02-19 21:06 UTC
  To: linux-btrfs

Max Schettler posted on Thu, 19 Feb 2015 12:49:37 +0100 as excerpted:

> I recently was looking for the status of hot relocation on btrfs.
> There seemed to be some activity on the mailing list around May 2013
> regarding patches that should provide the functionality.
> However, they have not been merged yet and there hasn't been further
> discussion about them (to my knowledge).
> What is the status of hot relocation?

The current suggestion is to use something like bcache or dmcache in 
tandem with btrfs.  I'm not sure of dmcache/btrfs status, but there are 
people actually using bcache/btrfs here on this list, with the reports 
I've read generally very positive.

Longer term, the feature in various forms remains on the wiki's project 
ideas page, here:

https://btrfs.wiki.kernel.org/index.php/Project_ideas

However, as can be seen on that page, btrfs is definitely not lacking in 
ideas for future development, rather the reverse, and unfortunately btrfs 
in general has a history of wildly optimistic feature ETAs, tho they do 
eventually come online, with raid56 mode being the most recent example.

That being the case and with none of the variants of the suggestion 
already formally claimed and in-progress, I'd suggest checking back in 
3-5 years...  unless of course this is a feeler and you're proposing to 
claim and implement it yourself. =:^)

There's also this rather vague comment on the wiki, on the main page, 
under Features, additional features in development or planned (so closer 
to News, then scroll up a bit)...

* Hot data tracking and moving to faster devices (currently being pushed 
as a generic feature available through VFS)

https://btrfs.wiki.kernel.org/index.php/Main_Page#News

(and scroll up a bit)

I'm not sure if that refers to bcache and similar, or something else, tho 
I didn't check the talk and history pages, which may have a hint...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: BTRFS hot relocation not merged
From: Kai Krakow @ 2015-02-19 23:07 UTC
  To: linux-btrfs

Duncan <1i5t5.duncan@cox.net> wrote:

> Max Schettler posted on Thu, 19 Feb 2015 12:49:37 +0100 as excerpted:
> 
>> I recently was looking for the status of hot relocation on btrfs.
>> There seemed to be some activity on the mailing list around May 2013
>> regarding patches that should provide the functionality.
>> However, they have not been merged yet and there hasn't been further
>> discussion about them (to my knowledge).
>> What is the status of hot relocation?
> 
> The current suggestion is to use something like bcache or dmcache in
> tandem with btrfs.  I'm not sure of dmcache/btrfs status, but there are
> people actually using bcache/btrfs here on this list, with the reports
> I've read generally very positive.

Yes, here's one! :-)

[...]
> There's also this rather vague comment on the wiki, on the main page,
> under Features, additional features in development or planned (so closer
> to News, then scroll up a bit)...
> 
> * Hot data tracking and moving to faster devices (currently being pushed
> as a generic feature available through VFS)
> 
> https://btrfs.wiki.kernel.org/index.php/Main_Page#News
> 
> (and scroll up a bit)
> 
> I'm not sure if that refers to bcache and similar, or something else, tho
> I didn't check the talk and history pages, which may have a hint...

Actually, bcache does not implement hot data tracking. It acts more or less 
as a huge I/O scheduler (in the same league as deadline, cfq and friends), 
with minimizing seek times as its primary focus. It does this by trying to 
detect random reads (and optionally writes) and caching them in a 
log-structured store, using access patterns optimized for non-rotational 
media. Cached writes, if enabled, are written back lazily in the background 
and reordered to minimize seeks and maximize throughput to the rotational 
media. Linear access patterns are passed straight through to the rotational 
media, which handles that kind of access pattern reasonably well (at least 
compared with past-generation SSDs). In that regard, even a good USB stick 
or an internal card reader could do as a cache, tho I'd strongly recommend 
against using one.
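
The random-versus-linear split described above can be illustrated with a 
rough sketch. This is a toy model, not bcache's actual code: the class name 
SequentialDetector, the route() method and the 4 MiB cutoff are assumptions 
made for illustration, and it follows only a single stream of requests.

```python
from dataclasses import dataclass


@dataclass
class SequentialDetector:
    """Toy model: send long contiguous runs to the HDD, everything else to the cache."""

    cutoff_bytes: int = 4 * 1024 * 1024  # assumed threshold, not taken from the post
    _next_offset: int = -1               # offset at which the current run would continue
    _run_bytes: int = 0                  # length of the current contiguous run

    def route(self, offset: int, length: int) -> str:
        """Return 'cache' or 'backing' for one request."""
        if offset == self._next_offset:
            # The request continues the previous one: the run grows.
            self._run_bytes += length
        else:
            # A seek happened: start counting a new run.
            self._run_bytes = length
        self._next_offset = offset + length
        # Long runs are cheap on rotational media, so they bypass the cache.
        return "backing" if self._run_bytes >= self.cutoff_bytes else "cache"


detector = SequentialDetector(cutoff_bytes=3 * 4096)
routes = [detector.route(off, 4096) for off in (0, 4096, 8192, 1_000_000)]
print(routes)
```

In this sketch the third contiguous request crosses the cutoff and goes 
to the backing device, while the request after the seek is treated as 
random again and is cached.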

The nice thing is that this way bcache can combine fast random access on 
the SSD and linear access on the HDD into one stream with summed transfer 
rates. So it is by definition faster than plain HDD access on its own.

But it goes even further: the read and write latencies of the cache device 
are measured, and if they rise above a certain threshold, bcache falls back 
to fetching the data from the slower device, which will probably (this is a 
heuristic) have the data ready sooner than the congested caching device. 
This is pretty neat, as it adds to the benefit of the summed transfer rates.
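
A minimal sketch of that latency-based fallback, under stated assumptions: 
CongestionTracker, its methods, the sliding window and the 2000 us default 
are made up for illustration and are not bcache's implementation. The 
sketch also assumes that dirty blocks must always be served from the cache, 
since in writeback mode the backing device does not hold them yet.

```python
from collections import deque


class CongestionTracker:
    """Toy model: bypass a congested cache device for clean, cached reads."""

    def __init__(self, threshold_us: float = 2000.0, window: int = 8):
        self.threshold_us = threshold_us     # assumed congestion threshold
        self.samples = deque(maxlen=window)  # most recent cache latencies

    def record(self, latency_us: float) -> None:
        """Remember how long the last cache device request took."""
        self.samples.append(latency_us)

    def average(self) -> float:
        """Mean of the recorded latencies, or 0.0 if nothing was recorded."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def read_from(self, cached: bool, dirty: bool = False) -> str:
        """Decide which device should serve a read."""
        if not cached:
            return "backing"  # cache miss: only the HDD has the data
        if dirty:
            return "cache"    # newest copy exists only in the cache
        # Heuristic: if the cache looks congested, the HDD may answer sooner.
        return "backing" if self.average() > self.threshold_us else "cache"


tracker = CongestionTracker(threshold_us=100.0, window=2)
tracker.record(50.0)
tracker.record(250.0)
print(tracker.read_from(cached=True))
```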

With this, if I can trust ksysguard, I get transfer rates of up to 800 MB/s 
in a bcache+3xbtrfs(mraid1,draid0) setup, tho most times it peaks at around 
150 MB/s where I had around 80 MB/s usual peaks without bcache. But this is 
not the main benefit. My access latencies and IO queue depths have gone down 
to virtually zero. And this is probably where the most speedup comes from.

System boot (on systemd, with services like postfix and mariadb, using 
autodefrag and readahead) went down from around 60s to 5s (measured in 
systemd-analyze critical path), with almost no seeking sounds from the 
harddisks. KDE starts a lot faster now (maybe another 60-80s down to around 
10s) and is instantly responsive, with all panels, backgrounds and icons 
loaded when the splash fades out, whereas previously I had a black 
background and a lot of ongoing IO after the splash faded out.

The cache hit rate is usually above 80% with an 80 GB bcache partition for a 
3x 1TB btrfs volume. My SSD is specified with 550 MB/s reading and 150 MB/s 
writing. Measured it's lower (around 480/130) but still faster than HDD even 
at linear writing.
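
As a back-of-the-envelope illustration of why a hit rate above 80% matters 
even with a small cache, the following sketch computes a weighted average 
access time. The cache size, volume size and hit rate come from the figures 
above; the 0.1 ms and 10 ms latencies are assumptions chosen for 
illustration, not measurements.

```python
def expected_access_ms(hit_ratio: float, cache_ms: float, backing_ms: float) -> float:
    """Average access time when a fraction hit_ratio of requests hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backing_ms


# Figures from the post: 80 GB cache in front of a 3x 1 TB volume, >80% hits.
cache_fraction = 80 / 3000  # about 2.7% of the raw capacity

# Assumed latencies, NOT measurements: 0.1 ms for the SSD, 10 ms per HDD seek.
with_cache = expected_access_ms(0.80, cache_ms=0.1, backing_ms=10.0)
without_cache = expected_access_ms(0.0, cache_ms=0.1, backing_ms=10.0)

print(f"cache covers {cache_fraction:.1%} of raw capacity")
print(f"average access: {with_cache:.2f} ms vs {without_cache:.2f} ms uncached")
```

Under these assumed latencies the average access time drops to roughly a 
fifth of the uncached value, because only the misses still pay for a seek.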

I'm using writeback mode, and I've had no data loss or inconsistencies yet, 
even though I've had to hard reboot once or twice. But btrfs without bcache 
has also been rock solid for me in the past few months wrt hard reboots or 
power loss. Some people actually say that with bcache the probability of 
losing data should be lower, as the data reaches stable storage faster and 
thus transactions on btrfs can be closed faster. While bcache will still be 
in a dirty state, it will write the data back later and replay its log if 
it didn't finish before rebooting. Well, bcache is always in a dirty state, 
by design.
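
The writeback behaviour described above can be sketched as a toy model. 
WritebackCache and its methods are hypothetical names, dictionaries stand 
in for the two devices, and the model assumes the cache contents and the 
dirty set survive a crash because they live on the SSD.

```python
class WritebackCache:
    """Toy model: writes land in the cache first and reach the HDD later."""

    def __init__(self):
        self.cache = {}     # block number -> data on the fast device
        self.dirty = set()  # blocks not yet written to the backing device
        self.backing = {}   # block number -> data on the slow device

    def write(self, block: int, data: bytes) -> None:
        """Complete the write as soon as the fast device holds the data."""
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block: int):
        """Prefer the cached copy, which may be newer than the backing one."""
        return self.cache.get(block, self.backing.get(block))

    def flush(self, limit=None) -> None:
        """Write back dirty blocks in ascending order to keep seeks short."""
        for block in sorted(self.dirty)[:limit]:
            self.backing[block] = self.cache[block]
            self.dirty.discard(block)

    def recover(self) -> None:
        """After an unclean shutdown, finish whatever writeback was pending."""
        self.flush()


wb = WritebackCache()
wb.write(7, b"late")
wb.write(3, b"early")
wb.flush(limit=1)  # only block 3 reaches the backing device
wb.recover()       # simulated reboot: the rest is replayed
print(sorted(wb.backing))
```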

I just wonder what role bcache plays in writeback mode in a btrfs-raid 
scenario, as a single bcache cache device covers multiple btrfs devices: 
btrfs itself assumes (and only sees) multiple devices, but it's actually 
one device when passed through bcache first. Write errors may go undetected 
(because bcache writes behind) while btrfs still sees good data from the 
cache. But btrfs checksums should probably handle this anyway... I'm not 
sure. Maybe bcache should not allow reading blocks from the cache that are 
still waiting to be written back, and should evict written blocks from the 
cache so that they have to be read again from the backing device. That 
would ensure that btrfs really sees what is on the platter instead of what 
may be cached. Probably in the end it's the same problem as bit-rot: bcache 
and HDD unknowingly don't match, and later bcache evicts the good data from 
the cache and leaves the bad data behind.
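
A rough sketch of how per-block checksums could catch such a mismatch once 
the bad copy is actually read. This is only an illustration: zlib.crc32 is 
a stand-in (btrfs defaults to crc32c, a different polynomial), the function 
names are hypothetical, and falling back to a second copy assumes a 
redundant profile such as raid1.

```python
import zlib


def checksum(data: bytes) -> int:
    """Stand-in checksum; the real filesystem uses a different algorithm."""
    return zlib.crc32(data)


def read_verified(copies, expected: int) -> bytes:
    """Return the first copy whose checksum matches the stored one."""
    for data in copies:
        if checksum(data) == expected:
            return data
    # Without a good copy (e.g. raid0 data) the error can only be reported.
    raise OSError("no copy matches the stored checksum")


good = b"block contents"
stored = checksum(good)          # recorded when the block was written
on_platter = b"block c0ntents"   # what a failed writeback might leave behind

print(read_verified([on_platter, good], stored) == good)
```

The check only runs when the block is read from the backing device, so a 
stale block hidden behind a good cached copy would go unnoticed until the 
cached copy is evicted.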

Ahh, complicated... ;-)

But I trust bcache by now though I didn't forcibly try the big disasters (by 
cutting the power cord during heavy IO or similar funny things). And 
nevertheless, I still have my daily backups around. ;-)

Altogether, I wonder if having a real hot data cache would bring that much 
additional benefit. Maybe only when it's huge and when it's really fast (I 
mean those SSDs capable of doing 500+ MB/s at reading AND writing).

-- 
Replies to list only preferred.

