* Re: Metadata / Data on Heterogeneous Media
2018-02-15 17:15 Metadata / Data on Heterogeneous Media Ellis H. Wilson III
@ 2018-02-15 19:06 ` Adam Borowski
2018-02-15 20:30 ` Ellis H. Wilson III
2018-02-15 19:11 ` Hugo Mills
2018-02-16 3:57 ` Qu Wenruo
2 siblings, 1 reply; 6+ messages in thread
From: Adam Borowski @ 2018-02-15 19:06 UTC
To: Ellis H. Wilson III; +Cc: Btrfs BTRFS
On Thu, Feb 15, 2018 at 12:15:49PM -0500, Ellis H. Wilson III wrote:
> In discussing the performance of various metadata operations over the past
> few days I've had this idea in the back of my head, and wanted to see if
> anybody had already thought about it before (likely, I would guess).
>
> It appears based on this page:
> https://btrfs.wiki.kernel.org/index.php/Btrfs_design
> that data and metadata in BTRFS are fairly well isolated from one another,
> particularly in the case of large files. This appears reinforced by a
> recent comment from Qu ("...btrfs strictly
> split metadata and data usage...").
>
> Yet, while there are plenty of options to RAID0/1/10/etc across generally
> homogeneous media types, there doesn't appear to be any functionality (at
> least that I can find) to segment different BTRFS internals to different
> types of devices. E.G., place metadata trees and extent block groups on
> SSD, and data trees and extent block groups on HDD(s).
>
> Is this something that has already been considered (and if so, implemented,
> which would make me extremely happy)? Is it feasible it is hasn't been
> approached yet? I admit my internal knowledge of BTRFS is fleeting, though
> I'm trying to work on that daily at this time, so forgive me if this is
> unapproachable for obvious architectural reasons.
Considered: many times. It's an obvious improvement, and one that shouldn't
even be that hard to implement. What remains is SMoC then SMoR (Simple
Matter of Coding, then Simple Matter of Review), but both of those are in
short supply.
Now that the maximum size of inline extents has been lowered, there's no
real point in putting different types of metadata or not-really-metadata on
different media: thus, the existing split of data vs. metadata block groups
is fine.
What you'd want is an ability to tell the block allocator that metadata
block groups should prefer device[s] A, while data ones, device[s] B.
Right now, the allocator's algorithm is: each new allocation is placed on
the device that has the most available space; for the 2nd (and later) part
of a RAID chunk, the devices already used by the earlier parts are obviously
excluded. This is optimal wrt not wasting space, but doesn't always provide
the best performance, especially when the devices' speeds vary. There are
also other downsides: a usual RAID10 has a 2/3 chance of tolerating two
missing devices, while btrfs RAID10 almost guarantees massive data loss
with two missing devices.
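To make the 2/3 figure concrete, here is a small sketch in Python. It assumes
four devices, and it models btrfs RAID10 as choosing the mirror pairing per
chunk, so that over time all three possible pairings get used. That model and
the function names are illustrative assumptions, not btrfs code.

```python
from itertools import combinations


def survives(chunks, missing):
    """True if no mirror pair in any chunk has lost both of its devices."""
    return all(not set(pair) <= missing for chunk in chunks for pair in chunk)


def survival_fraction(chunks, n_devices):
    """Fraction of all two-device failures that leave every chunk readable."""
    cases = [set(c) for c in combinations(range(n_devices), 2)]
    return sum(survives(chunks, m) for m in cases) / len(cases)


# Usual RAID10: one fixed pairing for the whole array.
fixed = [[(0, 1), (2, 3)]]

# Assumed btrfs-like behaviour: different chunks end up with different
# pairings, so eventually every pair of devices mirrors some chunk.
per_chunk = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(0, 3), (1, 2)],
]

print(survival_fraction(fixed, 4))      # 4 of the 6 failure cases survive
print(survival_fraction(per_chunk, 4))  # no failure case survives
```

With a fixed pairing only the two failures that hit both halves of one mirror
are fatal; once the pairings vary per chunk, every two-device failure hits
both copies of something.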
Thus, allowing the user to specify an allocation policy that alters this
algorithm would be the way to go.
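A rough sketch of what such a policy could look like, in Python rather than
kernel C. pick_devices and its preferred parameter are hypothetical names;
the "most free space first" ordering follows the description above, and the
preference is assumed to be a soft one that falls back to the other devices
once the preferred ones are full.

```python
def pick_devices(free, copies, preferred=None):
    """Pick `copies` distinct devices for one new chunk.

    free:      dict mapping device name -> free bytes
    preferred: optional set of device names to try first (hypothetical policy)
    """
    def rank(dev):
        # Preferred devices sort first; within each group, most free space wins.
        is_preferred = preferred is not None and dev in preferred
        return (not is_preferred, -free[dev])

    candidates = sorted((d for d in free if free[d] > 0), key=rank)
    if len(candidates) < copies:
        raise ValueError("not enough devices with free space")
    return candidates[:copies]


free = {"ssd0": 200, "ssd1": 200, "hdd0": 4000, "hdd1": 4000}

# No policy: plain "most available space", so both copies land on the HDDs.
print(pick_devices(free, 2))

# Metadata policy: prefer the SSDs for a RAID1-style pair of copies.
print(pick_devices(free, 2, preferred={"ssd0", "ssd1"}))
```

Because each copy is taken from a different entry of the sorted list, no two
copies of the same chunk share a device, which matches the exclusion rule of
the existing allocator.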
Meow!
--
⢀⣴⠾⠻⢶⣦⠀ The bill with 3 years prison for mentioning Polish concentration
⣾⠁⢰⠒⠀⣿⡁ camps is back. What about KL Warschau (operating until 1956)?
⢿⡄⠘⠷⠚⠋⠀ Zgoda? Łambinowice? Most ex-German KLs? If those were "soviet
⠈⠳⣄⠀⠀⠀⠀ puppets", Bereza Kartuska? Sikorski's camps in UK (thanks Brits!)?
* Re: Metadata / Data on Heterogeneous Media
2018-02-15 19:06 ` Adam Borowski
@ 2018-02-15 20:30 ` Ellis H. Wilson III
0 siblings, 0 replies; 6+ messages in thread
From: Ellis H. Wilson III @ 2018-02-15 20:30 UTC
To: Adam Borowski; +Cc: Btrfs BTRFS
On 02/15/2018 02:06 PM, Adam Borowski wrote:
> On Thu, Feb 15, 2018 at 12:15:49PM -0500, Ellis H. Wilson III wrote:
>> In discussing the performance of various metadata operations over the past
>> few days I've had this idea in the back of my head, and wanted to see if
>> anybody had already thought about it before (likely, I would guess).
>>
>> It appears based on this page:
>> https://btrfs.wiki.kernel.org/index.php/Btrfs_design
>> that data and metadata in BTRFS are fairly well isolated from one another,
>> particularly in the case of large files. This appears reinforced by a
>> recent comment from Qu ("...btrfs strictly
>> split metadata and data usage...").
>>
>> Yet, while there are plenty of options to RAID0/1/10/etc across generally
>> homogeneous media types, there doesn't appear to be any functionality (at
>> least that I can find) to segment different BTRFS internals to different
>> types of devices. E.G., place metadata trees and extent block groups on
>> SSD, and data trees and extent block groups on HDD(s).
>>
>> Is this something that has already been considered (and if so, implemented,
>> which would make me extremely happy)? Is it feasible it is hasn't been
>> approached yet? I admit my internal knowledge of BTRFS is fleeting, though
>> I'm trying to work on that daily at this time, so forgive me if this is
>> unapproachable for obvious architectural reasons.
>
> Considered: many times. It's an obvious improvement, and one that shouldn't
> even be that hard to implement. What remains, it's SMoC then SMoR (Simple
> Matter of Coding then Simple Matter of Review), but both of those are in
> short supply.
Glad to hear it's been discussed, and I understand the issue of
resources all too well with the project I'm working on. Maybe if my
nights and weekends open up...
> After the maximum size of inline extents has been lowered, there's no real
> point in putting different types of metadata or not-really-metadata on
> different media: thus, existing split of data -vs- metadata block groups is
> fine.
That was my thought. Regarding inlined data, I'm actually quite ok with
that being on SSD, as that would deliver fast access to tiny objects,
whereas on HDD you'd spend the great majority of your time seeking to the
data in question rather than transferring it.
Our existing COW filesystem, which this is replacing, actually did exactly
this, except it would store the first N KB of each and every object on SSD,
so even for large files you could get the headers out quickly (as many
indexing apps want to do).
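As a rough illustration of that split (not the actual code of the old
filesystem), assuming a fixed head size in bytes, with split_object and
head_bytes as made-up names:

```python
def split_object(size, head_bytes):
    """Return ((ssd_offset, ssd_len), (hdd_offset, hdd_len)) for one object.

    The first `head_bytes` of the object go to SSD, the remainder to HDD.
    Objects smaller than `head_bytes` live entirely on SSD.
    """
    head = min(size, head_bytes)
    return (0, head), (head, size - head)


# A 1000-byte object fits entirely in a 4 KiB head.
print(split_object(1000, 4096))

# A 10000-byte object keeps its first 4096 bytes on SSD.
print(split_object(10000, 4096))
```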
Best,
ellis
* Re: Metadata / Data on Heterogeneous Media
2018-02-15 17:15 Metadata / Data on Heterogeneous Media Ellis H. Wilson III
2018-02-15 19:06 ` Adam Borowski
@ 2018-02-15 19:11 ` Hugo Mills
2018-02-15 20:31 ` Ellis H. Wilson III
2018-02-16 3:57 ` Qu Wenruo
2 siblings, 1 reply; 6+ messages in thread
From: Hugo Mills @ 2018-02-15 19:11 UTC
To: Ellis H. Wilson III; +Cc: Btrfs BTRFS
On Thu, Feb 15, 2018 at 12:15:49PM -0500, Ellis H. Wilson III wrote:
> In discussing the performance of various metadata operations over
> the past few days I've had this idea in the back of my head, and
> wanted to see if anybody had already thought about it before
> (likely, I would guess).
>
> It appears based on this page:
> https://btrfs.wiki.kernel.org/index.php/Btrfs_design
> that data and metadata in BTRFS are fairly well isolated from one
> another, particularly in the case of large files. This appears
> reinforced by a recent comment from Qu ("...btrfs strictly
> split metadata and data usage...").
>
> Yet, while there are plenty of options to RAID0/1/10/etc across
> generally homogeneous media types, there doesn't appear to be any
> functionality (at least that I can find) to segment different BTRFS
> internals to different types of devices. E.G., place metadata trees
> and extent block groups on SSD, and data trees and extent block
> groups on HDD(s).
>
> Is this something that has already been considered (and if so,
> implemented, which would make me extremely happy)? Is it feasible
> it is hasn't been approached yet? I admit my internal knowledge of
> BTRFS is fleeting, though I'm trying to work on that daily at this
> time, so forgive me if this is unapproachable for obvious
> architectural reasons.
Well, it's been discussed, and I wrote up a theoretical framework
which should cover a wide range of use-cases:
https://www.spinics.net/lists/linux-btrfs/msg33916.html
I never got round to implementing it, though -- I ran into issues
over storing the properties/metadata needed to configure it.
Hugo.
--
Hugo Mills | Dullest spy film ever: The Eastbourne Ultimatum
hugo@... carfax.org.uk |
http://carfax.org.uk/ |
PGP: E2AB1DE4 | The Thick of It
* Re: Metadata / Data on Heterogeneous Media
2018-02-15 19:11 ` Hugo Mills
@ 2018-02-15 20:31 ` Ellis H. Wilson III
0 siblings, 0 replies; 6+ messages in thread
From: Ellis H. Wilson III @ 2018-02-15 20:31 UTC
To: Hugo Mills, Btrfs BTRFS
On 02/15/2018 02:11 PM, Hugo Mills wrote:
> On Thu, Feb 15, 2018 at 12:15:49PM -0500, Ellis H. Wilson III wrote:
>> In discussing the performance of various metadata operations over
>> the past few days I've had this idea in the back of my head, and
>> wanted to see if anybody had already thought about it before
>> (likely, I would guess).
>>
>> It appears based on this page:
>> https://btrfs.wiki.kernel.org/index.php/Btrfs_design
>> that data and metadata in BTRFS are fairly well isolated from one
>> another, particularly in the case of large files. This appears
>> reinforced by a recent comment from Qu ("...btrfs strictly
>> split metadata and data usage...").
>>
>> Yet, while there are plenty of options to RAID0/1/10/etc across
>> generally homogeneous media types, there doesn't appear to be any
>> functionality (at least that I can find) to segment different BTRFS
>> internals to different types of devices. E.G., place metadata trees
>> and extent block groups on SSD, and data trees and extent block
>> groups on HDD(s).
>>
>> Is this something that has already been considered (and if so,
>> implemented, which would make me extremely happy)? Is it feasible
>> it is hasn't been approached yet? I admit my internal knowledge of
>> BTRFS is fleeting, though I'm trying to work on that daily at this
>> time, so forgive me if this is unapproachable for obvious
>> architectural reasons.
>
> Well, it's been discussed, and I wrote up a theoretical framework
> which should cover a wide range of use-cases:
>
> https://www.spinics.net/lists/linux-btrfs/msg33916.html
>
> I never got round to implementing it, though -- I ran into issues
> over storing the properties/metadata needed to configure it.
Very interesting thread. Thank you for sharing, Hugo. That nomenclature
is rather expressive, and the design covers a much broader base than I
was imagining.
Best,
ellis
* Re: Metadata / Data on Heterogeneous Media
2018-02-15 17:15 Metadata / Data on Heterogeneous Media Ellis H. Wilson III
2018-02-15 19:06 ` Adam Borowski
2018-02-15 19:11 ` Hugo Mills
@ 2018-02-16 3:57 ` Qu Wenruo
2 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2018-02-16 3:57 UTC
To: Ellis H. Wilson III, Btrfs BTRFS
On 2018-02-16 01:15, Ellis H. Wilson III wrote:
> In discussing the performance of various metadata operations over the
> past few days I've had this idea in the back of my head, and wanted to
> see if anybody had already thought about it before (likely, I would guess).
>
> It appears based on this page:
> https://btrfs.wiki.kernel.org/index.php/Btrfs_design
> that data and metadata in BTRFS are fairly well isolated from one
> another, particularly in the case of large files. This appears
> reinforced by a recent comment from Qu ("...btrfs strictly
> split metadata and data usage...").
>
> Yet, while there are plenty of options to RAID0/1/10/etc across
> generally homogeneous media types, there doesn't appear to be any
> functionality (at least that I can find) to segment different BTRFS
> internals to different types of devices. E.G., place metadata trees and
> extent block groups on SSD, and data trees and extent block groups on
> HDD(s).
Just want to point out that metadata trees, block groups, *AND* the tree
blocks of file trees (data trees in your words) are all *METADATA*.
That is to say, btrfs can't isolate the tree blocks of file trees from those
of other trees. They are all tree blocks, and thus all metadata.
Real data is non-inlined, non-hole data. Tree blocks of file trees have
pointers to data, but the tree blocks of file trees are still metadata.
So even if one day btrfs supports allocating data/meta chunks using
different strategies, we are still a step away from your
different-chunk-for-different-tree idea.
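Put as a tiny sketch (the item names and the function are made up for
illustration, this is not how the kernel represents it): where something
lands depends only on what kind of item it is, never on which tree it
belongs to.

```python
def chunk_type(tree, kind):
    """Return the chunk type that holds an item, or None for a hole.

    tree: name of the owning tree, e.g. "fs", "extent", "chunk".
          It is deliberately ignored: the owning tree does not matter.
    kind: "tree_block", "inline_extent", "regular_extent" or "hole".
    """
    if kind == "regular_extent":
        return "data"       # real file data: non-inlined, non-hole
    if kind == "hole":
        return None         # a hole stores nothing in a data chunk
    if kind in ("tree_block", "inline_extent"):
        return "metadata"   # inline data sits inside a tree block
    raise ValueError(f"unknown kind: {kind}")


# Tree blocks are metadata no matter which tree they come from.
for tree in ("fs", "extent", "chunk"):
    print(tree, chunk_type(tree, "tree_block"))
```

So a data-vs-metadata allocation policy would move every tree block together;
separating the file trees from the other trees would need a further split
that this classification does not have.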
Thanks,
Qu
>
> Is this something that has already been considered (and if so,
> implemented, which would make me extremely happy)? Is it feasible it is
> hasn't been approached yet? I admit my internal knowledge of BTRFS is
> fleeting, though I'm trying to work on that daily at this time, so
> forgive me if this is unapproachable for obvious architectural reasons.
>
> Best,
>
> ellis