* general thoughts and questions + general and RAID5/6 stability?
@ 2014-08-31  4:02 Christoph Anton Mitterer
From: Christoph Anton Mitterer
To: linux-btrfs@vger.kernel.org

Hey.

For some time now I have been considering using btrfs at a larger scale, basically in two scenarios:

a) As the backend for data pools handled by dcache (dcache.org), where we run a Tier-2 in the higher PiB range for the LHC Computing Grid. For now that would be rather "boring" use of btrfs (i.e. not really using any of its advanced features), and RAID functionality would still be provided by hardware (at least with the current hardware generations we have in use).

b) Personally, for my NAS. Here the main goal is less performance but rather data safety (i.e. I want something like RAID6 or better), security (i.e. it will be on top of dm-crypt/LUKS) and integrity. Hardware-wise I'll use a UPS as well as enterprise SATA disks from different vendors, respectively different production lots. (Of course I'm aware that btrfs is experimental, and I would have regular backups.)

1) Now I've followed linux-btrfs for a while, and blogs like Marc's... and I still read about a lot of stability problems, some of which sound quite serious. Sure, we have a fsck now, but even in the wiki one can read statements like "the developers use it on their systems without major problems"... but also "if you do this, it could help you... or break even more".

I understand that there won't be a single point in time where Chris Mason says "now it's stable" and it would be rock solid from that point on... but especially since new features (e.g. things like subvolume quota groups, online/offline dedup, online/offline fsck) move (or will move) in with every new version, one has (as an end-user) basically no chance to determine what can be used safely and what tickles the devil.

So one issue I have is determining the general stability of the different parts.

2) Documentation status...
I feel that some general and extensive documentation is missing: one that basically handles (and teaches) all the things which are specific to modern (especially CoW) filesystems:
- General design, features and problems of CoW and btrfs
- Special situations that arise from CoW, e.g. that one may not be able to remove files once the fs is full, or that just reading files could make the used space grow (via the atime)
- General guidelines on when and how to use nodatacow, i.e. telling people for which kinds of files this SHOULD usually be done (VM images), what this means for those files (no checksumming), and what the drawbacks are if it's not used (e.g. if people insist on having the checksumming - what happens to the performance of VM images? what about the wear on SSDs?) (see the sketch at the end of this section)
- The implications of things like compression and hash algos: whether and when they have performance impacts (positive or negative) and when not
- The typical lifecycles and procedures when using stuff like multiple devices (how to replace a faulty disk), and important hints like "don't span a btrfs RAID over multiple partitions on the same disk"
- Especially with the different (mount) options, I mean things that change the way the fs works, like no-holes or mixed data/metadata block groups: people need some general information on when to choose which, and some real-world examples of advantages / disadvantages.
E.g. what are the disadvantages of having mixed data/metadata block groups? If there were only advantages, why wouldn't it be the default?

Parts of this are already scattered over LWN articles, the wiki (where the quality greatly "varies"), blog posts or mailing list posts... much of the information there is however outdated, and suggested procedures (e.g. how to replace a faulty disk) differ from example to example. An admin who wants to use btrfs shouldn't be required to piece all this together (which is basically impossible); there should be a manpage (which is kept up to date!) that describes all this.

Other important things to document (which I couldn't find so far in most cases): what is actually guaranteed by btrfs, respectively its design? For example:
- If there were no bugs in the code, would the fs be guaranteed to always be consistent by its CoW design? Or are there circumstances where it can still become inconsistent?
- Does this basically mean that, even without an fs journal, my database is always consistent even if I have a power cut or system crash?
- At which places does checksumming take place? Just data or also metadata? And is the checksumming chained as with ZFS, so that every change in blocks triggers changes in the "upper" metadata blocks up to the superblock(s)?
- When are these checksums verified? Only on fsck/scrub? Or really on every read? All this is information an admin needs in order to determine what the system actually guarantees and how it behaves.
- How much data/metadata (in terms of bytes) is covered by one checksum value? And if that varies, what's the maximum size? I mean, if there were one CRC32 per file (which can be GiB large), which would be read every time a single byte of that file is read, this would probably be bad ;) ... so we should tell the user "no, we do this block- or extent-wise". And since e.g. CRC32 is maybe not well suited for very big chunks of data, the user may want to know how much data is "protected" by one hash value, so that he can decide whether to switch to another algorithm (if one should become available).
- Does stacking with block layers work in all cases (and in which does it not)? E.g. btrfs on top of loopback devices, dm-crypt, MD, lvm2? And also the other way round: which of these can be put on top of btrfs? There's the prominent case that swap files don't work on btrfs. Documentation in that area should also contain performance guidance, i.e. that while it's possible to have swap on top of btrfs via loopback, it's perhaps stupid with CoW... or e.g. with dm-crypt+MD there were quite heavy performance impacts depending on whether dm-crypt was below or above MD. Now of course normally dm-crypt will be below btrfs, but there are still performance questions, e.g. how does this work with multiple devices? Is there one IO thread per device or one for all?
  Or questions like: are there any stability issues when btrfs is stacked below/above other block layers, e.g. in case of power losses, especially since btrfs relies so heavily on barriers?
  Or questions like: is btrfs stable if lower block layers modify data, e.g. if dm-crypt should ever support online re-encryption?
- Many things about RAID (but more on that later).
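To make the nodatacow guideline above concrete, here is a minimal sketch of how this is usually handled today, assuming a dedicated directory for VM images (the path is a placeholder, and behaviour can differ between kernel/progs versions, so treat it as illustration rather than a recipe):

  # mark the directory so that files created in it afterwards are nodatacow;
  # the attribute only takes full effect on new (empty) files, not on data
  # already written
  chattr +C /srv/vm-images
  lsattr -d /srv/vm-images      # should show the 'C' flag

  # trade-off to document: nodatacow files lose data checksumming (and
  # usually compression), which is exactly what the guidelines should spell out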
3) What about some nice features which many people probably want to see... especially other compression algos (xz/lzma or lz4[hc]) and hash algos (xxHash... some people may even be interested in things like SHA2 or Keccak). I know some of them are planned... but is there any real estimate of when they will come?

4) Are (and how are) existing btrfs filesystems kept up to date as btrfs evolves over time?
What I mean here is: over time, more and more features are added to btrfs. This is of course not always a change in the on-disk format, but I always wonder a bit: if I wrote the same data of my existing fs into a freshly created one (with the same settings), would it basically look the same (of course not exactly)? In many of the mails here on the list, respectively commit logs, one can read things which sound as if this happens quite often: that things (which affect how data is written on the disk) are now handled better. Or what if defaults change, e.g. if something new like no-holes were to become the default for new filesystems? An admin cannot track all these things and understand which of them actually mean that he should recreate the filesystem.

Of course there's the balance operation... but does this really affect everything?

So the question is basically: as btrfs evolves, how do I keep my existing filesystems up to date so that they are as if they had been created anew?

5) btrfs management [G]UIs are needed
Not sure whether this should go into existing file managers (like nemo or konqueror) or something separate... but I definitely think that the btrfs community will need to provide some kind of powerful management [G]UI. Such a manager is IMHO crucial for anything that behaves like a storage management system. What should it be able to do?

a) Searching for btrfs-specific properties, e.g.:
- files compressed with a given algo
- files for which the compression ratio is <, >, = n%
- files which are nodatacow
- files for which integrity data is stored with a given hash algo
- files with a given redundancy level (e.g. DUP or RAID1 or RAID6 or DUPn if that should ever come)
- files which should have a given redundancy level, but whose actual level is different (e.g. due to a degraded state, or for which more block copies than desired are still available)
- files which are fragmented to n%
Of course all these conditions should be combinable, and one should have further conditions like m/c/a-times or the subvolumes/snapshots that should be searched.

b) File lists in such a manager should display many details like compression ratio, algos (compression, hash), number of fragments, whether blocks of that file are referenced by other files, etc.

c) Of course it should be easy to change all the properties from above for files (well, at least if that's possible in btrfs) - like when I want to have some files, or dirs/subdirs, recompressed with another algo, or uncompressed; or triggering online defragmentation for all files of a given fragmentation level; or maybe I want to set a higher redundancy level for files which I consider extremely precious (not sure if it's planned to have different redundancy levels per file).

d) Such a manager should perhaps also go through the logs and tell things like:
- when was the last complete balance
- when was the last complete scrub
- for which files did integrity check problems happen during read/scrub, and how many of these could be corrected via other block copies?

e) Maybe it could give even more low-level information, like showing how a file is distributed over the devices, e.g. how the blocks are located, or showing the location of block copies or the involved block devices for the redundancy levels.
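Some of the per-file information wished for above can already be queried from the command line, at least in rudimentary form. A rough sketch, assuming reasonably current btrfs-progs (the property interface is fairly new) and with paths as placeholders:

  # per-file compression property (empty output means "default / inherited")
  btrfs property get /srv/data/bigfile compression

  # extent count as a crude fragmentation indicator (from e2fsprogs)
  filefrag -v /srv/data/bigfile

  # recompress a subtree with a given algorithm while defragmenting it
  btrfs filesystem defragment -r -czlib /srv/data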
6) RAID / Redundancy Levels

a) Just a remark: I think it's a bad idea to call these RAID in the btrfs terminology, since what we do is not necessarily exactly the same as classic RAID. This becomes most obvious with RAID1, which does not behave as RAID1 should (i.e. one copy per disk); at the very least the names used should comply with MD.

b) In other words... I think there should be RAID1, which equals one copy per underlying device. And it would be great to have a redundancy level DUPx, which is x copies of each block spread over the underlying devices. So if x is 6 and one has 3 underlying devices, each of them should have 2 copies of each block. I think the DUPx level is quite interesting to protect against single block failures, especially on computers which usually simply don't have more than one disk drive (e.g. notebooks).

c) As I've noted before, I think it would be quite nice if it were supported to have different redundancy levels for different files... e.g. less precious stuff like OS data could have DUP, more valuable data could have RAID6, and my most precious data could have DUP5 (i.e. 5 copies of each block). If that should ever come, one would probably need to make that property inheritable by directories to be really useful.

d) What's the status of multi-parity RAID (i.e. more than two parity blocks)? Weren't some patches for that posted a while ago?

e) Most important: what's the status of RAID5/6? Is it still completely experimental or already well tested? Does rebuilding work? Does scrubbing work? As far as I know, there are still important parts missing for it to work at all, right? When can one expect work on that to be completed?

f) Again, detailed documentation should be added on how the different redundancy levels actually work, e.g.:
- Is there a chunk size, can it be configured, and how does it affect reads/writes (as with MD)?
- How do parallel reads happen if multiple block copies are available? What e.g. if there are multiple block copies per device? Is the first one simply always read? Or the one with the best seek times? Or is this optimised together with other reads?

g) When a block is read (and the checksum is always verified), does it already work that, if verification fails, the other copies are tried, respectively the block is recalculated using the parity? And if all that fails, will it give a read error, or will it simply deliver a corrupted block, as with traditional RAID?

h) We also need some RAID and integrity monitoring tool. It doesn't matter whether this is a completely new tool or whether it can be integrated into something existing. But we need tools which inform the admin, via different channels, when a disk has failed and a rebuild is necessary. And the same should happen when checksum verification errors occur that could be corrected (perhaps with a configurable threshold), so that admins have the chance to notice signs of a disk that is about to fail. Of course such information is already printed to the kernel logs (well, I guess so), but I don't think it's enough to let 3rd parties and admins write scripts/daemons which do these checks and alerting; there should be something which is "official" and guaranteed to catch all cases and simply works(TM).

Cheers,
Chris.
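Until such an "official" tool exists, the usual stop-gap is a small cron job around the existing counters. A minimal sketch (mountpoint and mail address are placeholders, output formats vary between btrfs-progs versions, so take it as illustration only):

  #!/bin/sh
  # report btrfs per-device error counters and run a periodic scrub
  MNT=/srv/nas

  # non-zero read/write/corruption/generation counters hint at a failing device
  btrfs device stats "$MNT" | grep -v ' 0$' \
      && echo "btrfs errors on $MNT" | mail -s "btrfs alert" root

  # scrub verifies all checksums and repairs from redundant copies where possible
  btrfs scrub start -B "$MNT"
  btrfs scrub status "$MNT"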
* Re: general thoughts and questions + general and RAID5/6 stability?
@ 2014-09-19 20:50 William Hanson
From: William Hanson
To: linux-btrfs; +Cc: calestyo

Hey guys...

I was just crawling through the wiki and this list's archive to find answers to some questions - actually, many of them match those which Christoph asked here some time ago, though it seems no answers came up at all.

Isn't it possible to answer them, at least one by one? I'd believe that most of these questions and their answers would be of common interest, and having them properly answered should be a benefit for all prospective btrfs users.

Regards,
William.

On Sun, 2014-08-31 at 06:02 +0200, Christoph Anton Mitterer wrote:
> [Christoph's original mail quoted in full - snipped]
* Re: general thoughts and questions + general and RAID5/6 stability?
@ 2014-09-20  9:32 Duncan
From: Duncan
To: linux-btrfs

William Hanson posted on Fri, 19 Sep 2014 16:50:05 -0400 as excerpted:

> I was just crawling through the wiki and this list's archive to find answers to some questions - actually, many of them match those which Christoph asked here some time ago, though it seems no answers came up at all.

Seems his post slipped thru the cracks, perhaps because it was too much at once for people to try to chew on. Let's see if the second time around works better...

> On Sun, 2014-08-31 at 06:02 +0200, Christoph Anton Mitterer wrote:
>> For some time now I have been considering using btrfs at a larger scale, basically in two scenarios:
>>
>> a) As the backend for data pools handled by dcache (dcache.org), where we run a Tier-2 in the higher PiB range for the LHC Computing Grid. For now that would be rather "boring" use of btrfs (i.e. not really using any of its advanced features), and RAID functionality would still be provided by hardware (at least with the current hardware generations we have in use).

While that scale is simply out of my league, here's what I'd say if I were asked my own opinion. I'd say btrfs isn't ready for that, basically for one reason. Btrfs has stabilized quite a bit in the last year, and the scary warnings have now come off, but it's still not fully stable, and keeping backups of any data you value is still very strongly recommended.

The scenario above is talking high-PiB scale. Simply put, that's a **LOT** of data to keep backups of, or to lose all at once if you don't and something happens! At that scale I'd look at something more mature, with a reputation for working well at that scale. Xfs is what I'd be looking at. That or possibly zfs. People who value their data highly tend, for good reason, to be rather conservative when it comes to filesystems. At that level, and at the conservatism I'd guess it calls for, I'd say another two years, perhaps longer, given btrfs history and how much longer than expected every step has seemed to take.

>> b) Personally, for my NAS. Here the main goal is less performance but rather data safety (i.e. I want something like RAID6 or better), security (i.e. it will be on top of dm-crypt/LUKS) and integrity. Hardware-wise I'll use a UPS as well as enterprise SATA disks from different vendors, respectively different production lots. (Of course I'm aware that btrfs is experimental, and I would have regular backups.)
[...]
>> So one issue I have is determining the general stability of the different parts.

Raid5/6 are still out of the question at this point. The operating code is there, but the recovery code is incomplete. In effect, btrfs raid5/6 must be treated as if it's slow raid0 in terms of dependability, but with a "free" upgrade to raid5/6 when the code is complete (assuming the array survives that long in its raid0 stage), as the operational code has been there all along and has been creating and writing the parity - it just can't yet reliably restore from it if called to do so.

So if you wouldn't be comfortable with the data on raid0, that is, with the idea of losing it all if you lose any of it, don't put it on btrfs raid5/6 at this point. The situation is actually /somewhat/ better than that, but that's the reliability bottom line you should be planning for, and if raid0 reliability isn't appropriate for your data, neither is btrfs raid5/6 at this point.

Btrfs raid1 and raid10 modes, OTOH, are reasonably mature and ready for use, basically at the same level as single-device btrfs. Which is to say there's still active development, and keep your backups ready as it's not /entirely/ stable yet, but a lot of people are using it without undue issues -- just keep those backups current and tested, and be prepared to use them if you need to.
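For reference, roughly what setting up or converting to btrfs raid1 looks like - device names and the mountpoint are placeholders, and the convert balance can take a long time on a full filesystem; a sketch, not a recipe:

  # create a new two-device filesystem with raid1 for both data and metadata
  mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc

  # or grow an existing single-device filesystem and convert it in place
  btrfs device add /dev/sdc /mnt
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

  # check which profiles are actually in use
  btrfs filesystem df /mnt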
For btrfs raid1 mode, it's worth pointing out that btrfs raid1 means two copies on different devices, no matter how many devices are in the array. It's always two copies; more devices simply add more total capacity. Similarly with btrfs raid10, the "1"/mirror side of that 10 is always paired. Stripes can be two or three or whatever width, but there are always only the two mirrors.

N-way-mirroring is on the roadmap, scheduled for introduction after raid5/6 is complete. So it's coming, but given the time it has taken for raid5/6 and the fact that it's still not complete, reasonably reliable n-way-mirroring could easily still be a year away or more.

Features: Most of the core btrfs features are reasonably stable but some don't work so well together; see my just-previous post on a different thread about nocow and snapshots, for instance. (Basically, setting nocow ends up being nearly useless in the face of frequent snapshots of an actively rewritten file.)

Qgroups/quotas are an exception. They've recently been rewritten, as the old approach simply wasn't working, and while the feature /should/ be more stable now, it's still very new (like 3.17 new), and I'd give it at least two more kernel cycles before I'd consider it usable... if no further major problems show up during that time.

And snapshot-aware defrag has been disabled for now due to scalability issues, so defrag only considers the current snapshot it's actually pointed into, triggering data duplication and using up space faster than would otherwise be expected.

You'd need to check on the status of non-core btrfs features like the various dedup applications, snapper-style scheduled snapshotting, etc., individually, as they're developed separately and more or less independently.

>> 2) Documentation status...
>> I feel that some general and extensive documentation is missing.

This is gradually getting better. The manpages are generally kept current, and their practical usability without reference to other sources such as the wiki has improved DRAMATICALLY in the last six months or so.

It still helps to have some good background in general principles such as COW, as they're not always explained, either on the wiki or in the manpages, but it's coming. Really, if there's one area I'd point out as having made MARKED strides toward a stable btrfs over the last six months, it WOULD be the documentation, as six months ago it simply wasn't stable-ready, full stop, but now I'd characterize much of the documentation as reasonably close to stable-ready, altho there are still some holes.

IOW, while before, documentation had fallen behind the progress of the rest of btrfs toward stable, in the last several months it has caught up and in general can be characterized as at about the same stability/maturity status as btrfs itself - that is, not yet fully stable, but getting to where that goal is at least visible, now.
But there's still no replacement for some good time investment in actually reading a few weeks of the list and most of the user pages in the wiki before you actually dive into btrfs on your own systems. Your choices and usage of btrfs will be the better for it, and it could well save you needless data loss, or at least needless grief and stress. But of course that's the way it is with most reasonably advanced systems.

>> Other important things to document (which I couldn't find so far in most cases): what is actually guaranteed by btrfs, respectively its design? For example:
>> - If there were no bugs in the code, would the fs be guaranteed to always be consistent by its CoW design? Or are there circumstances where it can still become inconsistent?

In theory, yes, absent (software) bugs, btrfs would always be consistent. In reality, hardware has bugs too, and then there's simply cheap hardware that even absent bugs doesn't make the guarantees of more expensive hardware. Consumer-level storage hardware doesn't tend to have battery-backed write-caches, for instance, and some of it is known to lie and say the write-cache has been flushed to permanent storage when it hasn't been. But absent (both hardware and software) bugs, in theory...

>> - Does this basically mean that, even without an fs journal, my database is always consistent even if I have a power cut or system crash?

That's the idea of tree-based copy-on-write, yes.

>> - At which places does checksumming take place? Just data or also metadata? And is the checksumming chained as with ZFS, so that every change in blocks triggers changes in the "upper" metadata blocks up to the superblock(s)?

FWIW, at this level of question, people should really be reading the various whitepapers and articles discussing and explaining the technology, as linked on the wiki. But both data and metadata are checksummed, and yes, it's chained, all the way up the tree.

>> - When are these checksums verified? Only on fsck/scrub? Or really on every read? All this is information an admin needs in order to determine what the system actually guarantees and how it behaves.

Checksums are verified per-read. If verification fails and there's a second copy available (btrfs multi-device raid1 or raid10 modes, and dup-mode metadata or mixed-bg on single-device), it is verified and substituted (both in RAM and rewritten in place of the bad copy) if it checks out. If no valid copy is available, IO error. Scrub is simply the method used to do this systematically across the entire filesystem, instead of waiting until a particular block is read and its checksum verified.

>> - How much data/metadata (in terms of bytes) is covered by one checksum value? And if that varies, what's the maximum size?

Checksums are normally per block or node. For data, that's a standard page-size block (4 KiB on x86 and amd64, and also on arm, I believe; but it's 64 KiB on sparc, for example, I believe). Metadata node/leaf sizes can be set at mkfs.btrfs time, but now default to 16 KiB, altho that too was 4 KiB in the past.

>> - Does stacking with block layers work in all cases (and in which does it not)? E.g. btrfs on top of loopback devices, dm-crypt, MD, lvm2?

Stacking btrfs on top of any block device variant should "just work", altho it should be noted that some of them might not pass flushes down and thus not be as resilient as others. And of course performance can be more or less affected as well.
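As a trivial stacking example, btrfs on a loop device - handy for experiments precisely because nothing real is at risk; file name, size and mountpoint are arbitrary placeholders:

  truncate -s 10G /var/tmp/btrfs-test.img
  LOOP=$(losetup --find --show /var/tmp/btrfs-test.img)
  mkfs.btrfs "$LOOP"
  mount "$LOOP" /mnt/test
  # ... experiment ...
  umount /mnt/test && losetup -d "$LOOP"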
>> And also the other way round: which of these can be put on top of btrfs?

Btrfs is a filesystem, so it'll take files. Via a loopback-mounted file you can make it a block device, which will of course take filesystems or other block devices stacked on top. That's not saying performance will be good thru all those layers, and reliability can be affected too, but it's possible.

>> There's the prominent case that swap files don't work on btrfs. But documentation in that area should also contain performance guidance

Wait a minute. Where's my consulting fee? Come on, this is getting ridiculous. That's where individual case research and deployment testing come in.

>> Is there one IO thread per device or one for all?

It should be noted that btrfs has /not/ yet been optimized for parallelization. The code still generally serializes writing each copy of a raid1 pair, for instance, and raid1 reads are assigned using a fairly dumb but reasonable initial-implementation odd/even-PID-based round-robin. (So if your use case happens to involve a bunch of otherwise parallelized reads from all-even PIDs, for instance, they'll all hit the same copy of the raid1, leaving the other one idle...)

This stuff will eventually be optimized, but getting raid5/6 and N-way-mirroring done first, so they know the implementation they're optimizing for, makes sense.

>> 3) What about some nice features which many people probably want to see... especially other compression algos (xz/lzma or lz4[hc]) and hash algos (xxHash... some people may even be interested in things like SHA2 or Keccak). I know some of them are planned... but is there any real estimate of when they will come?

If there were estimates they'd be way off. The history of btrfs is that features repeatedly take far longer to implement than originally thought. What roadmap there is, is on the wiki.

We know that raid5/6 mode is still in current development and n-way-mirroring is scheduled after that. But raid5/6 has been a kernel cycle or two out for over a year now. Then when they got it in, it was only the operational stuff; the recovery stuff, scrub, etc. still isn't complete. And there's the quota rework that is just done or still ongoing (I'm not sure which, as I'm not particularly interested in that feature), and the snapshot-aware defrag that was introduced in 3.9 but didn't scale so was disabled again, which is still to be re-enabled after the quota rework and snapshot scaling stuff is done, and one dev has been putting a *LOT* of work into improving the manpages, and that intersects with the work on mount option consistency they're doing, and..., and...

Various devs are the leads on various features and so several are developing in parallel, but of course there's the bug hunting, and the review and testing of each other's work they do, and... so they're not able to simply work on their assigned feature.

>> 4) Are (and how are) existing btrfs filesystems kept up to date as btrfs evolves over time? What I mean here is: over time, more and more features are added to btrfs. This is of course not always a change in the on-disk format...

The disk format has been slowly changing, but keeping compatibility with the existing format and filesystems since, I believe, 2.6.32.

What I do as part of my regular backup regime is, every few kernel cycles, to wipe the (first-level) backup and do a fresh mkfs.btrfs, activating new optional features as I believe appropriate.
Then I boot to the new backup and run it a bit to test it, then wipe the normal working copy and do a fresh mkfs.btrfs on it, again with the new optional features enabled that I want.

All that keeping in mind that I have a second-level backup (and for some things a third level) that's on reiserfs (which I used before and which, since the switch to data=ordered by default, has been extremely dependable for me, even thru hardware issues like bad memory or a failing mobo that would reset the sata connection, etc.), not btrfs, in case there's a problem with btrfs that hits both the working copy and the primary backup.

New kernels can mount old filesystems without problems (barring the occasional bug, and it's treated as a bug and fixed), but it isn't always possible to mount new filesystems on older kernels. However, given the rate of change and the number of fixed bugs, the recommendation is to stay current with the kernel in any case. Recently there was a bug that affected 3.15 and 3.16 (fixed in 3.16.2 and in 3.17-rc2) that didn't affect the 3.14 series. During the trace and fix of that bug, the recommendation was to use 3.14, but nothing previous to that, as those had known bugs that were fixed later; and now that that known bug has been fixed, the recommendation is again the latest stable series, thus 3.16.x currently, if not the latest development series, 3.17-rcX currently, or even the btrfs integration branch, which currently holds the patches that will be submitted for 3.18. Given that, if you're using earlier kernels you're using known-buggy kernels anyway.

So keep current with the kernel (and to a lesser extent userspace; btrfs-progs 3.16 is current, and the previous 3.14.2 is acceptable, 3.12 if you /must/ drag your feet), and you won't have to worry about it.

Of course that's a mark of btrfs stability as well. The recommendation to keep current should relax as btrfs stabilizes. But 3.14 is a long-term-support stable kernel series, and the recommendation to be running at least that is a good one. Perhaps it'll remain the earliest recommended stable kernel series for some time, now that btrfs is stabilizing.

>> Of course there's the balance operation... but does this really affect everything?

Not everything. Some things are mkfs.btrfs-time only.

>> So the question is basically: as btrfs evolves, how do I keep my existing filesystems up to date so that they are as if they had been created anew?

Balance is reasonable on an existing filesystem. However, as I said, I myself do, and would also recommend, taking advantage of those backups you should be making and testing anyway, to boot from them and do a fresh mkfs on the working filesystem every few kernel cycles, to take advantage of the new features and keep everything working as well as possible - considering the filesystem is, after all, while no longer officially experimental, certainly not yet entirely stable either.

>> 5) btrfs management [G]UIs are needed

Separate project. It'll happen, as that's the way FLOSS works, but it's not a worry of the core btrfs project at this point. As such, I'm not going to worry about it either, which means I can delete a nice big chunk without replying to any of it further than I just have...

>> 6) RAID / Redundancy Levels
>> a) Just a remark: I think it's a bad idea to call these RAID in the btrfs terminology, since what we do is not necessarily exactly the same as classic RAID. This becomes most obvious with RAID1, which does not behave as RAID1 should (i.e. one copy per disk); at the very least the names used should comply with MD.
While I personally would have called it something else, say pair-mirroring, by the original raid definitions going back to the original paper outlining them back in the day (which someone posted a link to at one point and I actually read, at least that part), two-way-mirroring regardless of the number of devices actually DOES qualify as RAID-1. mdraid's implementation is different and does N-way-mirroring across all devices for RAID-1, but that's simply its implementation, not a requirement for RAID-1, either in the original paper or as generally accepted today.

That said, you will note that in btrfs the various levels are called raid0, raid1, raid10, raid56, in *non-caps*, as opposed to the traditional ALL-CAPS RAID-1 notation. One of the reasons given for that is that these btrfs raidN "modes" don't necessarily correspond exactly to the traditional RAID-N levels at the technical level, and the non-caps raidN notation was seen as an acceptable way of denoting "RAID-like" behavior that isn't technically precisely RAID.

N-way-mirroring is coming. It's just not implemented yet.

>> c) As I've noted before, I think it would be quite nice if it were supported to have different redundancy levels for different files...

That's actually on the roadmap too, tho rather farther down the line. The btrfs subvolume framework is already set up to allow per-subvolume raid levels, etc. at some point, altho it's not yet implemented, and there are already per-subvolume and per-file properties and extended attributes, including a per-file compression attribute. After they extend btrfs to handle per-subvolume redundancy levels, it should be a much smaller step to simply make that the default and have per-file properties/attributes available for it as well, just as the per-file compression attribute is already there. But I'd put this probably 3-5 years out... and given btrfs history, with implementations repeatedly taking longer than expected, it could easily be 5-10 years out...

>> d) What's the status of multi-parity RAID (i.e. more than two parity blocks)? Weren't some patches for that posted a while ago?

Some proof-of-concept patches were indeed posted. And it's on the roadmap, but again, 3-5 years out. Tho it's likely there will be a general kernel solution before then, usable by mdraid, btrfs, etc., and if/when that happens, it should make adapting it for btrfs much simpler. OTOH, that also means there will be much broader debate about getting a suitable general-purpose solution, but it also means not just btrfs folks will be involved. At this point, then, it's not a btrfs problem, but waiting on that general-purpose kernel solution, which btrfs can then adapt at its leisure.

>> e) Most important: what's the status of RAID5/6? Is it still completely experimental or already well tested?

Covered above. Consider it raid0 reliability at this point and you won't be caught out. Additionally, Marc MERLIN has put quite a bit of testing into it and has writeups on the wiki, linking to his blog. That's more detail than I have, for sure.

>> f) Again, detailed documentation should be added on how the different redundancy levels actually work, e.g.:
>> - Is there a chunk size, can it be configured

There's a semi-major rework potentially planned, to either coincide with the N-way-mirroring introduction or possibly come after it, but with the N-way-mirroring written with it in mind.
Existing raid0/1/10/5/6 would remain implemented as they are, possibly with a few more options, and likely with the existing names becoming aliases for new ones fitting the new naming framework. The new naming framework, meanwhile, would include redundancy/striping/parity/hot-spares (possibly) all in the same overall framework. Hugo Mills is the guy with the details on that, tho I think it's mentioned in the ideas section on the wiki as well.

With that in mind, too much documentation detail on the existing implementation would be premature, as much of it would need rewriting for the new framework. Nevertheless, there's reasonable detail out there if you look. The wiki covers more than I'll write here, for sure.

>> g) When a block is read (and the checksum is always verified), does it already work that, if verification fails, the other copies are tried, respectively the block is recalculated using the parity?

Other copies of the block (raid1, raid10, dup) are checked, as mentioned above. I'm not sure how raid56 handles it with parity, but since that code remains incomplete, it hasn't been a big factor. Presumably either Marc MERLIN or one of the devs will fill in the details once it's considered complete and usable.

>> And if all that fails, will it give a read error, or will it simply deliver a corrupted block, as with traditional RAID?

Read error, as mentioned above.

>> h) We also need some RAID and integrity monitoring tool.

"Patience, grasshopper." All in time... And that too could be a third-party tool, at least at first; altho while separate enough to be developed third-party, it's core enough that presumably one would eventually be selected and shipped as part of btrfs-progs.

I'd actually guess it /will/ be a third-party tool at first. That's pure userspace after all, with little needed beyond what's already available in the logs and in sysfs, and the core btrfs devs already have their hands full with other projects, so a third-party implementation will almost certainly appear before they get to it.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: general thoughts and questions + general and RAID5/6 stability?
@ 2014-09-22 20:51 Stefan G. Weichinger
From: Stefan G. Weichinger
To: linux-btrfs

On 20.09.2014 at 11:32, Duncan wrote:

> What I do as part of my regular backup regime is, every few kernel cycles, to wipe the (first-level) backup and do a fresh mkfs.btrfs, activating new optional features as I believe appropriate. Then I boot to the new backup and run it a bit to test it, then wipe the normal working copy and do a fresh mkfs.btrfs on it, again with the new optional features enabled that I want.

Is re-creating btrfs filesystems *recommended* in any way?

Does that actually make a difference in the fs structure?

So far I assumed it was enough to keep the kernel up to date, use current (stable) btrfs-progs and run a scrub every week or so (not to mention backups... if it ain't backed up, it was/isn't important).

Stefan
* Re: general thoughts and questions + general and RAID5/6 stability?
@ 2014-09-23 12:08 Austin S Hemmelgarn
From: Austin S Hemmelgarn
To: lists, linux-btrfs

On 2014-09-22 16:51, Stefan G. Weichinger wrote:
> On 20.09.2014 at 11:32, Duncan wrote:
>> What I do as part of my regular backup regime is, every few kernel cycles, to wipe the (first-level) backup and do a fresh mkfs.btrfs, activating new optional features as I believe appropriate. Then I boot to the new backup and run it a bit to test it, then wipe the normal working copy and do a fresh mkfs.btrfs on it, again with the new optional features enabled that I want.
>
> Is re-creating btrfs filesystems *recommended* in any way?
>
> Does that actually make a difference in the fs structure?

I would recommend it; there are some newer features that you can only set at mkfs time. Quite often, when a new feature is implemented, it is some time before things are such that it can be enabled online, and even then that doesn't convert anything until it is rewritten.

> So far I assumed it was enough to keep the kernel up to date, use current (stable) btrfs-progs and run a scrub every week or so (not to mention backups... if it ain't backed up, it was/isn't important).
>
> Stefan
* Re: general thoughts and questions + general and RAID5/6 stability?
@ 2014-09-23 13:06 Stefan G. Weichinger
From: Stefan G. Weichinger
To: linux-btrfs

On 23.09.2014 at 14:08, Austin S Hemmelgarn wrote:
> On 2014-09-22 16:51, Stefan G. Weichinger wrote:
>> Is re-creating btrfs filesystems *recommended* in any way?
>>
>> Does that actually make a difference in the fs structure?
>>
> I would recommend it; there are some newer features that you can only set at mkfs time. Quite often, when a new feature is implemented, it is some time before things are such that it can be enabled online, and even then that doesn't convert anything until it is rewritten.

What features, for example?

I created my main btrfs a few months ago and would like to avoid recreating it, as this would mean restoring the root fs on my main workstation. Although I would do it if it is "worth it" ;-)

I assume I could read some kind of version number out of the superblock or so? btrfs-show-super?

S
* Re: general thoughts and questions + general and RAID5/6 stability?
@ 2014-09-23 13:38 Austin S Hemmelgarn
From: Austin S Hemmelgarn
To: lists, linux-btrfs

On 2014-09-23 09:06, Stefan G. Weichinger wrote:
> On 23.09.2014 at 14:08, Austin S Hemmelgarn wrote:
>> On 2014-09-22 16:51, Stefan G. Weichinger wrote:
>>> Is re-creating btrfs filesystems *recommended* in any way?
>>>
>>> Does that actually make a difference in the fs structure?
>>>
>> I would recommend it; there are some newer features that you can only set at mkfs time. Quite often, when a new feature is implemented, it is some time before things are such that it can be enabled online, and even then that doesn't convert anything until it is rewritten.
>
> What features, for example?

Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features:

mixed-bg        - mixed data and metadata block groups
extref          - increased hard-link limit per file to 65536
raid56          - raid56 extended format
skinny-metadata - reduced size metadata extent refs
no-holes        - no explicit hole extents for files

mixed-bg is something that you generally wouldn't want to change after mkfs.

extref can be enabled online, and the filesystem metadata gets updated as needed; it doesn't provide any real performance improvement (but is needed for some mail servers that have HUGE mail queues).

I don't know anything about the raid56 option, but there isn't any way to change it after mkfs.

skinny-metadata can be changed online, and the format gets updated on rewrite of each metadata block. This one does provide a performance improvement (stat() in particular runs noticeably faster). You should probably enable this if it isn't already enabled, even if you don't recreate your filesystem.

no-holes cannot currently be changed online, and is a very recent addition (post-v3.14 btrfs-progs, I believe) that provides improved performance for sparse files (which is particularly useful if you are doing things with fixed-size virtual machine disk images). It's this last one that prompted me personally to recreate my filesystems most recently, as I use sparse files to save space as much as possible.

> I created my main btrfs a few months ago and would like to avoid recreating it, as this would mean restoring the root fs on my main workstation.
>
> Although I would do it if it is "worth it" ;-)
>
> I assume I could read some kind of version number out of the superblock or so?
>
> btrfs-show-super?

AFAIK there isn't really any 'version number' in the superblock that has any meaning (except for telling the kernel that it uses the stable disk layout); however, there are flag bits that you can look for (compat_flags, compat_ro_flags, and incompat_flags). I'm not 100% certain what each bit means, but on my system with an only one-month-old BTRFS filesystem, with extref, skinny-metadata, and no-holes turned on, I have compat_flags: 0x0, compat_ro_flags: 0x0, and incompat_flags: 0x16b.

The other potentially significant thing is that the default nodesize/leafsize has changed recently from 4096 to 16384, as that gives somewhat better performance for most use cases.
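For instance, if one were recreating a filesystem today to pick up those mkfs-time-only features, the invocation would look roughly like this (device name and label are placeholders; which features are safe to enable depends on the oldest kernel that still has to mount the fs, so check before copying):

  # list the features this btrfs-progs build knows about
  mkfs.btrfs -O list-all

  # recreate with the newer optional features enabled explicitly
  mkfs.btrfs -O extref,skinny-metadata,no-holes -L data /dev/sdb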
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
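A rough sketch of how an incompat_flags value such as the 0x16b above could be
decoded by hand. The bit assignments below are assumed from a 3.16-era
fs/btrfs/ctree.h and should be verified against your own kernel tree; the
script is only an illustration, not a supported tool.

    #!/bin/sh
    # decode_incompat.sh -- print the feature names behind a btrfs
    # incompat_flags value.  Bit values assumed from ctree.h (~3.16 era).
    # Usage: sh decode_incompat.sh 0x16b
    flags=$(( $1 ))
    for entry in \
        "1 MIXED_BACKREF" \
        "2 DEFAULT_SUBVOL" \
        "4 MIXED_GROUPS (mixed-bg)" \
        "8 COMPRESS_LZO" \
        "16 COMPRESS_LZOv2" \
        "32 BIG_METADATA" \
        "64 EXTENDED_IREF (extref)" \
        "128 RAID56" \
        "256 SKINNY_METADATA" \
        "512 NO_HOLES"
    do
        set -- $entry
        bit=$1; shift
        [ $(( flags & bit )) -ne 0 ] && echo "$*"
    done

Fed the 0x16b value quoted above, it would report MIXED_BACKREF,
DEFAULT_SUBVOL, COMPRESS_LZO, BIG_METADATA, EXTENDED_IREF and
SKINNY_METADATA -- but only as accurately as the assumed bit positions.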
* Re: general thoughts and questions + general and RAID5/6 stability?
  2014-09-23 13:38               ` Austin S Hemmelgarn
@ 2014-09-23 13:51                 ` Stefan G. Weichinger
  2014-09-23 14:24                   ` Tobias Holst
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Stefan G. Weichinger @ 2014-09-23 13:51 UTC (permalink / raw)
To: linux-btrfs

On 2014-09-23 15:38, Austin S Hemmelgarn wrote:
> On 2014-09-23 09:06, Stefan G. Weichinger wrote:
>> What features for example?
> Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the
> following list of features:
> mixed-bg         - mixed data and metadata block groups
> extref           - increased hard-link limit per file to 65536
> raid56           - raid56 extended format
> skinny-metadata  - reduced size metadata extent refs
> no-holes         - no explicit hole extents for files
>
> mixed-bg is something that you generally wouldn't want to change after
> mkfs.
> extref can be enabled online, and the filesystem metadata gets updated
> as-needed, and doesn't provide any real performance improvement (but is
> needed for some mail servers that have HUGE mail-queues)

ok, not needed here

> I don't know anything about the raid56 option, but there isn't any way
> to change it after mkfs.

not needed in my systems.

> skinny-metadata can be changed online, and the format gets updated on
> rewrite of each metadata block. This one does provide a performance
> improvement (stat() in particular runs noticeably faster). You should
> probably enable this if it isn't already enabled, even if you don't
> recreate your filesystem.

So this is done via btrfstune, right?

I will give that a try; for my rootfs it doesn't allow me right now as
it is obviously mounted (live-cd, right?).

> no-holes cannot currently be changed online, and is a very recent
> addition (post v3.14 btrfs-progs I believe) that provides improved
> performance for sparse files (which is particularly useful if you are
> doing things with fixed size virtual machine disk images).

Yes, I have some of those!

> AFAIK there isn't really any 'version number' that has any meaning in
> the superblock (except for telling the kernel that it uses the stable
> disk layout), however, there are flag bits that you can look for
> (compat_flags, compat_ro_flags, and incompat_flags). I'm not 100%
> certain what each bit means, but on my system with an only 1 month old
> BTRFS filesystem, with extref, skinny-metadata, and no-holes turned on,
> I have compat_flags: 0x0, compat_ro_flags: 0x0, and incompat_flags: 0x16b.
>
> The other potentially significant thing is that the default
> nodesize/leafsize has changed recently from 4096 to 16384, as that gives
> somewhat better performance for most use cases.

I have the 16k for both already.

Thanks for your explanations, I will dig into it as soon as I find the
time. Seems I have to backup/restore quite some stuff ;-)

Stefan

^ permalink raw reply [flat|nested] 13+ messages in thread
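Whether an existing filesystem already uses the 16k nodesize can be read from
the superblock dump. A minimal check, with the device name as a placeholder,
might look like:

    # unmounted device (or from a live medium); /dev/sda2 is a placeholder
    btrfs-show-super /dev/sda2 | grep -E '^(sector|node|leaf)size'

This assumes btrfs-show-super from a 3.x btrfs-progs, which prints
sectorsize/nodesize/leafsize lines in its superblock output.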
* Re: general thoughts and questions + general and RAID5/6 stability?
  2014-09-23 13:38               ` Austin S Hemmelgarn
  2014-09-23 13:51                 ` Stefan G. Weichinger
@ 2014-09-23 14:24                   ` Tobias Holst
  2014-09-24  1:08                     ` Qu Wenruo
       [not found]                     ` <CAGwxe4i2gQXSPiBGXbUKWid3o1tmD_+YtbOj=GQ11vzGx8CuTw@mail.gmail.com>
  2014-09-25  7:15                     ` Stefan G. Weichinger
  3 siblings, 1 reply; 13+ messages in thread
From: Tobias Holst @ 2014-09-23 14:24 UTC (permalink / raw)
To: linux-btrfs

If it is unknown which of these options have been used at btrfs
creation time - is it possible to check the state of these options
afterwards on a mounted or unmounted filesystem?


2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn <ahferroin7@gmail.com>:
>
> Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features:
> mixed-bg         - mixed data and metadata block groups
> extref           - increased hard-link limit per file to 65536
> raid56           - raid56 extended format
> skinny-metadata  - reduced size metadata extent refs
> no-holes         - no explicit hole extents for files

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: general thoughts and questions + general and RAID5/6 stability?
  2014-09-23 14:24                   ` Tobias Holst
@ 2014-09-24  1:08                     ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2014-09-24 1:08 UTC (permalink / raw)
To: Tobias Holst, linux-btrfs

-------- Original Message --------
Subject: Re: general thoughts and questions + general and RAID5/6 stability?
From: Tobias Holst <tobby@tobby.eu>
To: <linux-btrfs@vger.kernel.org>
Date: 2014-09-23 22:24

> If it is unknown which of these options have been used at btrfs
> creation time - is it possible to check the state of these options
> afterwards on a mounted or unmounted filesystem?

For a mounted fs, sysfs can be used to see the features enabled:
/sys/fs/btrfs/<UUID>/features/

For an unmounted fs, maybe not the best way, but btrfs-show-super can
show the incompat_flags in hex, and we can check
<kernel tree>/fs/btrfs/ctree.h for the BTRFS_FEATURE_INCOMPAT_## bits
and calculate by hand...

(Would it be better to add human-readable output to btrfs-show-super?)

Thanks,
Qu

>
> 2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn <ahferroin7@gmail.com>:
>> Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features:
>> mixed-bg         - mixed data and metadata block groups
>> extref           - increased hard-link limit per file to 65536
>> raid56           - raid56 extended format
>> skinny-metadata  - reduced size metadata extent refs
>> no-holes         - no explicit hole extents for files
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 13+ messages in thread
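Building on that, a small sketch of both checks. The sysfs layout is the one
described above; the btrfs-show-super field name is assumed to be
incompat_flags as in the hex output mentioned, and /dev/sdX is a placeholder.

    #!/bin/sh
    # Mounted filesystems: list the feature files the kernel exposes.
    for dir in /sys/fs/btrfs/*/features; do
        [ -d "$dir" ] || continue        # skip if no btrfs is mounted
        uuid=${dir#/sys/fs/btrfs/}
        uuid=${uuid%/features}
        echo "== $uuid =="
        ls "$dir"
    done

    # Unmounted device: dump the raw flag fields from the superblock.
    # btrfs-show-super /dev/sdX | grep flags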
[parent not found: <CAGwxe4i2gQXSPiBGXbUKWid3o1tmD_+YtbOj=GQ11vzGx8CuTw@mail.gmail.com>]
* Re: general thoughts and questions + general and RAID5/6 stability?
       [not found]                     ` <CAGwxe4i2gQXSPiBGXbUKWid3o1tmD_+YtbOj=GQ11vzGx8CuTw@mail.gmail.com>
@ 2014-09-23 14:47                       ` Austin S Hemmelgarn
  2014-09-23 15:25                         ` Kyle Gates
  0 siblings, 1 reply; 13+ messages in thread
From: Austin S Hemmelgarn @ 2014-09-23 14:47 UTC (permalink / raw)
To: Tobias Holst; +Cc: lists, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 985 bytes --]

On 2014-09-23 10:23, Tobias Holst wrote:
> If it is unknown which of these options have been used at btrfs
> creation time - is it possible to check the state of these options
> afterwards on a mounted or unmounted filesystem?
>
>
> 2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn <ahferroin7@gmail.com>:
>
>     Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives
>     the following list of features:
>     mixed-bg         - mixed data and metadata block groups
>     extref           - increased hard-link limit per file to 65536
>     raid56           - raid56 extended format
>     skinny-metadata  - reduced size metadata extent refs
>     no-holes         - no explicit hole extents for files
>
I don't think there is a specific tool for doing this, but some of them
do show up in dmesg; for example, skinny-metadata shows up as a mention
of the FS having skinny extents.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: general thoughts and questions + general and RAID5/6 stability?
  2014-09-23 14:47                       ` Austin S Hemmelgarn
@ 2014-09-23 15:25                         ` Kyle Gates
  0 siblings, 0 replies; 13+ messages in thread
From: Kyle Gates @ 2014-09-23 15:25 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org

>> If it is unknown which of these options have been used at btrfs
>> creation time - is it possible to check the state of these options
>> afterwards on a mounted or unmounted filesystem?
>>
> I don't think there is a specific tool for doing this, but some of them
> do show up in dmesg; for example, skinny-metadata shows up as a mention
> of the FS having skinny extents.
>
Devs,
It may be helpful to include the device in the kernel log for skinny
extents. Currently it shows up like the following, which is a little
ambiguous:

[    6.050134] BTRFS info (device sde3): disk space caching is enabled
[    6.056606] BTRFS: has skinny extents
<snipped>
[    7.740986] BTRFS info (device sde3): enabling auto defrag
[    7.747151] BTRFS info (device sde3): disk space caching is enabled
<snipped>
[    7.908906] BTRFS info (device sde2): enabling auto defrag
[    7.915031] BTRFS info (device sde2): disk space caching is enabled
[    8.071033] BTRFS info (device sde4): enabling auto defrag
[    8.076715] BTRFS info (device sde4): disk space caching is enabled
[    8.082187] BTRFS: has skinny extents
[    8.513502] BTRFS info (device sde5): enabling auto defrag
[    8.518887] BTRFS info (device sde5): disk space caching is enabled
[    8.524064] BTRFS: has skinny extents
[    9.634285] BTRFS info (device sdd6): enabling auto defrag
[    9.639308] BTRFS info (device sdd6): disk space caching is enabled
[    9.644338] BTRFS: has skinny extents

Thanks,
Kyle

^ permalink raw reply [flat|nested] 13+ messages in thread
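Until the kernel message carries the device name, the mapping can only be
guessed from message order. A rough workaround, assuming the "has skinny
extents" line always follows a "BTRFS info (device ...)" line for the same
filesystem (which the log above suggests, but the kernel does not guarantee):

    #!/bin/sh
    # Heuristic: attribute each "has skinny extents" message to the last
    # device named in a preceding "BTRFS info (device ...)" line.
    dmesg | awk '
        /BTRFS info \(device / {
            match($0, /\(device [^)]*\)/)
            dev = substr($0, RSTART + 8, RLENGTH - 9)
        }
        /has skinny extents/ {
            d = (dev == "") ? "unknown" : dev
            print d ": has skinny extents"
        }
    '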
* Re: general thoughts and questions + general and RAID5/6 stability?
  2014-09-23 13:38               ` Austin S Hemmelgarn
                                   ` (2 preceding siblings ...)
       [not found]                 ` <CAGwxe4i2gQXSPiBGXbUKWid3o1tmD_+YtbOj=GQ11vzGx8CuTw@mail.gmail.com>
@ 2014-09-25  7:15                 ` Stefan G. Weichinger
  3 siblings, 0 replies; 13+ messages in thread
From: Stefan G. Weichinger @ 2014-09-25 7:15 UTC (permalink / raw)
To: linux-btrfs

On 2014-09-23 15:38, Austin S Hemmelgarn wrote:
>> What features for example?
> Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the
> following list of features:
> mixed-bg         - mixed data and metadata block groups
> extref           - increased hard-link limit per file to 65536
> raid56           - raid56 extended format
> skinny-metadata  - reduced size metadata extent refs
> no-holes         - no explicit hole extents for files
>
> mixed-bg is something that you generally wouldn't want to change after
> mkfs.
> extref can be enabled online, and the filesystem metadata gets updated
> as-needed, and doesn't provide any real performance improvement (but is
> needed for some mail servers that have HUGE mail-queues)
> I don't know anything about the raid56 option, but there isn't any way
> to change it after mkfs.
> skinny-metadata can be changed online, and the format gets updated on
> rewrite of each metadata block. This one does provide a performance
> improvement (stat() in particular runs noticeably faster). You should
> probably enable this if it isn't already enabled, even if you don't
> recreate your filesystem.
> no-holes cannot currently be changed online, and is a very recent
> addition (post v3.14 btrfs-progs I believe) that provides improved
> performance for sparse files (which is particularly useful if you are
> doing things with fixed size virtual machine disk images).

Recreating, or at least "btrfstune -rx", for my rootfs would mean that I
have to boot from a live medium bringing recent btrfs-progs, right?

sysresccd brings btrfs-progs-3.14.2 ... that should be enough, ok?

Aside from that, the rootfs on my thinkpad shows these features:

# ls /sys/fs/btrfs/bec7dff9-8749-4db4-9a1b-fa844cfcc36a/features/
big_metadata  compress_lzo  extended_iref  mixed_backref

So I only miss the skinny extents ... and "no-holes".

Stefan

^ permalink raw reply [flat|nested] 13+ messages in thread
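For reference, a sketch of what such a live-medium session could look like.
The device name is a placeholder, and whether the btrfs-progs on the medium
supports "btrfstune -x" should be checked first, so treat this as an outline
rather than a recipe:

    #!/bin/sh
    # Sketch: enable skinny metadata extent refs on an *unmounted* root fs
    # from a live environment.  /dev/sda2 is a placeholder.
    dev=/dev/sda2

    btrfs check "$dev"                  # sanity-check before changing flags
    btrfstune -x "$dev"                 # set the skinny-metadata incompat flag
    btrfs-show-super "$dev" | grep incompat_flags   # confirm the bit changed

Existing metadata would then be converted gradually as each metadata block is
rewritten, as described earlier in the thread.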
end of thread, other threads: [~2014-09-25 7:15 UTC | newest]

Thread overview: 13+ messages
-- links below jump to the message on this page --
2014-08-31 4:02 general thoughts and questions + general and RAID5/6 stability? Christoph Anton Mitterer
-- strict thread matches above, loose matches on Subject: below --
2014-09-19 20:50 William Hanson
2014-09-20 9:32 ` Duncan
2014-09-22 20:51 ` Stefan G. Weichinger
2014-09-23 12:08 ` Austin S Hemmelgarn
2014-09-23 13:06 ` Stefan G. Weichinger
2014-09-23 13:38 ` Austin S Hemmelgarn
2014-09-23 13:51 ` Stefan G. Weichinger
2014-09-23 14:24 ` Tobias Holst
2014-09-24 1:08 ` Qu Wenruo
[not found] ` <CAGwxe4i2gQXSPiBGXbUKWid3o1tmD_+YtbOj=GQ11vzGx8CuTw@mail.gmail.com>
2014-09-23 14:47 ` Austin S Hemmelgarn
2014-09-23 15:25 ` Kyle Gates
2014-09-25 7:15 ` Stefan G. Weichinger