* safe/necessary to balance system chunks?
@ 2014-04-25 14:57 Steve Leung
  2014-04-25 17:24 ` Chris Murphy
  0 siblings, 1 reply; 17+ messages in thread

From: Steve Leung @ 2014-04-25 14:57 UTC (permalink / raw)
  To: linux-btrfs

Hi list,

I've got a 3-device RAID1 btrfs filesystem that started out life as single-device.

btrfs fi df:

Data, RAID1: total=1.31TiB, used=1.07TiB
System, RAID1: total=32.00MiB, used=224.00KiB
System, DUP: total=32.00MiB, used=32.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=66.00GiB, used=2.97GiB

This still lists some system chunks as DUP, and not as RAID1. Does this mean that if one device were to fail, some system chunks would be unrecoverable? How bad would that be?

Assuming this is something that needs to be fixed, would I be able to fix this by balancing the system chunks? Since the "force" flag is required, does that mean that balancing system chunks is inherently risky or unpleasant?

Thanks,
Steve

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-25 14:57 safe/necessary to balance system chunks? Steve Leung @ 2014-04-25 17:24 ` Chris Murphy 2014-04-25 18:12 ` Austin S Hemmelgarn 2014-04-25 18:36 ` Steve Leung 0 siblings, 2 replies; 17+ messages in thread From: Chris Murphy @ 2014-04-25 17:24 UTC (permalink / raw) To: Steve Leung; +Cc: linux-btrfs On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: > > Hi list, > > I've got a 3-device RAID1 btrfs filesystem that started out life as single-device. > > btrfs fi df: > > Data, RAID1: total=1.31TiB, used=1.07TiB > System, RAID1: total=32.00MiB, used=224.00KiB > System, DUP: total=32.00MiB, used=32.00KiB > System, single: total=4.00MiB, used=0.00 > Metadata, RAID1: total=66.00GiB, used=2.97GiB > > This still lists some system chunks as DUP, and not as RAID1. Does this mean that if one device were to fail, some system chunks would be unrecoverable? How bad would that be? Since it's "system" type, it might mean the whole volume is toast if the drive containing those 32KB dies. I'm not sure what kind of information is in system chunk type, but I'd expect it's important enough that if unavailable that mounting the file system may be difficult or impossible. Perhaps btrfs restore would still work? Anyway, it's probably a high penalty for losing only 32KB of data. I think this could use some testing to try and reproduce conversions where some amount of "system" or "metadata" type chunks are stuck in DUP. This has come up before on the list but I'm not sure how it's happening, as I've never encountered it. > > Assuming this is something that needs to be fixed, would I be able to fix this by balancing the system chunks? Since the "force" flag is required, does that mean that balancing system chunks is inherently risky or unpleasant? I don't think force is needed. You'd use btrfs balance start -sconvert=raid1 <mountpoint>; or with -sconvert=raid1,soft although it's probably a minor distinction for such a small amount of data. The metadata looks like it could use a balance, 66GB of metadata chunks allocated but only 3GB used. So you could include something like -musage=50 at the same time and that will balance any chunks with 50% or less usage. Chris Murphy ^ permalink raw reply [flat|nested] 17+ messages in thread
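For reference, a minimal sketch of the conversion being discussed in this reply, assuming the filesystem is mounted at /mnt (the mount point is a placeholder, and whether -f is needed for the -s filter is debated just below in the thread; add it only if the kernel refuses the balance without it):

  btrfs balance start -sconvert=raid1 -f /mnt   # rewrite the leftover DUP/single system chunks as raid1
  btrfs balance start -musage=50 /mnt           # optional: compact metadata chunks that are <=50% used
  btrfs fi df /mnt                              # check that only RAID1 lines remain for System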
* Re: safe/necessary to balance system chunks? 2014-04-25 17:24 ` Chris Murphy @ 2014-04-25 18:12 ` Austin S Hemmelgarn 2014-04-25 18:43 ` Steve Leung ` (2 more replies) 2014-04-25 18:36 ` Steve Leung 1 sibling, 3 replies; 17+ messages in thread From: Austin S Hemmelgarn @ 2014-04-25 18:12 UTC (permalink / raw) To: Chris Murphy, Steve Leung; +Cc: linux-btrfs On 2014-04-25 13:24, Chris Murphy wrote: > > On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: > >> >> Hi list, >> >> I've got a 3-device RAID1 btrfs filesystem that started out life as single-device. >> >> btrfs fi df: >> >> Data, RAID1: total=1.31TiB, used=1.07TiB >> System, RAID1: total=32.00MiB, used=224.00KiB >> System, DUP: total=32.00MiB, used=32.00KiB >> System, single: total=4.00MiB, used=0.00 >> Metadata, RAID1: total=66.00GiB, used=2.97GiB >> >> This still lists some system chunks as DUP, and not as RAID1. Does this mean that if one device were to fail, some system chunks would be unrecoverable? How bad would that be? > > Since it's "system" type, it might mean the whole volume is toast if the drive containing those 32KB dies. I'm not sure what kind of information is in system chunk type, but I'd expect it's important enough that if unavailable that mounting the file system may be difficult or impossible. Perhaps btrfs restore would still work? > > Anyway, it's probably a high penalty for losing only 32KB of data. I think this could use some testing to try and reproduce conversions where some amount of "system" or "metadata" type chunks are stuck in DUP. This has come up before on the list but I'm not sure how it's happening, as I've never encountered it. > As far as I understand it, the system chunks are THE root chunk tree for the entire system, that is to say, it's the tree of tree roots that is pointed to by the superblock. (I would love to know if this understanding is wrong). Thus losing that data almost always means losing the whole filesystem. >> >> Assuming this is something that needs to be fixed, would I be able to fix this by balancing the system chunks? Since the "force" flag is required, does that mean that balancing system chunks is inherently risky or unpleasant? > > I don't think force is needed. You'd use btrfs balance start -sconvert=raid1 <mountpoint>; or with -sconvert=raid1,soft although it's probably a minor distinction for such a small amount of data. The kernel won't allow a balance involving system chunks unless you specify force, as it considers any kind of balance using them to be dangerous. Given your circumstances, I'd personally say that the safety provided by RAID1 outweighs the risk of making the FS un-mountable. > > The metadata looks like it could use a balance, 66GB of metadata chunks allocated but only 3GB used. So you could include something like -musage=50 at the same time and that will balance any chunks with 50% or less usage. > > > Chris Murphy > Personally, I would recommend making a full backup of all the data (tar works wonderfully for this), and recreate the entire filesystem from scratch, but passing all three devices to mkfs.btrfs. This should result in all the chunks being RAID1, and will also allow you to benefit from newer features. ^ permalink raw reply [flat|nested] 17+ messages in thread
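A rough sketch of the backup-and-recreate route suggested here, with /dev/sda, /dev/sdb, /dev/sdc and the /mnt and /backup paths all being placeholders for your actual devices and mount points:

  tar -cpf /backup/btrfs-full.tar -C /mnt .                # full backup, preserving ownership and permissions
  umount /mnt
  mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb /dev/sdc  # recreate with all three devices, so every chunk starts out RAID1
  mount /dev/sda /mnt
  tar -xpf /backup/btrfs-full.tar -C /mnt                  # restore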
* Re: safe/necessary to balance system chunks? 2014-04-25 18:12 ` Austin S Hemmelgarn @ 2014-04-25 18:43 ` Steve Leung 2014-04-25 19:07 ` Austin S Hemmelgarn ` (2 more replies) 2014-04-25 19:14 ` Hugo Mills 2014-04-25 23:03 ` Duncan 2 siblings, 3 replies; 17+ messages in thread From: Steve Leung @ 2014-04-25 18:43 UTC (permalink / raw) To: Austin S Hemmelgarn, Chris Murphy; +Cc: linux-btrfs On 04/25/2014 12:12 PM, Austin S Hemmelgarn wrote: > On 2014-04-25 13:24, Chris Murphy wrote: >> >> On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: >> >>> I've got a 3-device RAID1 btrfs filesystem that started out life as single-device. >>> >>> btrfs fi df: >>> >>> Data, RAID1: total=1.31TiB, used=1.07TiB >>> System, RAID1: total=32.00MiB, used=224.00KiB >>> System, DUP: total=32.00MiB, used=32.00KiB >>> System, single: total=4.00MiB, used=0.00 >>> Metadata, RAID1: total=66.00GiB, used=2.97GiB >>> >>> This still lists some system chunks as DUP, and not as RAID1. Does this mean that if one device were to fail, some system chunks would be unrecoverable? How bad would that be? >>> >>> Assuming this is something that needs to be fixed, would I be able to fix this by balancing the system chunks? Since the "force" flag is required, does that mean that balancing system chunks is inherently risky or unpleasant? >> >> I don't think force is needed. You'd use btrfs balance start -sconvert=raid1 <mountpoint>; or with -sconvert=raid1,soft although it's probably a minor distinction for such a small amount of data. > The kernel won't allow a balance involving system chunks unless you > specify force, as it considers any kind of balance using them to be > dangerous. Given your circumstances, I'd personally say that the safety > provided by RAID1 outweighs the risk of making the FS un-mountable. Agreed, I'll attempt the system balance shortly. > Personally, I would recommend making a full backup of all the data (tar > works wonderfully for this), and recreate the entire filesystem from > scratch, but passing all three devices to mkfs.btrfs. This should > result in all the chunks being RAID1, and will also allow you to benefit > from newer features. I do have backups of the really important stuff from this filesystem, but they're offsite. As this is just for a home system, I don't have enough temporary space for a full backup handy (which is related to how I ended up in this situation in the first place). Once everything gets rebalanced though, I don't think I'd be missing out on any features, would I? Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-25 18:43 ` Steve Leung @ 2014-04-25 19:07 ` Austin S Hemmelgarn 2014-04-26 4:01 ` Duncan 2014-04-26 1:11 ` Duncan 2014-04-26 1:24 ` Chris Murphy 2 siblings, 1 reply; 17+ messages in thread From: Austin S Hemmelgarn @ 2014-04-25 19:07 UTC (permalink / raw) To: Steve Leung, Chris Murphy; +Cc: linux-btrfs On 2014-04-25 14:43, Steve Leung wrote: > On 04/25/2014 12:12 PM, Austin S Hemmelgarn wrote: >> On 2014-04-25 13:24, Chris Murphy wrote: >>> >>> On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: >>> >>>> I've got a 3-device RAID1 btrfs filesystem that started out life as >>>> single-device. >>>> >>>> btrfs fi df: >>>> >>>> Data, RAID1: total=1.31TiB, used=1.07TiB >>>> System, RAID1: total=32.00MiB, used=224.00KiB >>>> System, DUP: total=32.00MiB, used=32.00KiB >>>> System, single: total=4.00MiB, used=0.00 >>>> Metadata, RAID1: total=66.00GiB, used=2.97GiB >>>> >>>> This still lists some system chunks as DUP, and not as RAID1. Does >>>> this mean that if one device were to fail, some system chunks would >>>> be unrecoverable? How bad would that be? >>>> >>>> Assuming this is something that needs to be fixed, would I be able >>>> to fix this by balancing the system chunks? Since the "force" flag >>>> is required, does that mean that balancing system chunks is >>>> inherently risky or unpleasant? >>> >>> I don't think force is needed. You'd use btrfs balance start >>> -sconvert=raid1 <mountpoint>; or with -sconvert=raid1,soft although >>> it's probably a minor distinction for such a small amount of data. >> The kernel won't allow a balance involving system chunks unless you >> specify force, as it considers any kind of balance using them to be >> dangerous. Given your circumstances, I'd personally say that the safety >> provided by RAID1 outweighs the risk of making the FS un-mountable. > > Agreed, I'll attempt the system balance shortly. > >> Personally, I would recommend making a full backup of all the data (tar >> works wonderfully for this), and recreate the entire filesystem from >> scratch, but passing all three devices to mkfs.btrfs. This should >> result in all the chunks being RAID1, and will also allow you to benefit >> from newer features. > > I do have backups of the really important stuff from this filesystem, > but they're offsite. As this is just for a home system, I don't have > enough temporary space for a full backup handy (which is related to how > I ended up in this situation in the first place). > > Once everything gets rebalanced though, I don't think I'd be missing out > on any features, would I? > > Steve In general, it shouldn't be an issue, but it might get you slightly better performance to recreate it. I actually have a similar situation with how I have my desktop system set up, when I go about recreating the filesystem (which I do every time I upgrade either the tools or the kernel), I use the following approach: 1. Delete one of the devices from the filesystem 2. Create a new btrfs file system on the device just removed from the filesystem 3. Copy the data from the old filesystem to the new one 4. one at a time, delete the remaining devices from the old filesystem and add them to the new one, re-balancing the new filesystem after adding each device. This seems to work relatively well for me, and prevents the possibility that there is ever just one copy of the data. 
It does, however, require that the amount of data that you are storing on the filesystem is less than the size of one of the devices (although you can kind of work around this limitation by setting compress-force=zlib on the new file system when you mount it, then using defrag to decompress everything after the conversion is done), and that you have to drop to single user mode for the conversion (unless it's something that isn't needed all the time, like the home directories or /usr/src, in which case you just log everyone out and log in as root on the console to do it). ^ permalink raw reply [flat|nested] 17+ messages in thread
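A condensed sketch of the four-step rotation described above, using /dev/sda, /dev/sdb, /dev/sdc with the old filesystem mounted at /old and the new one at /new (all names are placeholders), and assuming the data fits on a single device:

  btrfs device delete /dev/sdc /old   # 1. free one device from the old filesystem
  mkfs.btrfs /dev/sdc                 # 2. fresh filesystem on the freed device
  mount /dev/sdc /new
  cp -a /old/. /new/                  # 3. copy the data across
  btrfs device delete /dev/sdb /old   # 4. move the remaining devices over one at a time,
  btrfs device add /dev/sdb /new      #    balancing (or converting to raid1) after each add
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /new
  umount /old                         # the last old device cannot be 'deleted'; retire the old FS,
  btrfs device add /dev/sda /new      # wipe its old signature if device add complains, then add
  btrfs balance start /new            # and balance once more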
* Re: safe/necessary to balance system chunks? 2014-04-25 19:07 ` Austin S Hemmelgarn @ 2014-04-26 4:01 ` Duncan 0 siblings, 0 replies; 17+ messages in thread From: Duncan @ 2014-04-26 4:01 UTC (permalink / raw) To: linux-btrfs Austin S Hemmelgarn posted on Fri, 25 Apr 2014 15:07:40 -0400 as excerpted: > I actually have a similar situation with how I have my desktop system > set up, when I go about recreating the filesystem (which I do every > time I upgrade either the tools or the kernel), Wow. Given that I run a git kernel and btrfs-tools, I'd be spending a *LOT* of time on redoing my filesystems if I did that! Tho see my just- previous reply for what I do (a fresh mkfs.btrfs every few kernel cycles, to take advantage of new on-device-format feature options and to clean out any possibly remaining cruft from bugs now fixed, given that btrfs isn't fully stable yet). Anyway, why I'm replying here: [in the context of btrfs raid1 mode] > I use the following approach: > > 1. Delete one of the devices from the filesystem > 2. Create a new btrfs file system on the device just removed from the > filesystem > 3. Copy the data from the old filesystem to the new one > 4. one at a time, delete the remaining devices from the old filesystem > and add them to the new one, re-balancing the new filesystem after > adding each device. > > This seems to work relatively well for me, and prevents the possibility > that there is ever just one copy of the data. It does, however, require > that the amount of data that you are storing on the filesystem is less > than the size of one of the devices (although you can kind of work > around this limitation by setting compress-force=zlib on the new file > system when you mount it, then using defrag to decompress everything > after the conversion is done), and that you have to drop to single user > mode for the conversion (unless it's something that isn't needed all the > time, like the home directories or /usr/src, in which case you just log > everyone out and log in as root on the console to do it). I believe you're laboring under an unfortunate but understandable misconception of the nature of btrfs raid1. Since in the event of device- loss it's a critical misconception, I decided to deal with it in a reply separate from the other one (which I then made as a sibling post to yours in reply to the same parent, instead of as a reply to you). Unlike for instance mdraid raid1 mode, which is N mirror-copies of the data across N devices (so 3 devices = 3 copies, 5 devices = 5 copies, etc)... **BTRFS RAID1 MODE IS CURRENTLY PAIR-MIRROR ONLY!** No matter the number of devices in the btrfs so-called "raid1", btrfs only pair-mirrors each chunk, so it's only two copies of the data per filesystem. To have more than two-copy redundancy, you must use multiple filesystems and make one a copy of the other using either conventional backup methods or the btrfs-specific send/receive. This is actually my biggest annoyance/feature-request with current btrfs, as my own sweet-spot ideal is triplet-mirroring, and N-way-mirroring is indeed on the roadmap and has been for years, but the devs plan to use some of the code from btrfs raid5/6 to implement it, and of course while incomplete raid5/6 mode was introduced in 3.9, as of 3.14 at least, that's exactly what raid5/6 mode is, incomplete, and while I saw patches to properly support raid5/6 scrub recently, I believe it's still incomplete in 3.15 as well. And of course N-way-mirroring remains roadmapped for after that... 
So not being a dev, I continue to wait, as patiently as I can manage since I'd rather a good implementation later than a buggy one now, for that still coming N-way-mirroring. Tho at this point I admit to having some sympathy for the donkey forever following that apple held on the end of a stick just out of reach... even if I /would/ rather wait another five years for it and have it done /right/, than be dealing with a bad implementation available right now. Anyway, given that we /are/ dealing with pair-mirror-only raid1 mode currently... as well as your pre-condition that for your method to work, the data to store on the filesystem must fit on a single device... If you have a 3-device-plus btrfs raid1 and you're using btrfs device delete to remove the device you're going to create the new filesystem on, you do still have two-way-redundancy at all times, since the the btrfs device delete will ensure the two copies are on the remaining devices, but that's unnecessary work compared to simply leaving it a device down in the first place, and starting with the last device of the previous (grandparent generation) filesystem as the first of a new (child generation) filesystem, leaving it unused between. If OTOH you're hard-removing a device from the raid1, without a btrfs device delete first, then at the moment you do so, you only have a single copy of any chunk where one of the pair was on that device, and it remains that way until you do the mkfs and finish populating the new filesystem with the contents of the old one. So you're either doing extra work (if you're using btrfs device delete), or leaving yourself with a single copy of anything on the removed device, until it is back up and running as the new filesystem! =:^( I'd suggest not bothering with more than two (or possibly three) devices per filesystem, since by btrfs raid1, you only get pair-mirroring, so more devices is a waste for that, and by your own pre-condition, you limit the amount of data to the capacity of one device, so you can't take advantage of the extra storage capacity of more devices with >2 devices on a two-way-mirroring-limited raid1 either, making it a waste for that as well. Save the extra devices for when you do the transfer. If you have only three devices, setup the btrfs raid1 with two, and leave the third as a spare. Then for the transfer, create and populate the new filesystem on the third, remove a device from the btrfs raid1 pair, add it to the new btrfs and convert to raid1. At that point you can drop the old filesystem and leave its remaining device as your first device when you repeat the process later, making the last device of the grandparent into the first device of the child. This way you'll have two copies of the data at all times and/or will save the work of the third device add and rebalance, and later the device delete, bringing it to two devices again. And as a bonus, except for the time you're actually doing the mkfs and repopulating the new filesystem, you'll have a third copy, albeit a bit outdated, as a backup, that being the spare that you're not including in the current filesystem, since it still has a complete copy of the old filesystem from before it was removed, and that old copy can still be mounted using the degraded option (since it's the single device remaining of what was previously a multi-device raid1). Alternatively, do the three-device raid1 thing and btrfs device delete when you're taking a device out and btrfs balance after adding the third device. 
This will be more hassle, but dropping a device from a two- device raid1 forces it read-only as writes can no longer be made in raid1 mode, while a three-device raid1 doesn't give you more redundancy since btrfs raid1 remains pair-mirror-only, but DOES give you the ability to continue writing in raid1 mode with a missing device, since you still have two devices and can do raid1 pair-mirror writing. So in view of the pair-mirror restriction, three devices won't give you additional redundancy, but it WILL give you a continued writable raid1 if a device drops out. Whether that's worth the hassle of the additional steps needed to btrfs device delete to create the new filesystem and btrfs balance on adding the third device, is up to you, but it does give you that choice. =:^) Similarly if you have four devices, only in that case you can actually do two independent two-device btrfs raid1 filesystems, one working and one backup, taking the backup down to recreate as the new primary/working filesystem when necessary, thus avoiding the whole device-add and rebalance thing entirely. And your backup is then a full pair-redundant backup as well, tho of course you lose the backup for the period you're doing the mkfs and repopulating the new version. This is actually pretty much what I'm doing here, except that my physical devices are more than twice the size of my data and I only have two physical devices. But I use partitioning and create the dual-device btrfs raid1 pair-mirror across two partitions, one on each physical device, with the backup set being two different partitions, one each on the same pair of physical devices. If you have five devices, I'd recommend doing about the same thing, only with the fifth device as a normally physically disconnected (and possibly stored separately, perhaps even off-site) backup of the two separate btrfs pair-mirror raid1s. Actually, you can remove a device from one of the raid1s (presumably the backup/secondary) to create the new btrfs raid1, still leaving the one (presumably the working/primary) as a complete two-device raid1 pair, leaving the other device as a backup that can still be mounted using degraded, should that be necessary. Or simply use the fifth device for something else. =:^) With six devices you have a multi-way choice: 1) Btrfs raid1 pairs as with four devices but with two levels of backup. This would be the same as the 5-device scenario, but completing the pair for the secondary backup. 2) Btrfs raid1 pairs with an addition device in primary and backup. 2a) This gives you a bit more flexibility in terms of size, since you now get 1.5 times the capacity of a single device, for both primary/working and secondary/backup. 2b) You also get the device-dropped write-flexibility described under the three-device case, but now for both primary and backup. =:^) 3) Six-device raid10. In "simple" configuration, this would give you 3- way-striping and 3X capacity of a single device, still pair mirroring, but you'd lose the independent backups. However, if you used partitioning to split each physical device in half and made each set of six partitions an independent btrfs raid10, you'd still have half the 3X capacity, so 1.5X the capacity of a single device, still have the three- way-striping and 2-way-mirroring for 3X the speed with pair-mirroring redundancy, *AND* have independent primary and backup sets, each its own 6-way set of partitions across the 6 devices, giving you simple tear-down and recreate of the backup raid10 as the new working raid10. 
That would be a very nice setup; something I'd like for myself. =:^) Actually, once N-way-mirroring hits I'm going to want to setup pretty close to just this, except using triplet mirroring and two-way-striping instead of the reverse. Keeping the two-way-partitioning as well, that'd give me 2X speed and 3X redundancy, at 1X capacity, with a primary and backup raid10 on different 6-way partition sets of the same six physical devices. Ideally, the selectable-way mirroring/striping code will be flexible enough by that time to let me temporarily reduce striping (and speed/ capacity) to 1-way while keeping 3-way-mirroring, should I lose a device or two, thus avoiding the force-to-read-only that dropping below two- devices in a raid1 or four devices in a raid10 currently does. Upon replacing the bad devices, I could rebalance the 1-way-striped bits and get full 2-way-striping once again, while the triplet mirroring would have never been compromised. That's my ideal. =:^) But to do that I still need triplet-mirroring, and triplet-mirroring isn't available yet. =:^( But it'll sure be nice when I CAN do it! =:^) 4) Do something else with the last pair of devices. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 17+ messages in thread
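A condensed sketch of the three-device "spare" rotation outlined above, where /dev/sda and /dev/sdb hold the current raid1 pair and /dev/sdc is the idle spare (device names and the /current and /new mount points are placeholders):

  mkfs.btrfs /dev/sdc                  # new filesystem on the spare
  mount /dev/sdc /new
  cp -a /current/. /new/               # populate it from the working pair
  btrfs device add /dev/sdb /new       # take one device from the old pair (wipe its old signature first if add refuses)
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /new
  # /dev/sda still carries a complete copy of the old filesystem; it can be mounted
  # with -o degraded as an emergency backup and becomes the spare for the next round.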
* Re: safe/necessary to balance system chunks? 2014-04-25 18:43 ` Steve Leung 2014-04-25 19:07 ` Austin S Hemmelgarn @ 2014-04-26 1:11 ` Duncan 2014-04-26 1:24 ` Chris Murphy 2 siblings, 0 replies; 17+ messages in thread From: Duncan @ 2014-04-26 1:11 UTC (permalink / raw) To: linux-btrfs Steve Leung posted on Fri, 25 Apr 2014 12:43:12 -0600 as excerpted: > On 04/25/2014 12:12 PM, Austin S Hemmelgarn wrote: > >> Personally, I would recommend making a full backup of all the data (tar >> works wonderfully for this), and recreate the entire filesystem from >> scratch, but passing all three devices to mkfs.btrfs. This should >> result in all the chunks being RAID1, and will also allow you to >> benefit from newer features. > > I do have backups of the really important stuff from this filesystem, > but they're offsite. As this is just for a home system, I don't have > enough temporary space for a full backup handy (which is related to how > I ended up in this situation in the first place). > > Once everything gets rebalanced though, I don't think I'd be missing out > on any features, would I? As ASH says, nothing critical. But there are some relatively minor newer features available, and I actually re-mkfs.btrfs most of my (several) btrfs every few kernel cycles to take advantage of them, since btrfs is still under development and these minor features do accumulate over time. The on-device format is now guaranteed to be readable by newer kernels, but that doesn't mean a newer kernel couldn't take advantage of minor features available to it in a newer filesystem, if the filesystem was new enough to make them available. Of course the other reason is that doing a mkfs guarantees (especially with ssds, where it by default does a trim/discard on the entire space it's mkfsing, guaranteeing a zero-out that you'd otherwise have to do manually for that level of zero-out guarantee) that I've eliminated any cruft from now-fixed bugs that otherwise might come back to haunt me at some point. The other consideration is the range of kernels you plan on mounting/ accessing the filesystem with. If you're planning on accessing the filesystem with an old kernel, mkfs.btrfs does have an option to toggle these newer features (with one, extref, allowing more per-directory hard- links, defaulting on, others generally defaulting off), and keeping them off to work with older kernels is possible, but then of course eliminates the newer features as a reason for doing the mkfs in the first place. My local policy being upto four kernel stability levels, current/testing development-kernel, last tested working kernel as first level fallback, latest stable series as second level fallback, and some reasonably recent but occasionally 2-3 stable series stable kernel before that (depending on when I last updated my backup /boot) as backup-boot stable fallback, even the latter is reasonably new, and given that I tend to wait a couple kernel cycles to work out the bugs before activating a new minor-feature here anyway, I don't generally worry much about old kernels when activating such features. So what are these minor features? 
Using mkfs.btrfs -O list-all (as suggested in the mkfs.btrfs manpage, for btrfs-progs v3.14 (slightly reformatted to avoid wrap when posting): $ mkfs.btrfs -O list-all Filesystem features available at mkfs time: mixed-bg - mixed data and metadata block groups (0x4) extref - increased hardlink limit per file to 65536 (0x40, def) raid56 - raid56 extended format (0x80) skinny-metadata - reduced-size metadata extent refs (0x100) no-holes - no explicit hole extents for files (0x200) Mixed-bg: This one's reasonably old and is available with the -M option as well. It has been the default for filesystems under 1 GiB for some time. Some people recommend it for filesystems upto perhaps 32-64 GiB as well, and it does lessen the hassle with data/metadata getting out of balance since they're then combined, but there is a performance cost to enabling it. Basically, I'd say don't bother with activating it via -O, use -M instead if you want it, but do consider well if you really want it above say 64 or 128 MiB, because there IS a performance cost, and as filesystem sizes get bigger, the benefit of -M/mixed-bg on smaller filesystems doesn't matter as much. Tho mixed-bg DOES make possible dup data (and indeed, requires if if you want dup metadata, since they're mixed together in this mode) on a single- device btrfs, something that's not otherwise possible. Extref: As mentioned, extref is now the default. The reason being it was introduced a number of kernels ago and is reasonably important as some people were running into hardlinking issues with the previous layout, so activating it by default is the right choice. Raid56: Unless you plan on doing raid56 in the near term (and that's not recommended ATM as btrfs raid56 mode isn't yet complete in terms of device loss recovery, etc, anyway), that one probably doesn't matter. Recommend not using raid56 at this time and thus keeping the option off. Skinny-metadata: This one's /relatively/ new, being introduced in kernel 3.10 according to the wiki. In the 3.10 and possibly 3.11 cycles I did see a number of bugfixes going by for it, and wasn't using or recommending it at that time. But I used it on one less critical btrfs in the 3.12 timeframe and had no issues, and with my last mkfs.btrfs round shortly after v3.14's release, I enabled it on everything I redid. The benefit of skinny-metadata is simply less metadata to deal with. It's not critical as a new kernel can write the "fat" metadata just fine, and is not yet the default, but if you're recreating filesystems anyway and don't plan on accessing them with anything older than 3.11, I suggest enabling it. No-holes: This one is still new, enabled in kernel (and btrfs-progs) v3.14, and thus could have a few bugs to work out still. In theory, like skinny-metadata it simply makes for more efficient metadata. However, unlike skinny metadata I've yet to see any bugs at all related to it, and in fact, tracking explicit-hole mapping has I believe caused a few bugs of its own, so despite its newness, I enabled it for all new btrfs in my last round of mkfs.btrfs filesystem redos shortly after v3.14 release. So a cautious no-holes recommend once you are fairly sure you won't be mounting with anything pre-3.14 series, tho be aware that since 3.14 itself is so new and because this isn't yet the default, it won't yet have the testing that the other minor-features have, and thus could in theory still have a few bugs. 
But as I said, I believe there were actually bugs in the hole-extent processing before, so I think the risk profile on this one is actually pretty favorable, and I'd consider the accessing kernel age factor the major caveat, at this point. So here I'm doing -O extref,skinny-metadata,no-holes . (Minor usage note: In btrfs-progs v3.14 itself, --features, the long form of the -O option, was buggy and didn't work. That was actually a bug I reported here after finding it when I was doing those redoes as I use a script that was coded to use the long form, only to have it bug out. -O worked tho, and after rewriting that bit of the script to use that, it worked fine. I haven't actually updated btrfs-progs in 10 days or so, but I've seen mention of a v3.14.1, which presumably fixes this bug.) Meanwhile, as I've observed before, I tend to be more comfortable on newsgroups and mailing lists than editing the wiki, and I still haven't gotten a wiki account setup. If someone with such an account wants to put all that on the wiki somewhere I'm sure many will find it useful. =;^) So back to the immediate situation at hand. Since you don't have all the data at hand (it's partially remote) to do a mkfs and restore at this time, you may or may not wish to do a full mkfs.btrfs and restore, and indeed, the features and performance you'd gain in doing so are relatively minor. But in general, you probably want to consider doing such a mkfs.btrfs and restore at some point, even if it's only once, perhaps a year or so from now as btrfs continues toward full stabilization and the frequency of these individually relatively minor on- device-format changes drops toward zero, the ultimate idea being to rebuild your filesystem with a stable btrfs, doing away with all the cruft that might have built up after years of running a not-entirely stable development filesystem, as well as taking advantage of all the individually incremental feature tweaks that were made available one at a time as the filesystem stabilized. Personally I've been routinely testing pre-stable releases of various things for a couple decades now, including what I now consider MS proprietary servantware (in the context of my sig) before the turn of the century (I was active back on the IE/OE beta newsgroups back in the day and at one point was considering becoming an MSMVP, before I discovered freedomware), and a policy of cleaning out the beta cruft and making a clean start once there's a proper stable release out, has never done me wrong. I don't always do so, and in fact am using the same basic user- level KDE config I used back with KDE 2 shortly after the turn of the century, tho I've of course gone thru and manually cleaned out old config files from time to time, but particularly for something as critical to the safety of my data as a filesystem, I'd consider, and could certainly recommend, nothing else. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 17+ messages in thread
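For concreteness, a sketch of a mkfs invocation along the lines described above, with placeholder device names (and with the caveat repeated from the text: skinny-metadata and no-holes want kernels of at least roughly 3.10 and 3.14 respectively):

  mkfs.btrfs -O list-all                          # see which features this btrfs-progs build offers
  mkfs.btrfs -O extref,skinny-metadata,no-holes \
             -d raid1 -m raid1 /dev/sda /dev/sdb /dev/sdc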
* Re: safe/necessary to balance system chunks? 2014-04-25 18:43 ` Steve Leung 2014-04-25 19:07 ` Austin S Hemmelgarn 2014-04-26 1:11 ` Duncan @ 2014-04-26 1:24 ` Chris Murphy 2014-04-26 2:56 ` Steve Leung 2 siblings, 1 reply; 17+ messages in thread From: Chris Murphy @ 2014-04-26 1:24 UTC (permalink / raw) To: Steve Leung; +Cc: Btrfs BTRFS On Apr 25, 2014, at 12:43 PM, Steve Leung <sjleung@shaw.ca> wrote: > Once everything gets rebalanced though, I don't think I'd be missing out on any features, would I? The default nodesize/leafsize is 16KB since btrfs-progs v3.12. This isn't changed with a balance. The difference between the previous default 4KB, and 16KB is performance and small file efficiency. Also, I think newly default with v3.12 btrfs-progs is extref support is enabled, which permits significantly more hardlinks. But this can be turned on for an existing volume using btrfstune. Any other efficiencies in writing things to disk aren't actually rewritten with newer methods using a balance. Balance just causes chunks to be rewritten. Chris Murphy ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-26 1:24 ` Chris Murphy @ 2014-04-26 2:56 ` Steve Leung 2014-04-26 4:05 ` Chris Murphy 2014-04-26 4:55 ` Duncan 0 siblings, 2 replies; 17+ messages in thread From: Steve Leung @ 2014-04-26 2:56 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS On Fri, 25 Apr 2014, Chris Murphy wrote: > On Apr 25, 2014, at 12:43 PM, Steve Leung <sjleung@shaw.ca> wrote: >> Once everything gets rebalanced though, I don't think I'd be missing out on any features, would I? > The default nodesize/leafsize is 16KB since btrfs-progs v3.12. This > isn't changed with a balance. The difference between the previous > default 4KB, and 16KB is performance and small file efficiency. Ah, now it's coming back to me. The last major gyration I had on this filesystem (and the ultimate trigger for my original issue) was juggling everything around so that I could reformat for the 16kB node size. Incidentally, is there a way for someone to tell what the node size currently is for a btrfs filesystem? I never noticed that info printed anywhere from any of the btrfs utilities. In case anyone's wondering, I did balance the system chunks on my filesystem and "btrfs fi df" now looks normal. So thanks to all for the hints and advice. Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-26 2:56 ` Steve Leung @ 2014-04-26 4:05 ` Chris Murphy 2014-04-26 4:55 ` Duncan 1 sibling, 0 replies; 17+ messages in thread From: Chris Murphy @ 2014-04-26 4:05 UTC (permalink / raw) To: Btrfs BTRFS; +Cc: Steve Leung On Apr 25, 2014, at 8:56 PM, Steve Leung <sjleung@shaw.ca> wrote: > > Incidentally, is there a way for someone to tell what the node size currently is for a btrfs filesystem? I never noticed that info printed anywhere from any of the btrfs utilities. btrfs-show-super > In case anyone's wondering, I did balance the system chunks on my filesystem and "btrfs fi df" now looks normal. So thanks to all for the hints and advice. Good news. I kinda wonder if some of the degraded multiple device mount failures we've seen are the result of partially missing system chunks. Chris Murphy ^ permalink raw reply [flat|nested] 17+ messages in thread
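For example (a sketch; the device path is a placeholder, and the exact field labels vary a little between btrfs-progs versions):

  btrfs-show-super /dev/sda1 | grep -i size   # shows nodesize, leafsize and sectorsize, among others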
* Re: safe/necessary to balance system chunks? 2014-04-26 2:56 ` Steve Leung 2014-04-26 4:05 ` Chris Murphy @ 2014-04-26 4:55 ` Duncan 1 sibling, 0 replies; 17+ messages in thread From: Duncan @ 2014-04-26 4:55 UTC (permalink / raw) To: linux-btrfs Steve Leung posted on Fri, 25 Apr 2014 20:56:06 -0600 as excerpted: > Incidentally, is there a way for someone to tell what the node size > currently is for a btrfs filesystem? I never noticed that info printed > anywhere from any of the btrfs utilities. btrfs-show-super <device> displays that, among other relatively obscure information. Look for node-size and leaf-size. (Today they are labeled synonyms in the mkfs.btrfs manpage and should be set the same. But if I'm remembering correctly, originally they could be set separately in mkfs.btrfs, and apparently had slightly different technical meaning. Tho I don't believe actually setting them to different sizes was ever supported.) Sectorsize is also printed. The only value actually supported for it, however, has always been the architecture's kernel page size, 4096 bytes for x86 in both 32- and 64-bit variants, and I'm told in arm as well. But there are other archs (including sparc, mips and s390) where it's different, and as the mkfs.btrfs manpage says, don't set it unless you plan on actually using the filesystem on a different arch. There is, however, work to allow btrfs to use different sector-sizes, 2048 bytes to I believe 64 KiB, thus allowing a btrfs created on an arch with a different page size to at least work on other archs, even if it's never going to be horribly efficient. The former default for all three settings was page size, 4096 bytes on x86, but node/leafsize were apparently merged at the same time their default was changed to 16 KiB, since that's more efficient for nearly all users. What I've wondered, however, is if a 16K nodesize is more efficient than 4K for nearly everyone, under what conditions might the even larger 32 KiB or even 64 KiB (the max) be even MORE efficient. That I don't know, and anyway, I strongly suspect that being less tested, it might trigger more bugs anyway, and while I'm testing a still not entirely stable btrfs, I've not been /that/ interested in trying the more unusual stuff or in triggering more bugs than I might normally come across. But someday curiosity might get the better of me and I might try it... > In case anyone's wondering, I did balance the system chunks on my > filesystem and "btrfs fi df" now looks normal. So thanks to all for the > hints and advice. Heh, good to read. =:^) Anyway, you provokes quite a discussion, and I think most of us learned something from it or at least thought about angles we'd not thought of before, so I'm glad you posted the questions. Challenged me, anyway! =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-25 18:12 ` Austin S Hemmelgarn 2014-04-25 18:43 ` Steve Leung @ 2014-04-25 19:14 ` Hugo Mills 2014-06-19 11:32 ` Alex Lyakas 2014-04-25 23:03 ` Duncan 2 siblings, 1 reply; 17+ messages in thread From: Hugo Mills @ 2014-04-25 19:14 UTC (permalink / raw) To: Austin S Hemmelgarn; +Cc: Chris Murphy, Steve Leung, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 2959 bytes --] On Fri, Apr 25, 2014 at 02:12:17PM -0400, Austin S Hemmelgarn wrote: > On 2014-04-25 13:24, Chris Murphy wrote: > > > > On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: > > > >> > >> Hi list, > >> > >> I've got a 3-device RAID1 btrfs filesystem that started out life as single-device. > >> > >> btrfs fi df: > >> > >> Data, RAID1: total=1.31TiB, used=1.07TiB > >> System, RAID1: total=32.00MiB, used=224.00KiB > >> System, DUP: total=32.00MiB, used=32.00KiB > >> System, single: total=4.00MiB, used=0.00 > >> Metadata, RAID1: total=66.00GiB, used=2.97GiB > >> > >> This still lists some system chunks as DUP, and not as RAID1. Does this mean that if one device were to fail, some system chunks would be unrecoverable? How bad would that be? > > > > Since it's "system" type, it might mean the whole volume is toast if the drive containing those 32KB dies. I'm not sure what kind of information is in system chunk type, but I'd expect it's important enough that if unavailable that mounting the file system may be difficult or impossible. Perhaps btrfs restore would still work? > > > > Anyway, it's probably a high penalty for losing only 32KB of data. I think this could use some testing to try and reproduce conversions where some amount of "system" or "metadata" type chunks are stuck in DUP. This has come up before on the list but I'm not sure how it's happening, as I've never encountered it. > > > As far as I understand it, the system chunks are THE root chunk tree for > the entire system, that is to say, it's the tree of tree roots that is > pointed to by the superblock. (I would love to know if this > understanding is wrong). Thus losing that data almost always means > losing the whole filesystem. From a conversation I had with cmason a while ago, the System chunks contain the chunk tree. They're special because *everything* in the filesystem -- including the locations of all the trees, including the chunk tree and the roots tree -- is positioned in terms of the internal virtual address space. Therefore, when starting up the FS, you can read the superblock (which is at a known position on each device), which tells you the virtual address of the other trees... and you still need to find out where that really is. The superblock has (I think) a list of physical block addresses at the end of it (sys_chunk_array), which allows you to find the blocks for the chunk tree and work out this mapping, which allows you to find everything else. I'm not 100% certain of the actual format of that array -- it's declared as u8 [2048], so I'm guessing there's a load of casting to something useful going on in the code somewhere. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Is it still called an affair if I'm sleeping with my wife --- behind her lover's back? [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 811 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-25 19:14 ` Hugo Mills @ 2014-06-19 11:32 ` Alex Lyakas 0 siblings, 0 replies; 17+ messages in thread From: Alex Lyakas @ 2014-06-19 11:32 UTC (permalink / raw) To: Hugo Mills, Austin S Hemmelgarn, Chris Murphy, Steve Leung, linux-btrfs On Fri, Apr 25, 2014 at 10:14 PM, Hugo Mills <hugo@carfax.org.uk> wrote: > On Fri, Apr 25, 2014 at 02:12:17PM -0400, Austin S Hemmelgarn wrote: >> On 2014-04-25 13:24, Chris Murphy wrote: >> > >> > On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: >> > >> >> >> >> Hi list, >> >> >> >> I've got a 3-device RAID1 btrfs filesystem that started out life as single-device. >> >> >> >> btrfs fi df: >> >> >> >> Data, RAID1: total=1.31TiB, used=1.07TiB >> >> System, RAID1: total=32.00MiB, used=224.00KiB >> >> System, DUP: total=32.00MiB, used=32.00KiB >> >> System, single: total=4.00MiB, used=0.00 >> >> Metadata, RAID1: total=66.00GiB, used=2.97GiB >> >> >> >> This still lists some system chunks as DUP, and not as RAID1. Does this mean that if one device were to fail, some system chunks would be unrecoverable? How bad would that be? >> > >> > Since it's "system" type, it might mean the whole volume is toast if the drive containing those 32KB dies. I'm not sure what kind of information is in system chunk type, but I'd expect it's important enough that if unavailable that mounting the file system may be difficult or impossible. Perhaps btrfs restore would still work? >> > >> > Anyway, it's probably a high penalty for losing only 32KB of data. I think this could use some testing to try and reproduce conversions where some amount of "system" or "metadata" type chunks are stuck in DUP. This has come up before on the list but I'm not sure how it's happening, as I've never encountered it. >> > >> As far as I understand it, the system chunks are THE root chunk tree for >> the entire system, that is to say, it's the tree of tree roots that is >> pointed to by the superblock. (I would love to know if this >> understanding is wrong). Thus losing that data almost always means >> losing the whole filesystem. > > From a conversation I had with cmason a while ago, the System > chunks contain the chunk tree. They're special because *everything* in > the filesystem -- including the locations of all the trees, including > the chunk tree and the roots tree -- is positioned in terms of the > internal virtual address space. Therefore, when starting up the FS, > you can read the superblock (which is at a known position on each > device), which tells you the virtual address of the other trees... and > you still need to find out where that really is. > > The superblock has (I think) a list of physical block addresses at > the end of it (sys_chunk_array), which allows you to find the blocks > for the chunk tree and work out this mapping, which allows you to find > everything else. I'm not 100% certain of the actual format of that > array -- it's declared as u8 [2048], so I'm guessing there's a load of > casting to something useful going on in the code somewhere. The format is just a list of pairs: struct btrfs_disk_key, struct btrfs_chunk struct btrfs_disk_key, struct btrfs_chunk ... For each SYSTEM block-group (btrfs_chunk), we need one entry in the sys_chunk_array. During mkfs the first SYSTEM block group is created, for me its 4MB. So only if the whole chunk tree grows over 4MB, we need to create an additional SYSTEM block group, and then we need to have a second entry in the sys_chunk_array. And so on. Alex. > > Hugo. 
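If you want to see this on a live filesystem, the superblock dump shows how much of that 2048-byte array is actually in use, along with the virtual address of the chunk root it lets you resolve (a sketch; the device is a placeholder and the field labels may differ slightly between btrfs-progs versions):

  btrfs-show-super /dev/sda1 | grep -iE 'sys|chunk'   # look for the sys array size and chunk_root fields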
> > -- > === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === > PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk > --- Is it still called an affair if I'm sleeping with my wife --- > behind her lover's back? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-25 18:12 ` Austin S Hemmelgarn 2014-04-25 18:43 ` Steve Leung 2014-04-25 19:14 ` Hugo Mills @ 2014-04-25 23:03 ` Duncan 2014-04-26 1:41 ` Chris Murphy 2 siblings, 1 reply; 17+ messages in thread From: Duncan @ 2014-04-25 23:03 UTC (permalink / raw) To: linux-btrfs Austin S Hemmelgarn posted on Fri, 25 Apr 2014 14:12:17 -0400 as excerpted: > > On 2014-04-25 13:24, Chris Murphy wrote: >> >> On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: >> >>> Assuming this is something that needs to be fixed, would I be able to >>> fix this by balancing the system chunks? Since the "force" flag is >>> required, does that mean that balancing system chunks is inherently >>> risky or unpleasant? >> >> I don't think force is needed. You'd use btrfs balance start >> -sconvert=raid1 <mountpoint>; or with -sconvert=raid1,soft although >> it's probably a minor distinction for such a small amount of data. > > The kernel won't allow a balance involving system chunks unless you > specify force, as it considers any kind of balance using them to be > dangerous. Given your circumstances, I'd personally say that the safety > provided by RAID1 outweighs the risk of making the FS un-mountable. To clear this up, FWIW... In a balance, metadata includes system by default. If you go back and look at the committed balance filters patch, the wording on the -s/system chunks option is that it requires -f/force because one would normally handle system as part of metadata, not for any other reason. What it looks like to me is that the original patch in progress may not have had -s/system as a separate filter at all, treating it as -m/metadata, but perhaps someone suggested having -s/system as a separate option too, and the author agreed. But since -m/metadata includes -s/ system by default, and that was the intended way of doing things, -f/force was added as necessary when doing only -s/system, since presumably that was considered an artificial distinction, and handling -s/ system as a part of -m/metadata was considered the more natural method. Which begs the question[1], is there a safety or procedural reason one should prefer handling metadata and system chunks at the same time, perhaps because rewriting the one involves rewriting critical bits of the other anyway, or is it simply that the author considered system a subset of metadata, anyway? That I don't know. But what I do know is that -f/force isn't required with -m/metadata, which includes -s/system by default anyway, so unless there's reason to treat the two differently, just use -m/metadata and let it handle -s/ system as well. =:^) --- [1] Begs the question: Modern more natural/literal majority usage meaning: invites/forces the question, the question becomes so obvious that it's "begging" to be asked, at least in the speaker/author's (my) own head. Yes, I am aware of but generally prefer "assumes and thus can't prove the postulate" or similar wording as an alternate to the translation-accident meaning. If you have some time and are wondering what I'm talking about and/or think I used the term incorrectly, google it (using duck-duck-go or the like if you don't like google's profiling). =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 17+ messages in thread
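Put as a command (a sketch, with /mnt as a placeholder mount point), the reading above is that a plain metadata conversion should carry the system chunks along with it, with no -f required:

  btrfs balance start -mconvert=raid1,soft /mnt   # per the balance-filters patch as read above, system chunks get the same treatment as metadata
  btrfs fi df /mnt                                # the DUP/single System lines should be gone afterwards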
* Re: safe/necessary to balance system chunks?
  2014-04-25 23:03 ` Duncan
@ 2014-04-26 1:41   ` Chris Murphy
  2014-04-26 4:23     ` Duncan
  0 siblings, 1 reply; 17+ messages in thread

From: Chris Murphy @ 2014-04-26 1:41 UTC (permalink / raw)
  To: Btrfs BTRFS

On Apr 25, 2014, at 5:03 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> But since -m/metadata includes -s/system by default, and that was the
> intended way of doing things, -f/force was added as necessary when doing
> only -s/system, since presumably that was considered an artificial
> distinction, and handling -s/system as a part of -m/metadata was
> considered the more natural method.

OK, so somehow in Steve's conversion, metadata was converted from DUP to RAID1 completely, but some portion of system was left as DUP, incompletely converted to RAID1. It doesn't seem obvious that -mconvert is what he'd use now, but maybe with newer btrfs-progs it will also convert any unconverted system chunks.

If not, then -sconvert=raid1 -f and optionally -v.

This isn't exactly risk free, given that it requires -f; and I'm not sure how to weigh the risk of a conversion failure against the risk of the specific drive holding the DUP system chunks dying. But for me a forced susage balance was fast:

[root@rawhide ~]# time btrfs balance start -susage=100 -f -v /
Dumping filters: flags 0xa, state 0x0, force is on
  SYSTEM (flags 0x2): balancing, usage=100
Done, had to relocate 1 out of 8 chunks

real    0m0.095s
user    0m0.001s
sys     0m0.017s


Chris Murphy

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-26 1:41 ` Chris Murphy @ 2014-04-26 4:23 ` Duncan 0 siblings, 0 replies; 17+ messages in thread From: Duncan @ 2014-04-26 4:23 UTC (permalink / raw) To: linux-btrfs Chris Murphy posted on Fri, 25 Apr 2014 19:41:43 -0600 as excerpted: > OK so somehow in Steve's conversion, metadata was converted from DUP to > RAID1 completely, but some portion of system was left as DUP, > incompletely converted to RAID1. It doesn't seem obvious that -mconvert > is what he'd use now, but maybe newer btrfs-progs it will also convert > any unconverted system chunk. > > If not, then -sconvert=raid1 -f and optionally -v. > > This isn't exactly risk free, given that it requires -f; and I'm not > sure we can risk assess conversion failure vs the specific drive > containing system DUP chunks dying. But for me a forced susage balance > was fast: > > [root@rawhide ~]# time btrfs balance start -susage=100 -f -v / > Dumping filters: flags 0xa, state 0x0, force is on > SYSTEM (flags 0x2): balancing, usage=100 > Done, had to relocate 1 out of 8 chunks > > real 0m0.095s user 0m0.001s sys 0m0.017s Yes. The one thing that can be said about system chunks is that they're small enough that processing just them should be quite fast, even on spinning rust. So regardless of whether there's a safety issue justifying the required -f/force for -s/system-only, or not, unlike the possibly many hours a full balance or some minutes to an hour or so a full balance on a large spinning rust based btrfs may take, at least if there is some possible danger in the -s system alone rebalance, the risk-window should be quite small, time-wise. =:^) And correspondingly, safety issue or not, I've never seen a report here of bugs or filesystem loss due to use of -s -f. That doesn't mean it can't happen, that's under debate and I can't safely say; it does mean you're pretty unlucky if you're the first to have a need to report such a thing, here. =:^\ But we all know that btrfs is still under heavy development, and thus have those tested backups ready just in case, right? In which case, I think whatever risk there might be relative to that of simply using btrfs at all at this point in time, must be pretty negligible. =:^) Tho a few people each year still do get struck by lightening... or win the lottery. Just living is a risk. <shrug> -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: safe/necessary to balance system chunks? 2014-04-25 17:24 ` Chris Murphy 2014-04-25 18:12 ` Austin S Hemmelgarn @ 2014-04-25 18:36 ` Steve Leung 1 sibling, 0 replies; 17+ messages in thread From: Steve Leung @ 2014-04-25 18:36 UTC (permalink / raw) To: Chris Murphy; +Cc: linux-btrfs On 04/25/2014 11:24 AM, Chris Murphy wrote: > > On Apr 25, 2014, at 8:57 AM, Steve Leung <sjleung@shaw.ca> wrote: > >> I've got a 3-device RAID1 btrfs filesystem that started out life as single-device. >> >> btrfs fi df: >> >> Data, RAID1: total=1.31TiB, used=1.07TiB >> System, RAID1: total=32.00MiB, used=224.00KiB >> System, DUP: total=32.00MiB, used=32.00KiB >> System, single: total=4.00MiB, used=0.00 >> Metadata, RAID1: total=66.00GiB, used=2.97GiB >> >> This still lists some system chunks as DUP, and not as RAID1. Does this mean that if one device were to fail, some system chunks would be unrecoverable? How bad would that be? > > Anyway, it's probably a high penalty for losing only 32KB of data. I think this could use some testing to try and reproduce conversions where some amount of "system" or "metadata" type chunks are stuck in DUP. This has come up before on the list but I'm not sure how it's happening, as I've never encountered it. As for how it occurred, I'm not sure. I created this filesystem some time ago (not sure exactly, but I'm guessing with a 3.4-era kernel?) so it's quite possible it's not reproducible on newer kernels. It's also nice to know I've been one failed device away from a dead filesystem for a long time now, but better to notice it late than never. :) Steve ^ permalink raw reply [flat|nested] 17+ messages in thread