* BTRFS list of grievances
@ 2024-09-27 11:20 waxhead
2024-09-27 16:27 ` Roman Mamedov
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: waxhead @ 2024-09-27 11:20 UTC (permalink / raw)
To: Btrfs BTRFS
First things first: I am a long-time BTRFS user and frequent reader of
the mailing list. I am *NOT* a BTRFS developer, but that being said I
have been known to summon a segmentation fault or two from years of
programming in C.
Since I have been using BTRFS more or less problem-free since 2013 or
so for nearly everything, I figured that I should be entitled to simply
write down a list of things that I personally think suck (more or less)
with this otherwise fine filesystem.
Make of it what you will, but what I am trying to get across is what
the upper class would probably call 'constructive criticism'.
So here goes:
1. FS MANAGEMENT
================
BTRFS is rather simple to manage. We can add/remove devices on the fly,
balance the filesystem, scrub, defrag, select compression algorithms
etc. Some of these things are done as mount options, some as properties
and some by issuing a command that processes something.
Personally, I feel this is a bit messy and in some cases quite backwards
at times. I believe the original idea was that BTRFS should support
per-subvolume mount options, storage profiles, etc etc.... and
subvolumes are after all a key feature of the filesystem.
Heck, we even have a root subvolume (id 256) which ideally is the parent
(or root) for all other subvolumes on the filesystem. So why on earth do
we have commands such as 'btrfs balance start -dusage=50 /fsmnt' when
logically it could just as easily have been 'btrfs <subvolume> balance
start -dusage=50', e.g. on the root subvolume instead of the fs mount
point.
Besides, if BTRFS at some point is supposed to be more "subvolume
centric", then why aren't things like scrub, balance, convert
(data/metadata), device add/remove or even defrag handled as properties
of a subvolume? E.g. why not set a flag that triggers what needs to be
done, and let the filesystem process that as a background task.
That would for example allow finer granularity of scrub for certain
subvolumes, instead of having to do the entire filesystem as it
currently is now.
Status for these jobs does, in my opinion, belong in sysfs, but there is
nothing wrong with a simple command to "pretty-fy" the status either.
And yes, I even mentioned device add/remove because if it would be
possible at some point to assign priority/weight to certain devices for
certain subvolumes, then making a subvolume prefer or avoid using a
certain storage device would be as "simple" as setting a suitable
weight/priority, and it would be possible to add/remove (assign) storage
devices without affecting all other subvolumes.
So for me, 'btrfs property set' (or something similar) sounds like the
only sensible way of properly managing a BTRFS. And really, with the
exception of the rescue and subvolume mount options, most if not all
other mount options seem to belong better as properties of a subvolume
(which may or may not be the id 256 / root subvolume).
2. USE DEVICE ID's EVERYWHERE INSTEAD OF /dev/sdX:
==================================================
Using "btrfs filesystem show" will list all BTRFS devices, and also show
the assigned ID for that device / partition / whatever. Since BTRFS
already has the notion of a device ID, it seems pointless not to use
that ID for management / identification wherever possible.
(for example 'btrfs device stats /mnt')
3. SOME DEVICES MISSING SHOULD BE ID 1,2,3,4... MISSING:
========================================================
If one or more devices are missing, it would have been great to know
WHICH devices were missing. Why not print the IDs of the missing
devices instead of just letting the user know that "some" of them are
missing?
4. THE ABILITY TO SET A LABEL FOR A DEVICE ID:
==============================================
It would have been great to set a label for a BTRFS device ID. For
example ID1 = "Shelf01.24", ID2 = "NAS_01", ID3 = "localdiskXYZ"
5. DEDUPLICATION IS NOT INTEGRATED IN BTRFS:
============================================
I think that some form of (simple) deduplication should be integrated in
BTRFS. Using unofficial tools may be perfectly safe, but it feels
"unsafe" to be honest. Besides, deduplication is something that might
have been interesting to turn on/on_whenidle/off as a property of a
subvolume as well.
6. DEVICE STATS:
================
Again, device IDs are not used, but also why is this info not listed in
a table? Showing this in a table would make 5x lines become 1x line,
which would be far more readable. Finally, it is not clear to me what
are fixed errors, and what is actual damage accumulated in the
filesystem.
7. LIST OF DAMAGED FILES:
=========================
There is no easy way to get a list of damaged files on a BTRFS
filesystem to my knowledge. It would be great to have a command for that.
8. ABILITY TO RESERVE SPARE SPACE:
==================================
Because of the way BTRFS works, a spare device is not very useful.
Reserved spare space, rather, would be a good idea I think. That way if
one device is missing, data could be replicated to other drives (or
even on a single device [DUP] in emergency situations).
9. ABILITY TO MERGE / CONSUME EXISTING BTRFS:
=============================================
It would have been great to merge existing BTRFS volumes into a larger
volume e.g. assimilate it ..because we all know resistance is futile.
Again a subvolume would be the cleanest way of importing another BTRFS I
think.
10. AUTOREJECT FAILED DEVICES:
==============================
As I have mentioned before, if it were possible to assign certain
storage devices / storage device groups to certain subvolumes, then as
the failure count for a device increases, it may be preferable to
automatically lower the weight/priority of that device so that things
are stored elsewhere. If auto-migration is triggered at a low enough
weight, then devices with a high failure rate/count could be rejected.
11. That's it folks!
====================
I know it is a lot of "rant", but I hope someone finds it useful or
inspiring. If for nothing more than to keep my mouth shut. ;)
* Re: BTRFS list of grievances
From: Roman Mamedov @ 2024-09-27 16:27 UTC (permalink / raw)
To: waxhead; +Cc: Btrfs BTRFS

On Fri, 27 Sep 2024 13:20:14 +0200
waxhead <waxhead@dirtcellar.net> wrote:

> 1. FS MANAGEMENT
> ================
> BTRFS is rather simple to manage. We can add/remove devices on the
> fly, balance the filesystem, scrub, defrag, select compression
> algorithms etc. Some of these things are done as mount options, some
> as properties and some by issuing a command that processes something.

I will add my annoyance, or rather a showstopper.

Consider a RAID1 of two 20TB disks. One disk disconnects and the system
operates on just the remaining one for a few days.

Side note: will Btrfs even agree to operate in such a state without a
constant stream of errors to dmesg?

Then the disk is reconnected to the system.

For a start, are we even able to cleanly forget an abruptly disappeared
drive in RAID1, and then re-add it back when the same disk reappears
(under a different /dev/sdX location)? Without remounting or rebooting?

Secondly, it feels like you'll be extremely lucky not to die a fiery
death of "parent transid mismatch" errors right away with Btrfs after
this.

Or if not, then how do you get from there to a consistent state? Run a
scrub, make the system reread the entire 40 TB of data, correcting
errors and lack of duplication where necessary.

Meanwhile, mdadm RAID1: thanks to the write-intent bitmap, after a
re-add the RAID resyncs just the small changed areas from the
continuously running disk to the temporarily-absent one, and array
consistency is almost instantly restored, in many cases with just a few
GBs read and written.
Or maybe I underestimate the current Btrfs capabilities here?

--
With respect,
Roman
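[Editor's note: to illustrate the mdadm mechanism Roman contrasts with
Btrfs, here is a minimal, hypothetical Python sketch of a write-intent
bitmap. The class name, the in-memory set, and the 64 MiB chunk
granularity are all illustrative assumptions, not mdadm's actual
on-disk format; mdadm persists its bitmap in the array metadata and
sets bits *before* issuing writes so a crash can only leave bits
over-set, never under-set.]

```python
# Hypothetical sketch of a write-intent bitmap (illustrative, not mdadm code).
CHUNK = 64 * 1024 * 1024  # one bitmap bit covers 64 MiB of the array

class WriteIntentBitmap:
    def __init__(self, array_size):
        self.nbits = (array_size + CHUNK - 1) // CHUNK
        self.dirty = set()  # chunk indices with in-flight or unsynced writes

    def before_write(self, offset, length):
        # The dirty bits must be persisted *before* the data writes go out.
        for i in range(offset // CHUNK, (offset + length - 1) // CHUNK + 1):
            self.dirty.add(i)

    def after_sync(self, offset, length):
        # Once all mirrors have acknowledged, bits can be cleared lazily.
        for i in range(offset // CHUNK, (offset + length - 1) // CHUNK + 1):
            self.dirty.discard(i)

    def resync_chunks(self):
        # On re-adding a temporarily absent mirror, only these chunks
        # need copying from the up-to-date disk - not the whole array.
        return sorted(self.dirty)

bmp = WriteIntentBitmap(array_size=20 * 2**40)   # a 20 TB mirror
bmp.before_write(5 * 2**30, 2**20)               # 1 MiB write at 5 GiB
bmp.before_write(17 * 2**40, 128 * 2**20)        # 128 MiB write at 17 TiB
# Disk disappears here; after re-add, resync only the dirty chunks:
print(bmp.resync_chunks())  # -> [80, 278528, 278529]
```

This is why the mdadm resync touches a few GB instead of 40 TB: the
amount of work is proportional to what changed while the mirror was
absent, not to the array size.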
* Re: BTRFS list of grievances
From: Remi Gauvin @ 2024-09-27 18:05 UTC (permalink / raw)
To: Roman Mamedov, waxhead; +Cc: Btrfs BTRFS

On 2024-09-27 12:27 p.m., Roman Mamedov wrote:
>
> Or if not, then how do you get from there to a consistent state? Run
> a scrub, make the system reread the entire 40 TB of data, correcting
> errors and lack of duplication where necessary.
>

The BTRFS handling of this situation is actually worse. The often
given (and entirely too simple) answer is to scrub. But this has
several caveats.

1. The system will not detect the error state automatically, so fixing
this requires the admin to be actively monitoring for errors to detect
the missed writes. (Regular monitoring of btrfs dev stats, and alerting
on errors, is required.)

2. Any files that are stored on the device with CoW disabled will not
be fixed, and the two copies will be different, with no real way to
detect or fix this. There are packages that disable CoW on files by
default (systemd log files, and, probably more concerning, virtual
disks created by libvirt, for example). Some amount of divergence can
happen at any unclean shutdown in this scenario.

3. I don't have the exact math at my fingertips, but with enough failed
writes, the chance of a CRC32 collision on the stale data, leaving
unfixed/corrupted data behind, gets fairly high.

For reasons 2 and 3, the only way to fix this without increasing the
chance of data corruption is to replace the previously disconnected
drive with a hot spare (with the -r option to btrfs replace).
* Re: BTRFS list of grievances
From: Colin S @ 2024-09-27 19:01 UTC (permalink / raw)
To: Btrfs BTRFS

On 27/09/2024 13:05, Remi Gauvin wrote:
> 2. Any files that are stored on the device with CoW disabled will not
> be fixed, and the two copies will be different, with no real way to
> detect or fix this. There are packages that disable CoW on files by
> default (systemd log files, and, probably more concerning, virtual
> disks created by libvirt, for example). Some amount of divergence can
> happen at any unclean shutdown in this scenario.

For reference, there is a longstanding open request for an enhancement
to detect mismatches and enable manual recovery[0].

> 3. I don't have the exact math at my fingertips, but with enough
> failed writes, the chance of a CRC32 collision on the stale data,
> leaving unfixed/corrupted data behind, gets fairly high.

It was mentioned to me that the CRC32C collision chance for 12TiB of
stale data is 75%[1]. Given that there is a proven alternative with no
risk of collision (a write-intent bitmap), I would say relying on
checksums here is the wrong thing to do.

> For reasons 2 and 3, the only way to fix this without increasing the
> chance of data corruption is to replace the previously disconnected
> drive with a hot spare (with the -r option to btrfs replace).

Furthermore, if a lost device ever mounts rw on its own, it will cause
a permanent split-brain, because btrfs doesn't track lost devices and
so will happily rejoin all the devices again later.

Compared to everything else that btrfs already solves, this seems like
such a trivial problem, as my understanding is that it only requires
storing a bitmap on each device that indicates which other devices were
present/absent according to that device, and if the bitmaps don't
match, then don't rejoin the devices without manual intervention.
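[Editor's note: the collision risk discussed above can be estimated
with a simple back-of-envelope model. The sketch below assumes 4 KiB
data blocks and that each stale block's CRC32C matches the recorded
checksum of the expected contents with independent probability 2^-32 -
a simplifying assumption, since real stale data is not uniformly
random. Under that model, 12 TiB of stale 4 KiB blocks gives n·2^-32 =
0.75 *expected* undetected blocks; the probability of at least one is
about 53%, so the 75% figure cited in [1] may be this expectation, or
may assume a different block size.]

```python
import math

# Probability that scrubbing leaves at least one stale block undetected,
# assuming each stale block independently passes its 32-bit checksum
# with probability 2**-32 (illustrative model, not btrfs internals).
def p_any_collision(stale_bytes, block=4096, csum_bits=32):
    n = stale_bytes // block
    # 1 - (1 - 2**-csum_bits)**n, computed stably via log1p
    return 1 - math.exp(n * math.log1p(-(2.0 ** -csum_bits)))

print(f"{p_any_collision(12 * 2**40):.2f}")  # -> 0.53
```

Either way, the chance is far too high to rely on for data integrity,
which supports the argument for a write-intent bitmap over checksums in
this recovery path.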
I wrote about this exact thing already a little over a month ago[2],
plus gave a dozen citations to past discussions, and didn't get any
feedback from anyone working on btrfs.

btrfs developers: short of implementing a write-intent bitmap myself,
which is not possible, what can I (or anyone else) do to get some
developer time on this?

Thanks,

[0] https://github.com/kdave/btrfs-progs/issues/134
[1] https://github.com/kdave/btrfs-progs/pull/863#discussion_r1710574045
[2] https://lore.kernel.org/linux-btrfs/55c3f03d-a650-4193-8982-ffcb70575c2e@zetafleet.com/T/
* Re: BTRFS list of grievances
From: Chris Murphy @ 2024-10-02 19:31 UTC (permalink / raw)
To: Colin S, Btrfs BTRFS

On Fri, Sep 27, 2024, at 3:01 PM, Colin S wrote:
> Furthermore, if a lost device ever mounts rw on its own, it will
> cause a permanent split-brain, because btrfs doesn't track lost
> devices and so will happily rejoin all the devices again later.

An rw degraded mount makes transids ambiguous. There isn't a timestamp
in the super, so we can't use that to help disambiguate matching or
similar transids on multiple members that were mounted rw degraded.

One idea I had is a "mounted degraded" flag that would cause the kernel
to do some logic to prevent the rw mount that causes the split-brain
problem, i.e. do not permit the mount of a filesystem when 2+ devices
present have the degraded-rw flag set. Perhaps not even RO, I'm not
sure.

Would such a flag need to go in the super, though? Or could we just
make such a thing an item in the device tree? And for that matter, add
the fs create time and the last mounted and unmounted times in the
device tree?

We also need a partial scrub, i.e. start a scrub from a certain point
so that not all data and metadata needs to be read. A write-intent
bitmap would help do that, but can we infer a write-intent bitmap via
transid?

Or still another idea, a variation on the seed device, where a single
device can be both seed and sprout: i.e. upon mounting rw degraded,
changes to the filesystem go to a separate location, the point being to
preserve the state prior to mounting degraded, and to isolate the
degraded writes to "play them back" later when all the drives are
together again and we're running normally (not degraded).
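[Editor's note: the "mounted degraded" flag check proposed above can be
sketched as follows. This is purely hypothetical - no such flag exists
in btrfs today, and the names are invented for illustration. The core
rule: if two or more present members each carry a persistent
"was mounted rw while degraded" mark, their histories may have
diverged, so assembly must refuse and demand manual intervention.]

```python
from dataclasses import dataclass

# Hypothetical per-device flag as proposed in the thread; not a real
# btrfs structure. degraded_rw would be set persistently whenever the
# device participates in an rw mount with peers missing.
@dataclass
class Member:
    devid: int
    degraded_rw: bool

def may_mount(members, allow_degraded=False):
    flagged = [m.devid for m in members if m.degraded_rw]
    if len(flagged) >= 2:
        # Two members each moved on alone at some point: transids are
        # ambiguous and there is no safe automatic merge.
        return (False, f"split brain between devids {flagged}")
    if len(flagged) == 1 and not allow_degraded:
        # One member is ahead; the stale peers need a resync first.
        return (False, f"devid {flagged[0]} is ahead; resync stale members")
    return (True, "ok")
```

For example, `may_mount([Member(1, True), Member(2, True)])` would be
refused as split brain, while a clean pair mounts normally.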
We really need some things in place for automatic degraded recovery and
device re-add before we could ever figure out how to have unattended
degraded boot (for the 10 people on earth who want this - bad but funny
joke). Right now we can't set the degraded mount option persistently
because of split brain. And we can't even try to mount when not all
devices are present, because the mount will fail (without the degraded
mount option). Therefore there's a udev rule in place to not even try
to mount during boot if not all devices are found. It waits
indefinitely. Kinda annoying!

--
Chris Murphy
* Re: BTRFS list of grievances
From: Colin S @ 2024-10-02 23:18 UTC (permalink / raw)
To: Btrfs BTRFS

On 02/10/2024 14:31, Chris Murphy wrote:
> On Fri, Sep 27, 2024, at 3:01 PM, Colin S wrote:
>
>> Furthermore, if a lost device ever mounts rw on its own, it will
>> cause a permanent split-brain, because btrfs doesn't track lost
>> devices and so will happily rejoin all the devices again later.
>
> An rw degraded mount makes transids ambiguous. There isn't a
> timestamp in the super, so we can't use that to help disambiguate
> matching or similar transids on multiple members that were mounted rw
> degraded.
>
> One idea I had is a "mounted degraded" flag that would cause the
> kernel to do some logic to prevent the rw mount that causes the
> split-brain problem, i.e. do not permit the mount of a filesystem
> when 2+ devices present have the degraded-rw flag set. Perhaps not
> even RO, I'm not sure.
>
> Would such a flag need to go in the super, though? Or could we just
> make such a thing an item in the device tree? And for that matter,
> add the fs create time and the last mounted and unmounted times in
> the device tree?

I suspect I may be missing something important, because I am no FS
expert and md-raid doesn't appear to do exactly this, but I can't think
of a case where it doesn't work to have a simple per-device bitmap that
says which devices have up-to-date writes, from the perspective of each
device.

Using a bitwise AND across all visible device bitmaps, whichever bits
remain set must be the good device(s), because they were always seen by
all devices as never having missed a write. If the bitwise AND results
in 0, then there is a split-brain, because it means each device saw
some other device miss a write.
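[Editor's note: the bitwise-AND scheme described above can be sketched
directly. The bitmap layout and the `good_devices` name are
hypothetical - this is not an existing btrfs structure, only an
illustration of the proposal.]

```python
# Sketch of the proposed per-device presence bitmaps. Each device
# records, as a bitmask keyed by devid, which devices it believes never
# missed a write. AND-ing the bitmaps of all visible devices yields the
# devices that *everyone* agrees are fully up to date.
def good_devices(bitmaps):
    """bitmaps: {devid: bitmask of devids believed fully up to date}."""
    acc = ~0  # start with all bits set
    for mask in bitmaps.values():
        acc &= mask
    return acc  # 0 means split brain: no device is trusted by all

# Two-device RAID1 where device 2 was absent while device 1 kept writing:
both = (1 << 1) | (1 << 2)
after_loss = {1: 1 << 1,  # dev 1: "only I am up to date"
              2: both}    # dev 2 never saw anything go missing
assert good_devices(after_loss) == 1 << 1  # dev 1 is authoritative

# Split brain: each device ran rw alone at some point:
assert good_devices({1: 1 << 1, 2: 1 << 2}) == 0
```

In the recovery path described next, the reappeared device would clear
its own bit and be caught up from an authoritative copy.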
When a lost device reappears, set its own up-to-date bit to 0 until it
is recovered, and from there I think the existing btrfs-replace
mechanism can be used to 'replace' the lost disk with itself.

> We also need a partial scrub, i.e. start a scrub from a certain point
> so that not all data and metadata needs to be read. A write-intent
> bitmap would help do that, but can we infer a write-intent bitmap via
> transid?

Whether this is possible is beyond my knowledge level. This patch from
2022[0] says something about why not the btrfs btree, but I don't know
if that is talking about some implementation detail, or if it is saying
that what you suggest is a fundamentally unsound approach.

Either way, my understanding is that the write-intent bitmap is
important to reduce the time/resource impact of a failure, but is not
strictly necessary to solve the data corruption problem; doing a full
device scan and updating mismatched blocks would be sufficient, just
slow. (I think checksums cannot be used safely here due to the
collision risk.)

> We really need some things in place for automatic degraded recovery
> and device re-add before we could ever figure out how to have
> unattended degraded boot (for the 10 people on earth who want this -
> bad but funny joke). Right now we can't set the degraded mount option
> persistently because of split brain. And we can't even try to mount
> when not all devices are present, because the mount will fail
> (without the degraded mount option). Therefore there's a udev rule in
> place to not even try to mount during boot if not all devices are
> found. It waits indefinitely. Kinda annoying!

I may just be repeating/agreeing with what you are saying, but just in
case: most of the time a degraded mount will not result in split-brain,
and in the case of a split-brain I don't believe there is ever a safe
way to automatically choose which half might be the right one for a
degraded mount, so btrfs should not try.
Timestamps would not account for errors during hardware recovery, like
rw-mounting the wrong device and writing to it inadvertently. So long
as btrfs does not try to rejoin a split-brain, the user can take action
to restore the correct half.

Thank you for sharing your thoughts, and I hope this is the start of a
solution for this issue.

Best,

[0] https://lore.kernel.org/linux-btrfs/bd94acc1-5c1d-203b-8523-e6986206b267@suse.com/T/
* RE: BTRFS list of grievances
From: Paul Jones @ 2024-09-28 10:15 UTC (permalink / raw)
To: Roman Mamedov, waxhead; +Cc: Btrfs BTRFS

> -----Original Message-----
> From: Roman Mamedov <rm@romanrm.net>
> Sent: Saturday, 28 September 2024 2:28 AM
> To: waxhead <waxhead@dirtcellar.net>
> Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
> Subject: Re: BTRFS list of grievances
>
> On Fri, 27 Sep 2024 13:20:14 +0200
> waxhead <waxhead@dirtcellar.net> wrote:
>
> > 1. FS MANAGEMENT
> > ================
> > BTRFS is rather simple to manage. We can add/remove devices on the
> > fly, balance the filesystem, scrub, defrag, select compression
> > algorithms etc. Some of these things are done as mount options,
> > some as properties and some by issuing a command that processes
> > something.
>
> I will add my annoyance, or rather a showstopper.
>
> Consider a RAID1 of two 20TB disks. One disk disconnects and the
> system operates on just the remaining one for a few days.
>
> Side note: will Btrfs even agree to operate in such a state without a
> constant stream of errors to dmesg?
>
> Then the disk is reconnected to the system.
>
> For a start, are we even able to cleanly forget an abruptly
> disappeared drive in RAID1, and then re-add it back when the same
> disk reappears (under a different /dev/sdX location)? Without
> remounting or rebooting?
>
> Secondly, it feels like you'll be extremely lucky not to die a fiery
> death of "parent transid mismatch" errors right away with Btrfs after
> this.
>
> Or if not, then how do you get from there to a consistent state? Run
> a scrub, make the system reread the entire 40 TB of data, correcting
> errors and lack of duplication where necessary.
> Meanwhile, mdadm RAID1: thanks to the write-intent bitmap, after a
> re-add the RAID resyncs just the small changed areas from the
> continuously running disk to the temporarily-absent one, and array
> consistency is almost instantly restored, in many cases with just a
> few GBs read and written.
>
> Or maybe I underestimate the current Btrfs capabilities here?

I have some experience with this - once the disk is reconnected:
unmount, btrfs sync, mount. Yes, there will be a firestorm of errors
when recent data is accessed (I've had over 100k errors fixed by
scrub), but all the data stays intact. You do need to run a scrub
eventually to be sure all errors have been found and eliminated, but
btrfs will fix any problems it encounters on the fly, so an immediate
scrub/rebuild is not needed. It's not the perfect solution, but it's
definitely robust.

Re-adding a disk without unmounting would be amazing.

Paul.
* Re: BTRFS list of grievances
From: Roman Mamedov @ 2024-09-28 17:51 UTC (permalink / raw)
To: waxhead; +Cc: Btrfs BTRFS

On Fri, 27 Sep 2024 21:27:55 +0500
Roman Mamedov <rm@romanrm.net> wrote:

> "parent transid mismatch" errors right away with Btrfs after this.

Speaking of which, another annoyance is that a "parent transid verify
failed" still seems to be game over for any Btrfs filesystem (salvage
data, reformat and restore from backups), even with a low transid
difference. There is no option to forcibly restore FS consistency in
exchange for losing some of the stored data.

And it still happens sometimes, in conjunction with things like USB
enclosures, faulty cables or power cuts.

--
With respect,
Roman
* Re: BTRFS list of grievances
From: Mark Harmstone @ 2024-09-27 17:44 UTC (permalink / raw)
To: waxhead@dirtcellar.net, Btrfs BTRFS

On 27/9/24 12:20, waxhead wrote:
> 2. USE DEVICE ID's EVERYWHERE INSTEAD OF /dev/sdX:
> 4. THE ABILITY TO SET A LABEL FOR A DEVICE ID:

I think the view is that this is the job of udev, not btrfs.
Presumably you can use udev rules to give block devices arbitrary
names. Maybe it would be useful if there were an option to btrfs-progs
so that it printed the symlink names in /dev/disk/by-partlabel if
present.

> 9. ABILITY TO MERGE / CONSUME EXISTING BTRFS:

Yeah, I've had the same idea - this is something that's definitely
possible, it just needs someone to implement it.

If we're writing a wishlist, I'll add:

BAD SECTOR TREE
i.e. a list of sectors known to be bad that the allocator should avoid,
in the same way that it avoids the superblocks. NTFS has something
similar, and I think ext2 does too.

Mark
* Re: BTRFS list of grievances
From: Goffredo Baroncelli @ 2024-09-30 21:43 UTC (permalink / raw)
To: waxhead, Btrfs BTRFS

On 27/09/2024 13.20, waxhead wrote:
> First things first: I am a long-time BTRFS user and frequent reader
> of the mailing list. I am *NOT* a BTRFS developer, but that being
> said I have been known to summon a segmentation fault or two from
> years of programming in C.
>
> Since I have been using BTRFS more or less problem-free since 2013 or
> so for nearly everything, I figured that I should be entitled to
> simply write down a list of things that I personally think suck (more
> or less) with this otherwise fine filesystem.
>
> Make of it what you will, but what I am trying to get across is what
> the upper class would probably call 'constructive criticism'.
>
> So here goes:
>
> 1. FS MANAGEMENT
> ================
> BTRFS is rather simple to manage. We can add/remove devices on the
> fly, balance the filesystem, scrub, defrag, select compression
> algorithms etc. Some of these things are done as mount options, some
> as properties and some by issuing a command that processes something.
>
> Personally, I feel this is a bit messy and in some cases quite
> backwards at times. I believe the original idea was that BTRFS should
> support per-subvolume mount options, storage profiles, etc etc....
> and subvolumes are after all a key feature of the filesystem.
>
> Heck, we even have a root subvolume (id 256) which ideally is the
> parent (or root) for all other subvolumes on the filesystem. So why
> on earth do we have commands such as 'btrfs balance start -dusage=50
> /fsmnt' when logically it could just as easily have been 'btrfs
> <subvolume> balance start -dusage=50', e.g. on the root subvolume
> instead of the fs mount point.
> Besides, if BTRFS at some point is supposed to be more "subvolume
> centric", then why aren't things like scrub, balance, convert
> (data/metadata), device add/remove or even defrag handled as
> properties of a subvolume? E.g. why not set a flag that triggers what
> needs to be done, and let the filesystem process that as a background
> task.
>
> That would for example allow finer granularity of scrub for certain
> subvolumes, instead of having to do the entire filesystem as it
> currently is now.

I am not sure I agree. Some properties are per "filesystem", others are
per "subvolume"; a "subvolume" being a subset of a filesystem, it might
seem that providing a setting on a per-"subvolume" basis gives the user
more flexibility. However, this is a gain only if there isn't any
possible confusion about what the filesystem will do.

For example, it is not clear to me what it means to do a balance (e.g.
reshape the raid profile) of a subvolume when a snapshot also exists:
does the user want to balance only the subvolume (un-sharing the data),
or does the user want to balance the subvolume data and all the shared
extents?

I am not saying that we cannot define a semantic for a subvolume
balance; I am saying that this is not so obvious and should be avoided.
I think that, depending on the use case, the expectation might be
different. IMHO a filesystem should follow the "least surprise"
principle, and if something may be misunderstood, then it is better not
to have it.

This is to say that, for me, if something is related to shared data it
should be per filesystem (like the raid profile), to avoid any
ambiguity. Other properties (like an inode property) should be on a per
"subvolume" basis.

> Status for these jobs does, in my opinion, belong in sysfs, but there
> is nothing wrong with a simple command to "pretty-fy" the status
> either.
> And yes, I even mentioned device add/remove because if it would be
> possible at some point to assign priority/weight to certain devices
> for certain subvolumes, then making a subvolume prefer or avoid using
> a certain storage device would be as "simple" as setting a suitable
> weight/priority, and it would be possible to add/remove (assign)
> storage devices without affecting all other subvolumes.
>
> So for me, 'btrfs property set' (or something similar) sounds like
> the only sensible way of properly managing a BTRFS. And really, with
> the exception of the rescue and subvolume mount options, most if not
> all other mount options seem to belong better as properties of a
> subvolume (which may or may not be the id 256 / root subvolume).
>
> 2. USE DEVICE ID's EVERYWHERE INSTEAD OF /dev/sdX:
> ==================================================
> Using "btrfs filesystem show" will list all BTRFS devices, and also
> show the assigned ID for that device / partition / whatever. Since
> BTRFS already has the notion of a device ID, it seems pointless not
> to use that ID for management / identification wherever possible.
> (for example 'btrfs device stats /mnt')

I suggest both ways: if something is a device, interpret it as a
device; otherwise, try to interpret it as an id. I worked on something
like that in the past, but I never finalized it.

> 3. SOME DEVICES MISSING SHOULD BE ID 1,2,3,4... MISSING:
> ========================================================
> If one or more devices are missing, it would have been great to know
> WHICH devices were missing. Why not print the IDs of the missing
> devices instead of just letting the user know that "some" of them are
> missing?

+1

> 4. THE ABILITY TO SET A LABEL FOR A DEVICE ID:
> ==============================================
> It would have been great to set a label for a BTRFS device ID.
> For example ID1 = "Shelf01.24", ID2 = "NAS_01", ID3 = "localdiskXYZ"

Considering the ubiquity of the GUID partition table, currently we
have:
- a device name (/dev/sdx), which can be customized by udev
- a partition type GUID
- a unique partition UUID
- a partition label (yes, GPT has room for 36 UTF-16 code units)
- a btrfs sub UUID
- a btrfs ID

I think that is enough :-), and a further label would only increase the
confusion.

> 5. DEDUPLICATION IS NOT INTEGRATED IN BTRFS:
> ============================================
> I think that some form of (simple) deduplication should be integrated
> in BTRFS. Using unofficial tools may be perfectly safe, but it feels
> "unsafe" to be honest. Besides, deduplication is something that might
> have been interesting to turn on/on_whenidle/off as a property of a
> subvolume as well.

It is not clear if the problem is "online vs offline" deduplication or
the fact that dedup is not integrated into the btrfs-progs command.

> 6. DEVICE STATS:
> ================
> Again, device IDs are not used, but also why is this info not listed
> in a table? Showing this in a table would make 5x lines become 1x
> line, which would be far more readable. Finally, it is not clear to
> me what are fixed errors, and what is actual damage accumulated in
> the filesystem.

+1

> 7. LIST OF DAMAGED FILES:
> =========================
> There is no easy way to get a list of damaged files on a BTRFS
> filesystem to my knowledge. It would be great to have a command for
> that.

I am not sure it's worth the complexity. Basically, now it is enough to
look in the log for a filesystem error showing the inode. Logging an
inode at the filesystem level would increase the complexity.

> 8. ABILITY TO RESERVE SPARE SPACE:
> ==================================
> Because of the way BTRFS works, a spare device is not very useful.
> Reserved spare space, rather, would be a good idea I think.
> That way if one device is missing, data could be replicated to other
> drives (or even on a single device [DUP] in emergency situations).

We could reserve (e.g.) 1G on each disk that cannot be allocated until
root requests it. It would not prevent the exhaustion of free space,
but it would prevent the situation where the user cannot free space
because... there is no space. When the filesystem fills all the disk(s)
(with the exception of the above 1GB reserved space), it goes RO; then
the administrator can unlock the reserved space and start removing
things.

> 9. ABILITY TO MERGE / CONSUME EXISTING BTRFS:
> =============================================
> It would have been great to merge existing BTRFS volumes into a
> larger volume, e.g. assimilate it... because we all know resistance
> is futile. Again, a subvolume would be the cleanest way of importing
> another BTRFS I think.

What about inode collisions?

[...]

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
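[Editor's note: the per-disk reservation proposed above can be
sketched as follows. The `Device` class and `unlocked` parameter are
hypothetical names for illustration; the underlying point is that on a
CoW filesystem even deleting files needs free space to write new
metadata, so a locked reserve gives the administrator an escape hatch.]

```python
GIB = 2**30

# Sketch of a per-disk reserve the normal allocator refuses to touch.
class Device:
    def __init__(self, size, reserve=1 * GIB):
        self.size, self.reserve, self.used = size, reserve, 0

    def alloc(self, n, unlocked=False):
        # Normal allocations stop 'reserve' bytes short of the device end;
        # only an explicit admin unlock may dip into the reserve.
        limit = self.size if unlocked else self.size - self.reserve
        if self.used + n > limit:
            raise OSError("ENOSPC")  # the fs would flip read-only here
        self.used += n

d = Device(size=100 * GIB)
d.alloc(99 * GIB)                  # fine: stays below size - reserve
try:
    d.alloc(GIB // 2)              # would eat into the reserve -> refused
except OSError:
    pass
d.alloc(GIB // 2, unlocked=True)   # admin unlocked the reserve: succeeds
```

The reserve doesn't prevent running out of space; it only guarantees
that the "can't delete because disk is full" deadlock has a way out.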
* Re: BTRFS list of grievances
  2024-09-30 21:43 ` Goffredo Baroncelli
@ 2024-10-03 17:10   ` Goffredo Baroncelli
  2024-10-03 17:26     ` Remi Gauvin
  0 siblings, 1 reply; 14+ messages in thread
From: Goffredo Baroncelli @ 2024-10-03 17:10 UTC (permalink / raw)
To: waxhead; +Cc: Btrfs BTRFS

On 30/09/2024 23.43, Goffredo Baroncelli wrote:
> On 27/09/2024 13.20, waxhead wrote:
[...]
>>
>> 6. DEVICE STATS:
>> ================
>> Again device ID's are not used, but also why is this info not listed
>> in a table? Showing this in a table would make 5x lines become 1x
>> line which would be far more readable.

This was an already solved problem:

$ sudo ./btrfs dev stat -T /mnt/btrfs-raid1/
Id Path      Write errors Read errors Flush errors Corruption errors Generation errors
-- --------- ------------ ----------- ------------ ----------------- -----------------
 1 /dev/sda2            0           0            0               763                 0
 2 /dev/sdb2            0           0            0              3504                 0
 3 /dev/sdd2           13           0            0              6218                 0

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 14+ messages in thread
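The tabular output above turns five `btrfs device stats` lines per
device into a single row per device, which is the whole of grievance 6.
The reshaping itself is a trivial pivot; a sketch of it in Python
(the device paths and counter values below are just sample data echoing
the thread, nothing here reads a real filesystem):

```python
# Pivot per-device error counters (five lines per device in the
# classic "btrfs device stats" output) into one row per device,
# as the tabular patch in the thread does. Sample data only.

stats = {
    "/dev/sda2": {"write_io_errs": 0, "read_io_errs": 0,
                  "flush_io_errs": 0, "corruption_errs": 763,
                  "generation_errs": 0},
    "/dev/sdb2": {"write_io_errs": 0, "read_io_errs": 0,
                  "flush_io_errs": 0, "corruption_errs": 3504,
                  "generation_errs": 0},
    "/dev/sdd2": {"write_io_errs": 13, "read_io_errs": 0,
                  "flush_io_errs": 0, "corruption_errs": 6218,
                  "generation_errs": 0},
}

COLUMNS = ["write_io_errs", "read_io_errs", "flush_io_errs",
           "corruption_errs", "generation_errs"]

def as_table(stats):
    # One header line, then one right-aligned row per device.
    header = "Path      " + " ".join(f"{c:>16}" for c in COLUMNS)
    rows = [
        f"{dev:<9} " + " ".join(f"{counts[c]:>16}" for c in COLUMNS)
        for dev, counts in sorted(stats.items())
    ]
    return "\n".join([header] + rows)

print(as_table(stats))
```

With three devices this prints four lines instead of fifteen, which is
exactly the readability argument being made.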
* Re: BTRFS list of grievances
  2024-10-03 17:10   ` Goffredo Baroncelli
@ 2024-10-03 17:26     ` Remi Gauvin
  2024-10-03 18:24       ` Goffredo Baroncelli
  0 siblings, 1 reply; 14+ messages in thread
From: Remi Gauvin @ 2024-10-03 17:26 UTC (permalink / raw)
To: kreijack; +Cc: Btrfs BTRFS

On 2024-10-03 1:10 p.m., Goffredo Baroncelli wrote:
>
> $ sudo ./btrfs dev stat -T /mnt/btrfs-raid1/
> Id Path      Write errors Read errors Flush errors Corruption errors Generation errors
> -- --------- ------------ ----------- ------------ ----------------- -----------------
>  1 /dev/sda2            0           0            0               763                 0
>  2 /dev/sdb2            0           0            0              3504                 0
>  3 /dev/sdd2           13           0            0              6218                 0
>

I hope that's a made-up sample and not actual output of your
filesystem. Otherwise, you have a problem...

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: BTRFS list of grievances
  2024-10-03 17:26     ` Remi Gauvin
@ 2024-10-03 18:24       ` Goffredo Baroncelli
  2024-10-03 18:32         ` Remi Gauvin
  0 siblings, 1 reply; 14+ messages in thread
From: Goffredo Baroncelli @ 2024-10-03 18:24 UTC (permalink / raw)
To: Remi Gauvin; +Cc: Btrfs BTRFS

On 03/10/2024 19.26, Remi Gauvin wrote:
> On 2024-10-03 1:10 p.m., Goffredo Baroncelli wrote:
>>
>> $ sudo ./btrfs dev stat -T /mnt/btrfs-raid1/
>> Id Path      Write errors Read errors Flush errors Corruption errors Generation errors
>> -- --------- ------------ ----------- ------------ ----------------- -----------------
>>  1 /dev/sda2            0           0            0               763                 0
>>  2 /dev/sdb2            0           0            0              3504                 0
>>  3 /dev/sdd2           13           0            0              6218                 0
>>
>
> I hope that's a made-up sample and not actual output of your
> filesystem. Otherwise, you have a problem...
>

It is real output, and I hadn't noticed these errors.

This was an old disk set, and these are old errors due to a bad power
supply. After replacing the power supply, all the problems disappeared.

Of course I didn't have any data issue, thanks to btrfs+raid1.

However, I never bothered to clear those errors. Anyway, I ran a
"btrfs scrub", which didn't find any error.

BR

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: BTRFS list of grievances
  2024-10-03 18:24       ` Goffredo Baroncelli
@ 2024-10-03 18:32         ` Remi Gauvin
  0 siblings, 0 replies; 14+ messages in thread
From: Remi Gauvin @ 2024-10-03 18:32 UTC (permalink / raw)
To: kreijack; +Cc: Btrfs BTRFS

On 2024-10-03 2:24 p.m., Goffredo Baroncelli wrote:
>
> It is real output, and I hadn't noticed these errors.
>
> This was an old disk set, and these are old errors due to a bad power
> supply. After replacing the power supply, all the problems
> disappeared.
>
> Of course I didn't have any data issue, thanks to btrfs+raid1.
>
> However, I never bothered to clear those errors. Anyway, I ran a
> "btrfs scrub", which didn't find any error.

An excellent example of btrfs doing its job as advertised! I would
suggest resetting the counters, though, so future problems will be easy
to spot.

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-10-03 18:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2024-09-27 11:20 BTRFS list of grievances waxhead
2024-09-27 16:27 ` Roman Mamedov
2024-09-27 18:05   ` Remi Gauvin
2024-09-27 19:01     ` Colin S
2024-10-02 19:31       ` Chris Murphy
2024-10-02 23:18         ` Colin S
2024-09-28 10:15   ` Paul Jones
2024-09-28 17:51     ` Roman Mamedov
2024-09-27 17:44 ` Mark Harmstone
2024-09-30 21:43 ` Goffredo Baroncelli
2024-10-03 17:10   ` Goffredo Baroncelli
2024-10-03 17:26     ` Remi Gauvin
2024-10-03 18:24       ` Goffredo Baroncelli
2024-10-03 18:32         ` Remi Gauvin