Linux Btrfs filesystem development
From: Goffredo Baroncelli <kreijack@libero.it>
To: waxhead@dirtcellar.net, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS list of grievances
Date: Mon, 30 Sep 2024 23:43:41 +0200
Message-ID: <4f672a82-28d8-490e-bdce-e794748d41fd@libero.it>
In-Reply-To: <aebe9671-6f44-9d20-f077-b19e09fa1fcd@dirtcellar.net>

On 27/09/2024 13.20, waxhead wrote:
> First thing first: I am a long time BTRFS user and frequent reader of the mailing list. I am *NOT* a BTRFS developer, but that being said I have been known to summon a segmentation failure or two from years of programming in C.
> 
> Since I have been using BTRFS more or less problem free since 2013 or so for nearly everything, I figured that I should be entitled to simply write down a list of things that I personally think suck (more or less) with this otherwise fine filesystem.
> 
> Make of it what you will, but what I am trying to get across is what the upper class would probably call 'constructive criticism'.
> 
> So here goes:
> 
> 
> 
> 1. FS MANAGEMENT
> ================
> BTRFS is rather simple to manage. We can add/remove devices on the fly, balance the filesystem, scrub, defrag, select compression algorithms etc. Some of these things are done as mount options, some as properties and some by issuing a command that processes something.
> 
> Personally, I feel this is a bit messy and in some cases quite backwards at times. I believe the original idea was that BTRFS should support per-subvolume mount options, storage profiles, etc., and subvolumes are after all a key feature of the filesystem.
> 
> Heck, we even have a root subvolume (id 256) which ideally is the parent (or root) for all other subvolumes on the filesystem. So why on earth do we have commands such as 'btrfs balance start -dusage=50 /fsmnt' when logically it could just as easily have been 'btrfs <subvolume> balance start -dusage=50', i.e. on the root subvolume instead of the fs mount point.
> 
> Besides, if BTRFS at some point is supposed to be more "subvolume centric", then why aren't things like scrub, balance, convert (data/metadata), device add/remove or even defrag handled as properties of a subvolume? E.g. why not set a flag that triggers what needs to be done, and let the filesystem process that as a background task.
> 
> That would for example allow for finer granularity for scrub for certain subvolumes, instead of having to do the entire filesystem as it currently is now.

I am not sure I agree. Some properties are per filesystem, others are per subvolume; since a subvolume is a subset of a filesystem, it might seem that providing a setting on a per-subvolume basis gives the user more flexibility.
However, this is a gain only if there isn't any possible confusion about what the filesystem will do. For example, it is not clear to me what it means to balance (e.g. reshape the RAID profile of) a subvolume when a snapshot of it also exists: does the user want to balance only the subvolume (un-sharing the data), or the subvolume data together with all the shared extents? I am not saying that we cannot define semantics for a subvolume balance; I am saying that they are not obvious, and such a feature should be avoided.

I think that, depending on the use case, the expectation might be different. IMHO a filesystem should behave according to the "least surprise" principle, and if something may be misunderstood, it is better not to have it.

That is to say: for me, if something relates to shared data, it should be per filesystem (like the RAID profile), to avoid any ambiguity. Other properties (like an inode property) should be set on a per-subvolume basis.
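To make the ambiguity of a per-subvolume balance concrete, here is a toy Python model. It is purely hypothetical (btrfs does not track extents this way): each subvolume or snapshot is just the set of extent numbers it references. `exclusive_extents` is what could be relocated without touching the snapshot (anything else would have to be un-shared, i.e. copied), while `referenced_extents` is what a balance "including the shared extents" would rewrite.

```python
from typing import Dict, Set

# Hypothetical model: extent IDs are plain integers and each subvolume
# (or snapshot) is represented only by the set of extents it references.


def exclusive_extents(target: str, trees: Dict[str, Set[int]]) -> Set[int]:
    """Extents referenced by `target` and by no other subvolume."""
    others: Set[int] = set()
    for name, extents in trees.items():
        if name != target:
            others |= extents
    return trees[target] - others


def referenced_extents(target: str, trees: Dict[str, Set[int]]) -> Set[int]:
    """Every extent `target` references, whether shared or not."""
    return set(trees[target])


trees = {
    "home": {1, 2, 3, 4},
    "home-snap": {1, 2, 3, 5},  # snapshot still holds extent 5, "home" replaced it with 4
}
print(sorted(exclusive_extents("home", trees)))
print(sorted(referenced_extents("home", trees)))
```

With a snapshot present the two readings select different extents, and the second one also changes the profile of data that the snapshot sees, which is exactly the kind of surprise a per-filesystem setting avoids.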

> 
> Status for the jobs does in my opinion belong in sysfs, but there is nothing wrong with a simple command to "pretty'fy" the status either.
> 
> And yes, I even mentioned device add/remove because if it would be possible at some point to assign priority/weight to certain devices for certain subvolumes then making a subvolume prefer or avoid using a certain storage device would be as "simple" as setting a suitable weight/priority, and it would be possible to add/remove (assign) storage devices without affecting all other subvolumes.
> 
> So for me, 'btrfs property set' (or something similar) sounds like the only sensible way of properly managing a BTRFS. And really, with the exception of the rescue and subvolume mount options, most, if not all, other mount options seem to better belong as a property of a subvolume (which may or may not be the id 256 / root subvolume).
> 
> 
> 
> 2. USE DEVICE ID's EVERYWHERE INSTEAD OF /dev/sdX:
> ==================================================
> Using "btrfs filesystem show" will list all BTRFS devices, and also show the assigned ID for that device / partition / whatever. Since BTRFS already has the notion of a device ID, it seems pointless to not use that ID for management / identification anywhere possible.
> (for example btrfs device stat /mnt)
> 

I suggest supporting both: if the argument is a device, interpret it as a device; otherwise, try to interpret it as an ID. I worked on something like that in the past, but I never finalized it.
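The "device first, then ID" rule can be sketched as a small argument resolver. This is a hypothetical helper, not btrfs-progs code: it treats the argument as a device only if it names an existing block device, and otherwise accepts a positive decimal integer as a device ID (valid IDs are assumed here to start at 1).

```python
import os
import stat
from typing import Tuple, Union


def resolve_device_arg(arg: str) -> Tuple[str, Union[str, int]]:
    """Hypothetical resolver: prefer a block-device path, fall back to a devid."""
    try:
        st = os.stat(arg)
    except OSError:
        st = None  # no such path: it may still be a device ID
    if st is not None and stat.S_ISBLK(st.st_mode):
        return ("path", arg)
    # Accept only plain ASCII digits so that e.g. "2" works but "²" does not.
    if arg.isascii() and arg.isdigit() and int(arg) > 0:
        return ("devid", int(arg))
    raise ValueError(f"{arg!r} is neither a block device nor a device ID")
```

Checking for a block device first means a path like `/dev/sdb` always wins; only when the argument is not a block device does the numeric interpretation apply, so `resolve_device_arg("2")` yields `("devid", 2)`.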

> 
> 3. SOME DEVICES MISSING SHOULD BE ID 1,2,3,4... MISSING:
> ========================================================
> If one or more devices are missing it would have been great to know WHAT devices were missing. Why not print the ID's of the missing devices instead of just letting the user know that "some" of them are missing?
> 

+1
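Reporting which devices are missing amounts to a set difference between the device IDs recorded in the filesystem metadata and the IDs actually found when scanning. A minimal sketch, with hypothetical function names; note that IDs need not be contiguous after earlier device removals, so the recorded set is used rather than a range 1..N.

```python
from typing import Iterable, List


def missing_devids(recorded: Iterable[int], found: Iterable[int]) -> List[int]:
    """Device IDs recorded in the filesystem metadata but not found on scan."""
    return sorted(set(recorded) - set(found))


def missing_message(recorded: Iterable[int], found: Iterable[int]) -> str:
    """Human-readable summary naming the missing IDs instead of saying "some"."""
    missing = missing_devids(recorded, found)
    if not missing:
        return "all devices present"
    return "missing devids: " + ", ".join(str(d) for d in missing)


# devid 4 was removed at some point, so the recorded IDs have a gap.
print(missing_message([1, 2, 3, 5], [1, 3]))
```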

> 
> 4. THE ABILITY TO SET A LABEL FOR A DEVICE ID:
> ==============================================
> It would have been great to set a label for a BTRFS device ID. For example ID1 = "Shelf01.24", ID2 = "NAS_01", ID3 = "localdiskXYZ"
> 

Considering the ubiquity of GUID partition tables, we currently have:
- a device name (/dev/sdx), which can be customized by udev
- a partition type GUID
- a unique partition UUID
- a partition label (yes, GPT has room for 36 UTF-16 code units)
- a btrfs sub UUID
- a btrfs ID

I think that is enough :-), and a further label would only increase the confusion.

> 
> 5. DEDUPLICATION IS NOT INTEGRATED IN BTRFS:
> ============================================
> I think that some form of (simple) deduplication should be integrated in BTRFS. Using unofficial tools may be perfectly safe, but it feels "unsafe" to be honest. Besides deduplication is something that might have been interesting to turn on/on_whenidle/off as a property to a subvolume as well.
> 

It is not clear whether the problem is "online vs offline" deduplication or the fact that dedup is not integrated into the btrfs-progs command.

> 
> 6. DEVICE STATS:
> ================
> Again device ID's are not used, but also why is this info not listed in a table? Showing this in a table would make 5x lines become 1x line which would be far more readable. Finally, it is not clear to me what are fixed errors and what is actual damage accumulated in the filesystem.
> 

+1
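As a rough illustration of the one-line-per-device layout, the sketch below parses `btrfs device stats` text output, assuming the usual `[/dev/xxx].counter_name  value` line shape with the five error counters, and prints one row per device. The counter values in `SAMPLE` are made up for the example, and the helper names are hypothetical.

```python
import re
from typing import Dict, List

# Assumed line shape: "[/dev/sda1].write_io_errs    0"
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")
COUNTERS = ["write_io_errs", "read_io_errs", "flush_io_errs",
            "corruption_errs", "generation_errs"]

SAMPLE = """\
[/dev/sda1].write_io_errs    0
[/dev/sda1].read_io_errs     0
[/dev/sda1].flush_io_errs    0
[/dev/sda1].corruption_errs  0
[/dev/sda1].generation_errs  0
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     2
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  3
[/dev/sdb1].generation_errs  0
"""


def parse_stats(text: str) -> Dict[str, Dict[str, int]]:
    """Group counters by device, keeping devices in first-seen order."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def as_table(stats: Dict[str, Dict[str, int]]) -> str:
    """Render one header row plus one row per device, columns left-aligned."""
    header = ["device"] + [c.replace("_errs", "") for c in COUNTERS]
    rows: List[List[str]] = [header]
    for dev, counters in stats.items():
        rows.append([dev] + [str(counters.get(c, 0)) for c in COUNTERS])
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


print(as_table(parse_stats(SAMPLE)))
```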

> 
> 7. LIST OF DAMAGED FILES:
> =========================
> There is no easy way to get a list of damaged files on a BTRFS filesystem to my knowledge. It would be great to have a command for that.
> 

I am not sure it is worth the complexity. Basically, for now it is enough to look in the kernel log for a filesystem error reporting the inode. Recording the damaged inodes at the filesystem level would increase the complexity.
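The "look in the log" approach can be partly automated. The sketch below collects (device, root, inode) triples from kernel log lines; it assumes checksum errors are reported in a form like `BTRFS warning (device sda1): csum failed root 5 ino 257 off ...`, which is an assumption, since the exact wording may differ between kernel versions and other error messages are not covered.

```python
import re
from typing import Iterable, Set, Tuple

# Assumed message shape (may vary by kernel version), e.g.
# "BTRFS warning (device sda1): csum failed root 5 ino 257 off 4096 ..."
CSUM_RE = re.compile(
    r"BTRFS \w+ \(device (?P<dev>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+)"
)


def damaged_inodes(lines: Iterable[str]) -> Set[Tuple[str, int, int]]:
    """Collect unique (device, root, inode) triples from kernel log lines."""
    found: Set[Tuple[str, int, int]] = set()
    for line in lines:
        m = CSUM_RE.search(line)  # search, so timestamp prefixes are skipped
        if m:
            found.add((m["dev"], int(m["root"]), int(m["ino"])))
    return found
```

Each (root, inode) pair can then be turned into a path with `btrfs inspect-internal inode-resolve <ino> <path>`, run against a mount point of the subvolume whose ID is `root`.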

> 
> 8. ABILITY TO RESERVE SPARE SPACE:
> ==================================
> Because of the way BTRFS works a spare device is not very useful. Rather spare space would be a good idea I think. That way if one device is missing data, it could be replicated to other drives (or even on a single device [DUP] in emergency situations)
> 

We could reserve (e.g.) 1 GB on each disk that cannot be allocated until root requests it. This would not prevent the free space from being exhausted, but it would prevent the situation where the user cannot free space because... there is no space.
When the filesystem fills all the disk(s) (except for the 1 GB reserved space above), it goes read-only; then the administrator can unlock the reserved space and start removing things.
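The reserve-then-unlock policy can be modelled with a toy allocator. Everything here is hypothetical (class names, a 1 GiB chunk size, a single `unlocked` flag standing in for the root-only unlock); it only shows that ordinary allocations stop short of the per-device reserve and that unlocking makes that space usable again.

```python
from dataclasses import dataclass
from typing import List, Optional

GiB = 1024 ** 3


@dataclass
class Device:
    devid: int
    size: int
    allocated: int = 0


class ReservingAllocator:
    """Hypothetical allocator that keeps `reserve` bytes free on every device."""

    def __init__(self, devices: List[Device], reserve: int = 1 * GiB):
        self.devices = devices
        self.reserve = reserve
        self.unlocked = False  # in the proposal, only root may flip this

    def available(self, dev: Device) -> int:
        limit = dev.size if self.unlocked else dev.size - self.reserve
        return max(0, limit - dev.allocated)

    def allocate_chunk(self, size: int) -> Optional[int]:
        """Place a chunk on the first device with room; return its devid."""
        for dev in self.devices:
            if self.available(dev) >= size:
                dev.allocated += size
                return dev.devid
        return None  # no room: per the proposal, the fs would go read-only here

    def unlock_reserve(self) -> None:
        self.unlocked = True


alloc = ReservingAllocator([Device(1, 10 * GiB), Device(2, 10 * GiB)])
count = 0
while alloc.allocate_chunk(1 * GiB) is not None:
    count += 1
print(count)  # 9 chunks per device before the reserve is hit
alloc.unlock_reserve()
print(alloc.allocate_chunk(1 * GiB))  # the reserved space is now usable
```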


> 
> 9. ABILITY TO MERGE / CONSUME EXISTING BTRFS:
> =============================================
> It would have been great to merge existing BTRFS volumes into a larger volume e.g. assimilate it ..because we all know resistance is futile.
> Again a subvolume would be the cleanest way of importing another BTRFS I think.
> 

What about inode collisions?

[...]

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
