On 2015-09-22 13:32, Austin S Hemmelgarn wrote:
> On 2015-09-22 11:39, Hugo Mills wrote:
>> On Tue, Sep 22, 2015 at 10:54:45AM -0400, Austin S Hemmelgarn wrote:
>>> On 2015-09-22 10:36, Hugo Mills wrote:
>>>> On Tue, Sep 22, 2015 at 04:23:33PM +0200, David Sterba wrote:
>>>>> On Tue, Sep 22, 2015 at 01:41:31PM +0000, Hugo Mills wrote:
>>>>>> On Tue, Sep 22, 2015 at 03:36:43PM +0200, Holger Hoffstätte wrote:
>>>>>>> On 09/22/15 14:59, Jeff Mahoney wrote:
>>>>>>> (snip)
>>>>>>>> So if the way we want to prevent the loss of raid type info is by
>>>>>>>> maintaining the last block group allocated with that raid type, fine,
>>>>>>>> but that's a separate discussion. Personally, I think keeping 1GB
>>>>>>>
>>>>>>> At this point I'm much more surprised to learn that the RAID type can
>>>>>>> apparently get "lost" in the first place, and is not persisted
>>>>>>> separately. I mean..wat?
>>>>>>
>>>>>> It's always been like that, unfortunately.
>>>>>>
>>>>>> The code tries to use the RAID type that's already present to work
>>>>>> out what the next allocation should be. If there aren't any chunks in
>>>>>> the FS, the configuration is lost, because it's not stored anywhere
>>>>>> else. It's one of the things that tripped me up badly when I was
>>>>>> failing to rewrite the chunk allocator last year.
>>>>>
>>>>> Yeah, right now there's no persistent default for the allocator. I'm
>>>>> still hoping that the object properties will magically solve that.
>>>>
>>>> There's no obvious place that filesystem-wide properties can be
>>>> stored, though. There's a userspace tool to manipulate the few current
>>>> FS-wide properties, but that's all special-cased to use the
>>>> "historical" ioctls for those properties, with no generalisation of a
>>>> property store, or even (IIRC) any external API for them.
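To make the "lost configuration" problem quoted above concrete, here is a
toy model of it. This is only a sketch under my own assumptions, not the
kernel's allocator: the function name next_profile and the "most common
profile wins" rule are both made up for illustration. The only point is
that when the profile is inferred from existing chunks and stored nowhere
else, an empty list of chunks leaves nothing to infer from.

```python
from collections import Counter
from typing import Iterable, Optional


def next_profile(existing_chunk_profiles: Iterable[str]) -> Optional[str]:
    """Guess the profile for the next allocation from chunks already present.

    Hypothetical helper: the tie-breaking rule here (most common profile)
    is an assumption for illustration only.
    """
    counts = Counter(existing_chunk_profiles)
    if not counts:
        # No chunks left, and nothing else records the intended profile.
        return None
    return counts.most_common(1)[0][0]


# With chunks present, the profile can be recovered from them.
with_chunks = next_profile(["raid1", "raid1", "single"])

# Once the last chunk is gone, so is the configuration.
without_chunks = next_profile([])
```

A persistent FS-wide default, wherever it ends up being stored, would
replace the None case with a stored value.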
>>>>
>>>> We're nominally using xattrs in the btrfs: namespace on directories
>>>> and files, and presumably on the top directory of a subvolume for
>>>> subvol-wide properties, but it's not clear where the FS-wide values
>>>> should go: in the top directory of subvolid=5 would be confusing,
>>>> because then you couldn't separate the properties for *that subvol*
>>>> from the ones for the whole FS (say, the default replication policy,
>>>> where you might want the top subvol to have different properties from
>>>> everything else).
>>> Possibly do special names for the defaults and store them there? In
>>> general, I personally see little value in having some special
>>> 'default' properties however.
>>
>> That would work.
>>
>>> The way I would expect things to work is that a new subvolume
>>> inherits its properties from its parent (if it's a snapshot),
>>
>> Definitely this.
>>
>>> or
>>> from the next higher subvolume it's nested in.
>>
>> I don't think I like this. I'm not quite sure why, though, at the
>> moment.
>>
>> It definitely makes the process at the start of allocating a new
>> block group much more complex: you have to walk back up through an
>> arbitrary depth of nested subvols to find the one that's actually got
>> a replication policy record in it. (Because after this feature is
>> brought in, there will be lots of filesystems without per-subvol
>> replication policies in them, and we have to have some way of dealing
>> with those as well).
> ro-compat flag perhaps?
>>
>> With an FS default policy, you only need check the current subvol,
>> and then fall back to the FS default if that's not found.
>>
>> These things are, I think, likely to be lightly used: I would be
>> reasonably surprised to find more than two or possibly three storage
>> policies in use on any given system with a sane sysadmin.
>>
>> I'm actually not sure what the interactions of multiple storage
>> policies are going to be like.
>> It's entirely possible, particularly
>> with some of the more exotic (but useful) suggestions I've thought of,
>> that the behaviour of the FS is dependent on the order in which the
>> block groups are allocated. (i.e. "20 GiB to subvol-A, then 20 GiB to
>> subvol-B" results in different behaviour than "1 GiB to subvol-A then
>> 1 GiB to subvol-B and repeat"). I tried some simple Monte-Carlo
>> simulations, but I didn't get any concrete results out of it before
>> the end of the train journey. :)
> Yeah, I could easily see that getting complicated when you add in the
> (hopefully soon) possibility of n-copy replication.
On that note, it might be nice to have the ability to say 'store at
least n copies of this data' in addition to being able to say 'store
exactly this many copies of this data'. (could be really helpful for
filesystems with differing device sizes).
>>
>>> This would obviate
>>> the need for some special 'default' properties, and would be
>>> relatively intuitive behavior for a significant majority of people.
>>
>> Of course, you shouldn't be nesting subvolumes anyway. It makes
>> it much harder to manage them.
> That depends though, I only ever do single nesting (ie, a subvolume in a
> subvolume), and I use it to exclude stuff from getting saved in
> snapshots (mostly stuff like clones of public git trees, or other stuff
> that's easy to reproduce without a backup). Beyond that though, there
> are other inherent issues of course.
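
For comparison, the two lookup schemes discussed above (check the
current subvol and fall back to an FS-wide default, versus walking up
through the nesting until a record turns up) can be sketched roughly as
below. Everything here is hypothetical: the Subvol class, its fields and
both function names are invented for illustration, and none of it
reflects an existing btrfs structure or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subvol:
    """Hypothetical stand-in for a subvolume and its policy record."""
    name: str
    parent: Optional["Subvol"] = None  # enclosing subvol, None at the top
    policy: Optional[str] = None       # None means no record on this subvol


def lookup_with_fs_default(subvol: Subvol, fs_default: str) -> str:
    """Check only the current subvol, then fall back to the FS default."""
    return subvol.policy if subvol.policy is not None else fs_default


def lookup_by_walking_up(subvol: Subvol, last_resort: str) -> str:
    """Walk up through the nesting until some subvol has a record."""
    node: Optional[Subvol] = subvol
    while node is not None:
        if node.policy is not None:
            return node.policy
        node = node.parent
    # Filesystems with no per-subvol records at all still need an answer.
    return last_resort


top = Subvol("top", policy="raid1")
home = Subvol("home", parent=top)
scratch = Subvol("scratch", parent=home, policy="single")
git = Subvol("git", parent=scratch)

# Same subvol, different answers depending on the scheme.
flat = lookup_with_fs_default(git, fs_default="raid1")
nested = lookup_by_walking_up(git, last_resort="raid1")
```

The first lookup is a single check regardless of nesting depth, while
the second costs one step per level and still needs a last-resort value
for filesystems that have no records at all.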