On 2015-09-22 11:39, Hugo Mills wrote:
> On Tue, Sep 22, 2015 at 10:54:45AM -0400, Austin S Hemmelgarn wrote:
>> On 2015-09-22 10:36, Hugo Mills wrote:
>>> On Tue, Sep 22, 2015 at 04:23:33PM +0200, David Sterba wrote:
>>>> On Tue, Sep 22, 2015 at 01:41:31PM +0000, Hugo Mills wrote:
>>>>> On Tue, Sep 22, 2015 at 03:36:43PM +0200, Holger Hoffstätte wrote:
>>>>>> On 09/22/15 14:59, Jeff Mahoney wrote:
>>>>>> (snip)
>>>>>>> So if the way we want to prevent the loss of raid type info is by
>>>>>>> maintaining the last block group allocated with that raid type, fine,
>>>>>>> but that's a separate discussion. Personally, I think keeping 1GB
>>>>>>
>>>>>> At this point I'm much more surprised to learn that the RAID type can
>>>>>> apparently get "lost" in the first place, and is not persisted
>>>>>> separately. I mean..wat?
>>>>>
>>>>> It's always been like that, unfortunately.
>>>>>
>>>>> The code tries to use the RAID type that's already present to work
>>>>> out what the next allocation should be. If there aren't any chunks in
>>>>> the FS, the configuration is lost, because it's not stored anywhere
>>>>> else. It's one of the things that tripped me up badly when I was
>>>>> failing to rewrite the chunk allocator last year.
>>>>
>>>> Yeah, right now there's no persistent default for the allocator. I'm
>>>> still hoping that the object properties will magically solve that.
>>>
>>> There's no obvious place that filesystem-wide properties can be
>>> stored, though. There's a userspace tool to manipulate the few current
>>> FS-wide properties, but that's all special-cased to use the
>>> "historical" ioctls for those properties, with no generalisation of a
>>> property store, or even (IIRC) any external API for them.
>>>
>>> We're nominally using xattrs in the btrfs: namespace on directories
>>> and files, and presumably on the top directory of a subvolume for
>>> subvol-wide properties, but it's not clear where the FS-wide values
>>> should go: in the top directory of subvolid=5 would be confusing,
>>> because then you couldn't separate the properties for *that subvol*
>>> from the ones for the whole FS (say, the default replication policy,
>>> where you might want the top subvol to have different properties from
>>> everything else).
>> Possibly do special names for the defaults and store them there? In
>> general, I personally see little value in having some special
>> 'default' properties however.
>
> That would work.
>
>> The way I would expect things to work is that a new subvolume
>> inherits its properties from its parent (if it's a snapshot),
>
> Definitely this.
>
>> or
>> from the next higher subvolume it's nested in.
>
> I don't think I like this. I'm not quite sure why, though, at the
> moment.
>
> It definitely makes the process at the start of allocating a new
> block group much more complex: you have to walk back up through an
> arbitrary depth of nested subvols to find the one that's actually got
> a replication policy record in it. (Because after this feature is
> brought in, there will be lots of filesystems without per-subvol
> replication policies in them, and we have to have some way of dealing
> with those as well).

ro-compat flag perhaps?

>
> With an FS default policy, you only need check the current subvol,
> and then fall back to the FS default if that's not found.
>
> These things are, I think, likely to be lightly used: I would be
> reasonably surprised to find more than two or possibly three storage
> policies in use on any given system with a sane sysadmin.
>
> I'm actually not sure what the interactions of multiple storage
> policies are going to be like.
> It's entirely possible, particularly
> with some of the more exotic (but useful) suggestions I've thought of,
> that the behaviour of the FS is dependent on the order in which the
> block groups are allocated. (i.e. "20 GiB to subvol-A, then 20 GiB to
> subvol-B" results in different behaviour than "1 GiB to subvol-A then
> 1 GiB to subvol-B and repeat"). I tried some simple Monte-Carlo
> simulations, but I didn't get any concrete results out of it before
> the end of the train journey. :)

Yeah, I could easily see that getting complicated when you add in the
(hopefully soon) possibility of n-copy replication.

>
>> This would obviate
>> the need for some special 'default' properties, and would be
>> relatively intuitive behavior for a significant majority of people.
>
> Of course, you shouldn't be nesting subvolumes anyway. It makes
> it much harder to manage them.

That depends though, I only ever do single nesting (ie, a subvolume in a
subvolume), and I use it to exclude stuff from getting saved in
snapshots (mostly stuff like clones of public git trees, or other stuff
that's easy to reproduce without a backup). Beyond that though, there
are other inherent issues of course.
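
To make the two lookup schemes discussed above a bit more concrete, here
is a rough Python sketch of what the allocator would have to do in each
case. None of this is btrfs code: the Subvol class, its fields, and both
function names are made up for illustration, and the "policy" is just a
string. It only shows the difference in lookup cost, counted as the
number of records inspected.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subvol:
    """Hypothetical stand-in for a subvolume; not a btrfs structure."""
    name: str
    parent: Optional["Subvol"] = None  # enclosing subvolume, None at the top
    policy: Optional[str] = None       # e.g. "raid1"; None = no record stored


def policy_by_walking_up(subvol: Subvol) -> Tuple[Optional[str], int]:
    """Inherit from the next higher subvolume: walk up until a record is found.

    Returns (policy, records_inspected). The policy is None if no subvolume
    on the path has a record, as on a filesystem created before the feature.
    """
    inspected = 0
    node: Optional[Subvol] = subvol
    while node is not None:
        inspected += 1
        if node.policy is not None:
            return node.policy, inspected
        node = node.parent
    return None, inspected


def policy_with_fs_default(subvol: Subvol, fs_default: str) -> Tuple[str, int]:
    """Check only the current subvolume, then fall back to an FS-wide default."""
    if subvol.policy is not None:
        return subvol.policy, 1
    return fs_default, 2  # one look at the subvol, one at the FS default


# Three levels of nesting below a top-level subvolume that has a policy.
top = Subvol("top", policy="raid1")
a = Subvol("a", parent=top)
b = Subvol("b", parent=a)
c = Subvol("c", parent=b)

print(policy_by_walking_up(c))             # cost grows with nesting depth
print(policy_with_fs_default(c, "raid1"))  # cost is bounded at two lookups
```

With nesting limited to one level, as in my own setups, the walk is at
most two steps, so the difference only starts to matter with deeper
nesting or with filesystems that have no policy records at all.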