* Some questions about per-ag metadata space reservations...
@ 2017-09-06 10:30 Dave Chinner
2017-09-07 13:44 ` Brian Foster
0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2017-09-06 10:30 UTC (permalink / raw)
To: linux-xfs
Hi folks,
I've got a bit of a problem with the per-ag reservations we are
using at the moment. The existence of them is fine, but the
implementation is problematic for something I'm working on right
now.
I've been making a couple of mods to the filesystem to separate
physical space accounting from free space accounting to allow us to
optimise the filesystem for thinly provisioned devices. That is,
the filesystem is laid out as though it is the size of the
underlying device, but then free space is artificially limited. i.e.
we have a "physical size" of the filesystem and a "logical size"
that limits the amount of data and metadata that can actually be
stored in it.
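As a rough model of that split (all names here are invented for illustration, not the actual XFS code), the filesystem address space is laid out at the physical size, while a separate logical size caps what users can actually consume:

```python
# Minimal sketch of dual physical/logical space accounting for a thin
# filesystem. Hypothetical names; the real implementation lives in the
# XFS free space counters, not a Python class.

class ThinFs:
    def __init__(self, physical_blocks, logical_blocks):
        assert logical_blocks <= physical_blocks
        self.physical_blocks = physical_blocks  # layout size fixed at mkfs
        self.logical_blocks = logical_blocks    # user-visible capacity
        self.used_blocks = 0

    def free_blocks(self):
        # df reports against the logical size, not the physical one
        return self.logical_blocks - self.used_blocks

    def alloc(self, nblocks):
        # data/metadata allocations hit the logical limit first
        if nblocks > self.free_blocks():
            raise OSError("ENOSPC")
        self.used_blocks += nblocks

    def resize(self, new_logical_blocks):
        # grow/shrink is purely an accounting change; no data moves
        assert new_logical_blocks <= self.physical_blocks
        self.logical_blocks = new_logical_blocks
```

Under this model, growing or shrinking the filesystem within the physical layout only touches `logical_blocks`, which is what makes the fstrim-based shrink described below a pure accounting operation.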
When combined with a thinly provisioned device, this enables us to
shrink the XFS filesystem simply by running fstrim to punch all the
free space out of the underlying thin device and then adjusting the
free space down appropriately. Because the thin device abstracts the
physical location of the data in the block device away from the
address space presented to the filesystem, we don't need to move any
data or metadata to free up this space - it's just an accounting
change.
The problem arises with the per AG reservations in that they are
based on the physical size of the AG, which for a thin filesystem
will always be larger than the space available. e.g. we might
allocate a 32TB thin device to give 32x1TB AGs in the filesystem,
but we might only start by allocating 1TB of space to the
filesystem. e.g.:
# mkfs.xfs -f -m rmapbt=1,reflink=1 -d size=32t,thin=1t /dev/vdc
Default configuration sourced from package build definitions
meta-data=/dev/vdc isize=512 agcount=32, agsize=268435455 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0, rmapbt=1, reflink=1
data = bsize=4096 blocks=268435456, imaxpct=5, thin=1
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
#
The issue shows up as soon as we mount it:
# mount /dev/vdc /mnt/scratch ; df -h /mnt/scratch/ ; sudo umount /mnt/scratch
Filesystem Size Used Avail Use% Mounted on
/dev/vdc 1023G 628G 395G 62% /mnt/scratch
#
Of that 1TB of space, we immediately remove 600+GB of free space for
finobt, rmapbt and reflink metadata reservations. This is based on
the physical size and number of AGs in the filesystem, so it always
gets removed from the free block count available to the user.
This is clearly seen when I grow the filesystem to 10x the size:
# xfs_growfs -D 2684354560 /mnt/scratch
....
data blocks changed from 268435456 to 2684354560
# df -h /mnt/scratch
Filesystem Size Used Avail Use% Mounted on
/dev/vdc 10T 628G 9.4T 7% /mnt/scratch
#
And it also shows up when shrinking back down a chunk, too:
# xfs_growfs -D 468435456 /mnt/scratch
.....
data blocks changed from 2684354560 to 468435456
# df -h /mnt/scratch
Filesystem Size Used Avail Use% Mounted on
/dev/vdc 1.8T 628G 1.2T 36% /mnt/scratch
#
(Oh, did I mention I have working code and that's how I came across
this problem? :P)
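A back-of-the-envelope illustration of why the "Used" figure stays pinned at ~628GB in all three df outputs above (the per-AG figures are the approximate ones discussed later in this thread, not the kernel's exact formulas): the reservation is computed from the physical AG count and size, so it never changes as the logical size grows or shrinks.

```python
# Approximate per-1TB-AG reservation figures quoted in this thread:
# ~1GB finobt + ~13GB rmapbt + ~6GB refcountbt. Illustrative only.
PER_AG_RESV_GB = 1 + 13 + 6
AGCOUNT = 32                      # 32 x 1TB AGs laid out at mkfs time

def reserved_gb():
    # depends only on the physical layout, never on the logical size
    return AGCOUNT * PER_AG_RESV_GB

def df_avail_gb(logical_gb):
    # the reservation always comes off the user-visible free space
    return logical_gb - reserved_gb()
```

With a 1TB logical size this leaves only ~400GB visible to the user, matching the first df output above to within the roughness of the figures.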
For a normal filesystem, there's no problem with doing this brute
force physical reservation, though it is slightly disconcerting to
see a new, empty 100TB filesystem say it's got 2TB used and only
98TB free...
The issue is that for a thin filesystem, this space reservation
comes out of the *logical* free space, not the physical free space.
With 1TB of thin space, we've got 31TB of /physical free space/ the
reservation can be taken out of without the user ever seeing it. The
question is this: how on earth do I do this?
I want the available space to match the "thin=size" value on the
mkfs command line, but I don't want metadata reservations to take
away from this space. Metadata allocations need to be accounted to
the available space, but the reservations should not be. So how
should I go about providing these reservations? Do we even need them
to be accounted against free space in this case where we control the
filesystem free blocks to be a /lot/ less than the physical space?
e.g. if I limit a thin filesystem to 95% of the underlying thin
device size, then we've always got a 5% space margin and so we don't
need to take the reservations out of the global free block counter
to ensure we always have physical space for the metadata. We still
take the per-ag reservations to ensure everything still works on the
physical side, we just don't pull the space from the free block
counter. I think this will work, but I'm not sure I've fully grokked
all the conditions the per-ag reservation is protecting against or
whether there's more accounting work needed deep in allocation code
to make it work correctly.
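The sizing rule proposed above can be sketched as follows (the function name and the 5% figure are illustrative, not anything in the XFS code): cap the logical size so the leftover physical space always covers the worst-case metadata reservation, making it unnecessary to charge the reservation to the global free block counter.

```python
# Sketch: pick the largest logical size that leaves both a fixed margin
# and room for the maximum per-AG metadata reservation in physical space.

def max_logical_blocks(physical_blocks, max_metadata_resv_blocks,
                       margin=0.05):
    # honour whichever constraint is tighter: the percentage margin or
    # the worst-case reservation
    cap = int(physical_blocks * (1 - margin))
    return min(cap, physical_blocks - max_metadata_resv_blocks)
```

When the reservation exceeds the 5% margin, the reservation dominates; otherwise the margin does.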
Thoughts, anyone?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Some questions about per-ag metadata space reservations...
  2017-09-06 10:30 Some questions about per-ag metadata space reservations Dave Chinner
@ 2017-09-07 13:44 ` Brian Foster
  2017-09-07 23:11   ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2017-09-07 13:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Sep 06, 2017 at 08:30:54PM +1000, Dave Chinner wrote:
> Hi folks,
>
> I've got a bit of a problem with the per-ag reservations we are
> using at the moment. The existence of them is fine, but the
> implementation is problematic for something I'm working on right
> now.
>
> I've been making a couple of mods to the filesystem to separate
> physical space accounting from free space accounting to allow us to
> optimise the filesystem for thinly provisioned devices.
> ....

Interesting...

> When combined with a thinly provisioned device, this enables us to
> shrink the XFS filesystem simply by running fstrim to punch all the
> free space out of the underlying thin device and then adjusting the
> free space down appropriately. Because the thin device abstracts the
> physical location of the data in the block device away from the
> address space presented to the filesystem, we don't need to move any
> data or metadata to free up this space - it's just an accounting
> change.

How are you dealing with block size vs. thin chunk allocation size
alignment? You could require they match, but if not it seems like there
could be a bit more involved than an accounting change.

> ....
>
> (Oh, did I mention I have working code and that's how I came across
> this problem? :P)
>
> For a normal filesystem, there's no problem with doing this brute
> force physical reservation, though it is slightly disconcerting to
> see a new, empty 100TB filesystem say it's got 2TB used and only
> 98TB free...

Ugh, I think the reservation requirement there is kind of insane. We
reserve 1GB out of a 1TB fs just for finobt (13GB for rmap and 6GB for
reflink), most of which will probably never be used.

I'm not a big fan of this approach. I think the patch was originally
added because there was some unknown workload that reproduced a finobt
block allocation failure and filesystem shutdown that couldn't be
reproduced independently, hence the overkill reservation. I'd much
prefer to see if we can come up with something that is more dynamic in
nature.

For example, the finobt cannot be larger than the inobt. If we mount a
1TB fs with one inode chunk allocated in the fs, there is clearly no
immediate risk for the finobt to grow beyond a single block until more
inodes are allocated. I'm wondering if we could come up with something
that grows and shrinks the reservation as needed based on the size delta
between the inobt/finobt and rather than guarantee we can always create
a maximum size finobt, guarantee that the finobt can always grow to the
size of the inobt. I suppose this might require some clever accounting
tricks on finobt block allocation/free and some estimation at mount time
of an already populated fs. I've also not really gone through the per-AG
reservation stuff since it was originally reviewed, so this is all
handwaving atm.

Anyways, I think it would be nice if we could improve these reservation
requirements first and foremost, though I'm not sure I understand
whether that would address your issue...

> The issue is that for a thin filesystem, this space reservation
> comes out of the *logical* free space, not the physical free space.
> With 1TB of thin space, we've got 31TB of /physical free space/ the
> reservation can be taken out of without the user ever seeing it. The
> question is this: how on earth do I do this?

Hmm, so is the issue that the reservations aren't accounted out of
whatever counters you're using to artificially limit block allocation?
I'm a little confused.. ISTM that if you have a 32TB fs and have
artificially limited the free block accounting to 1TB based on available
physical space, the reservation accounting needs to be accounted against
that same artificially limited pool. IOW, it looks like the perag res
code eventually calls xfs_mod_fdblocks() just the same as we would for a
delayed allocation. Can you elaborate a bit on how your updated
accounting works and how it breaks this model?

> I want the available space to match the "thin=size" value on the
> mkfs command line, but I don't want metadata reservations to take
> away from this space. Metadata allocations need to be accounted to
> the available space, but the reservations should not be. So how
> should I go about providing these reservations? Do we even need them
> to be accounted against free space in this case where we control the
> filesystem free blocks to be a /lot/ less than the physical space?

I don't understand how you'd guarantee availability of physical blocks
for metadata if you don't account metadata block reservations out of the
(physically) available free space. ISTM the only way around that is to
eliminate the requirement for a reservation in the first place (i.e.,
allocate physical blocks up front or something like that).

Brian

> ....
>
> Thoughts, anyone?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
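Brian's dynamic-reservation idea in the reply above can be modelled roughly like this (function and parameter names are invented for illustration; this is not the kernel's accounting code): rather than reserving a maximum-size finobt up front, only guarantee that the finobt can grow to the current size of the inobt, adjusting the reservation as inode chunks come and go.

```python
# Hand-wavy model of a usage-based finobt reservation: the finobt can
# never hold more records than the inobt, so the worst-case growth still
# needed is bounded by the size delta between the two trees.

def finobt_resv_blocks(inobt_blocks, finobt_blocks):
    # reservation shrinks toward zero as the finobt catches up to the
    # inobt, and grows again as new inode chunks enlarge the inobt
    return max(inobt_blocks - finobt_blocks, 0)
```

A freshly mounted filesystem with one inode chunk would then reserve almost nothing, instead of the fixed ~1GB per 1TB AG quoted above.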
* Re: Some questions about per-ag metadata space reservations...
  2017-09-07 13:44 ` Brian Foster
@ 2017-09-07 23:11   ` Dave Chinner
  2017-09-08 13:33     ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2017-09-07 23:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Sep 07, 2017 at 09:44:58AM -0400, Brian Foster wrote:
> On Wed, Sep 06, 2017 at 08:30:54PM +1000, Dave Chinner wrote:
> > When combined with a thinly provisioned device, this enables us to
> > shrink the XFS filesystem simply by running fstrim to punch all the
> > free space out of the underlying thin device and then adjusting the
> > free space down appropriately. ....
>
> How are you dealing with block size vs. thin chunk allocation size
> alignment? You could require they match, but if not it seems like there
> could be a bit more involved than an accounting change.

Not a filesystem problem. If there's less pool space than you let
the filesystem have, then the pool will ENOSPC before the filesystem
will. Regular fstrim (which you should be doing on thin filesystems
anyway) will keep them mostly aligned because XFS tends to pack
holes in AG space rather than continually growing the space they
use.

.....

> > For a normal filesystem, there's no problem with doing this brute
> > force physical reservation, though it is slightly disconcerting to
> > see a new, empty 100TB filesystem say it's got 2TB used and only
> > 98TB free...
>
> Ugh, I think the reservation requirement there is kind of insane. We
> reserve 1GB out of a 1TB fs just for finobt (13GB for rmap and 6GB for
> reflink), most of which will probably never be used.

Yeah, the reservations are large, but the rmap/reflink ones are
necessary. I don't think finobt should use this mechanism - it
should not require more than a few blocks for any given inode chunk
allocation, and they should stop pretty quickly if the finobt blocks
are having to work around ENOSPC conditions by dipping into the
reserve pool.

> I'm not a big fan of this approach. I think the patch was originally
> added because there was some unknown workload that reproduced a finobt
> block allocation failure and filesystem shutdown that couldn't be
> reproduced independently, hence the overkill reservation. I'd much
> prefer to see if we can come up with something that is more dynamic in
> nature.

Based on the commit message, I think the justification for finobt
reservations was weak and wasn't backed up by analysis as to why the
reserve block pool was drained (which should never occur in normal
ENOSPC conditions). The per-ag reserve also requires a walk of every
finobt at mount time, so there's also mount time regressions for
filesystems with sparsely populated inode trees.

> For example, the finobt cannot be larger than the inobt. ....
>
> Anyways, I think it would be nice if we could improve these reservation
> requirements first and foremost, though I'm not sure I understand
> whether that would address your issue...

No, it doesn't really. Unless they are brought down to the size of
the existing reserve pool, it's going to be an issue....

> > The issue is that for a thin filesystem, this space reservation
> > comes out of the *logical* free space, not the physical free space.
> > With 1TB of thin space, we've got 31TB of /physical free space/ the
> > reservation can be taken out of without the user ever seeing it. The
> > question is this: how on earth do I do this?
>
> Hmm, so is the issue that the reservations aren't accounted out of
> whatever counters you're using to artificially limit block allocation?

No, the issue is that they are being accounted out of the existing
freespace counters. They are a persistent reservation that will
always be present. However, rather than hiding this unusable space
from users, we simply pull it from free space.

> I'm a little confused.. ISTM that if you have a 32TB fs and have
> artificially limited the free block accounting to 1TB based on available
> physical space, the reservation accounting needs to be accounted against
> that same artificially limited pool. IOW, it looks like the perag res
> code eventually calls xfs_mod_fdblocks() just the same as we would for a
> delayed allocation. Can you elaborate a bit on how your updated
> accounting works and how it breaks this model?

Yes, that's what it does to ensure users get ENOSPC for data
allocation before we run out of metadata reservation space, even if
we don't need the metadata reservation space. Its size is
physically bound by the AG size so we can calculate it any time we
know what the AG size is.

> > I want the available space to match the "thin=size" value on the
> > mkfs command line, but I don't want metadata reservations to take
> > away from this space. ....
>
> I don't understand how you'd guarantee availability of physical blocks
> for metadata if you don't account metadata block reservations out of the
> (physically) available free space. ISTM the only way around that is to
> eliminate the requirement for a reservation in the first place (i.e.,
> allocate physical blocks up front or something like that).

I was working on the idea that thin filesystems have sufficient
spare physical space (e.g. logical size < (physical size - max
metadata reservation)) that even when maxed out there's sufficient
physical space remaining for all the metadata without needing to
reserve that space.

In theory, this /should/ work as the metadata blocks are already
reserved as used space at mount time and hence the actual allocation
of those blocks is only accounted against the reservation, not the
global freespace counter. Hence these metadata blocks aren't
counted as used space when they are allocated - they are always
accounted as used space whether used or not. Hence if I remove the
"accounted as used space" part of the reservation, but then ensure
that there is physically enough room for them to always succeed, we
end up with exactly the same metadata space guarantee. The only
difference is how it's accounted and provided....

I've written the patches to do this, but I haven't tested it other
than checking falloc triggers ENOSPC when it's supposed to. I'm just
finishing off the repair support so I can run it through xfstests.
That will be interesting. :P

FWIW, I think there is a good case for storing the metadata
reservation on disk in the AGF and removing it from user visible
global free space. We already account for free space, rmap and
refcount btree block usage in the AGF, so we already have the
mechanisms for tracking the necessary per-ag metadata usage outside
of the global free space counters. Hence there doesn't appear to me
to be any reason why we can't do the per-ag metadata
reservation/usage accounting in the AGF and get rid of the in-memory
reservation stuff.

If we do that, users will end up with exactly the same amount of
free space, but the metadata reservations are no longer accounted as
user visible used space. i.e. the users never need to see the
internal space reservations we need to make the filesystem work
reliably. This would work identically for normal filesystems and
thin filesystems without needing to play special games for thin
filesystems....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
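Dave's AGF-based idea above can be sketched as follows (the field names are invented stand-ins, not the actual on-disk AGF format): carry the per-AG metadata reservation as a persistent counter alongside the existing AGF usage counters, and exclude it from the free space reported to users rather than accounting it as used space.

```python
# Rough sketch of an on-disk per-AG metadata reservation held in the
# AGF. Hypothetical fields; the real AGF tracks free space and rmap/
# refcount btree usage, to which a reservation counter would be added.

class Agf:
    def __init__(self, free_blocks, meta_resv_blocks):
        self.freeblks = free_blocks          # free space tracked in the AGF
        self.meta_resv = meta_resv_blocks    # persistent metadata reservation

    def user_visible_free(self):
        # the internal reservation is simply invisible to users, rather
        # than being charged against the global counters as "used"
        return max(self.freeblks - self.meta_resv, 0)
```

Users would see the same amount of allocatable space either way; the difference is that df would no longer report hundreds of GB of "used" space on an empty filesystem.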
* Re: Some questions about per-ag metadata space reservations...
  2017-09-07 23:11 ` Dave Chinner
@ 2017-09-08 13:33   ` Brian Foster
  2017-09-09  0:25     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2017-09-08 13:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Christoph Hellwig

cc Christoph (re: finobt perag reservation)

On Fri, Sep 08, 2017 at 09:11:36AM +1000, Dave Chinner wrote:
> On Thu, Sep 07, 2017 at 09:44:58AM -0400, Brian Foster wrote:
> > How are you dealing with block size vs. thin chunk allocation size
> > alignment? You could require they match, but if not it seems like there
> > could be a bit more involved than an accounting change.
>
> Not a filesystem problem. If there's less pool space than you let
> the filesystem have, then the pool will ENOSPC before the filesystem
> will. Regular fstrim (which you should be doing on thin filesystems
> anyway) will keep them mostly aligned because XFS tends to pack
> holes in AG space rather than continually growing the space they
> use.

I don't see how tracking underlying physical/available pool space in the
filesystem is a filesystem problem but tracking the alignment/size of
those physical allocations is not. It seems to me that either they are
both fs problems or they aren't. This is just a question of accuracy.

I get that the filesystem may return ENOSPC before the pool shuts down
more often than not, but that is still workload dependent. If it's not
important, perhaps I'm just not following what the
objectives/requirements are for this feature.

...

> Based on the commit message, I think the justification for finobt
> reservations was weak and wasn't backed up by analysis as to why the
> reserve block pool was drained (which should never occur in normal
> ENOSPC conditions). The per-ag reserve also requires a walk of every
> finobt at mount time, so there's also mount time regressions for
> filesystems with sparsely populated inode trees.

I agree. I'd be fine with ripping this out in favor of a better
solution. The problem is since we don't have a detailed root cause of
the problem, it's not clear what the right fix is. I'm not sure where
this leaves the user that originally reproduced the problem. Does
bumping the reserve block pool work around the problem? Can we revisit
it to find a more specific root cause? Christoph?

...

> Yes, that's what it does to ensure users get ENOSPC for data
> allocation before we run out of metadata reservation space, even if
> we don't need the metadata reservation space. Its size is
> physically bound by the AG size so we can calculate it any time we
> know what the AG size is.

Right. So I got the impression that the problem was enforcement of the
reservation. Is that not the case? Rather, is the problem the
calculation of the reservation requirement due to the basis on AG size
(which is no longer valid due to the thin nature)? IOW, the reservations
restrict far too much space and cause the fs to return ENOSPC too early?

E.g., re-reading your original example.. you have a 32TB fs backed by
1TB of physical allocation to the volume. You mount the fs and see 1TB
"available" space, but ~600GB of that is already consumed by reservation
so you end up at ENOSPC after 300-400GB of real usage. Hm?

If that is the case, then it does seem that dynamic reservation based on
current usage could be a solution in-theory. I.e., basing the
reservation on usage effectively bases it against "real" space, whether
the underlying volume is thin or fully allocated. That seems do-able for
the finobt (if we don't end up removing this reservation entirely) as
noted above. If that would not help your use case, could you elaborate
on why using the finobt example? Of course, I've no idea if that's a
viable approach for the other reservations so it's still just a handwavy
idea.

...

> I was working on the idea that thin filesystems have sufficient
> spare physical space (e.g. logical size < (physical size - max
> metadata reservation)) that even when maxed out there's sufficient
> physical space remaining for all the metadata without needing to
> reserve that space.
>
> In theory, this /should/ work as the metadata blocks are already
> reserved as used space at mount time and hence the actual allocation
> of those blocks is only accounted against the reservation, not the
> global freespace counter. ....

I'd probably need to see patches to make sure I follow this correctly.
While I'm sure we can ultimately implement whatever accounting tricks we
want, I'm more curious how accuracy is maintained for anything based on
assumptions about how physical space is allocated in the underlying
volume.

> I've written the patches to do this, but I haven't tested it other
> than checking falloc triggers ENOSPC when it's supposed to. I'm just
> finishing off the repair support so I can run it through xfstests.
> That will be interesting. :P
>
> FWIW, I think there is a good case for storing the metadata
> reservation on disk in the AGF and removing it from user visible
> global free space. ....

Sounds interesting, that might very well be a cleaner implementation of
reservations. The current reservation tracking tends to confuse me more
often than not. ;)

> If we do that, users will end up with exactly the same amount of
> free space, but the metadata reservations are no longer accounted as
> user visible used space. i.e. the users never need to see the
> internal space reservations we need to make the filesystem work
> reliably. This would work identically for normal filesystems and
> thin filesystems without needing to play special games for thin
> filesystems....

Indeed, though this seems more like a usability enhancement. Couldn't we
accomplish this part by just subtracting the reservations from the total
free space up front (along with whatever accounting changes need to
happen to support that)?

Brian

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: Some questions about per-ag metadata space reservations... 2017-09-08 13:33 ` Brian Foster @ 2017-09-09 0:25 ` Dave Chinner 2017-09-11 13:26 ` Brian Foster 0 siblings, 1 reply; 7+ messages in thread From: Dave Chinner @ 2017-09-09 0:25 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs, Christoph Hellwig On Fri, Sep 08, 2017 at 09:33:54AM -0400, Brian Foster wrote: > cc Christoph (re: finobt perag reservation) > > On Fri, Sep 08, 2017 at 09:11:36AM +1000, Dave Chinner wrote: > > On Thu, Sep 07, 2017 at 09:44:58AM -0400, Brian Foster wrote: > > > On Wed, Sep 06, 2017 at 08:30:54PM +1000, Dave Chinner wrote: > ... > > > > When combined with a thinly provisioned device, this enables us to > > > > shrink the XFS filesystem simply by running fstrim to punch all the > > > > free space out of the underlying thin device and then adjusting the > > > > free space down appropriately. Because the thin device abstracts the > > > > physical location of the data in the block device away from the > > > > address space presented to the filesystem, we don't need to move any > > > > data or metadata to free up this space - it's just an accounting > > > > change. > > > > > > > > > > How are you dealing with block size vs. thin chunk allocation size > > > alignment? You could require they match, but if not it seems like there > > > could be a bit more involved than an accounting change. > > > > Not a filesystem problem. If there's less pool space than you let > > the filesystem have, then the pool will ENOSPC before the filesystem > > will. regular fstrim (which you should be doing on thin filesystems > > anyway) will keep them mostly aligned because XFS tends to pack > > holes in AG space rather than continually growing the space they > > use. > > > > I don't see how tracking underlying physical/available pool space in the > filesystem is a filesystem problem but tracking the alignment/size of > those physical allocations is not. 
It seems to me that either they are > both fs problems or they aren't. This is just a question of accuracy. The filesystem cannot do anything about the size/alignment of blocks in the thin device. It gets a hint through stripe alignment, but other than that we can only track the space the filesystem uses in the filesystem. In practice XFS tends to pack used space fairly well over time (unlike ext4) so I'm really not too concerned about this right now. If it becomes a problem, then we can analyse where the problem lies and work out how to mitigate it. But until we see such problems that can't be solved with "allow X% size margins" guidelines, I'm not going to worry about it. > I get that the filesystem may return ENOSPC before the pool shuts down > more often than not, but that is still workload dependent. If it's not > important, perhaps I'm just not following what the > objectives/requirements are for this feature. What I'm trying to do is move the first point of ENOSPC in a thin environment up into the filesystem. ie. you don't manage user space requirements by thin device sizing - you way, way over commit that with the devices and instead use the filesystem "thin size" to limit what the filesystem can draw from the pool. That way users know exactly how much space they have available and can plan appropriately, as opposed to the current case where the first warning they get of the underlying storage running out of space when they have heaps of free space is "things suddenly stop working". If you overcommit the filesystem thin sizes, then it's no different to overcommitting the thin pool with large devices - the device pool is going to ENOSPC first. If you don't leave some amount of margin in the thin fs sizing, then you're going to ENOSPC the device pool. If you don't leave margin for snapshots, you're going to ENOSPC the device pool.
IOWs, using the filesystem to control thin space allocation has exactly the same admin pitfalls as using dm-thinp to manage the pool space. The only difference is that when the sizes/margins are set properly then the fs layer ENOSPCs before the thin device pool ENOSPCs and so we remove that clusterfuck completely from the picture. > ... > > Yes, that's what it does to ensure users get ENOSPC for data > > allocation before we run out of metadata reservation space, even if > > we don't need the metadata reservation space. Its size is > > physically bound by the AG size so we can calculate it any time we > > know what the AG size is. > > > > Right. So I got the impression that the problem was enforcement of the > reservation. Is that not the case? Rather, is the problem the > calculation of the reservation requirement due to the basis on AG size > (which is no longer valid due to the thin nature)? No, the physical metadata reservation space is still required. It just should not be *accounted* to the logical free space.
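The physical/logical split Dave describes — a filesystem laid out for the full device, with allocation capped at a smaller "thin size" — can be modelled in a few lines. A minimal sketch with hypothetical names (not the actual, unpublished patches):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a thin filesystem: laid out for phys_blocks, but
 * allocation is capped at thin_blocks. Names are invented. */
struct thin_fs {
    uint64_t phys_blocks;  /* layout size, e.g. the 32TB device */
    uint64_t thin_blocks;  /* logical limit, e.g. 1TB of pool space */
    uint64_t used_blocks;
};

/* Fail once the logical free space is exhausted, long before the
 * physical address space (or the thin pool) runs out. */
static int thin_alloc(struct thin_fs *fs, uint64_t len)
{
    if (fs->used_blocks + len > fs->thin_blocks)
        return -1;              /* stands in for -ENOSPC */
    fs->used_blocks += len;
    return 0;
}
```

Growing such a filesystem is then just raising thin_blocks; shrinking is fstrim plus lowering it, with no data or metadata movement.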
That seems do-able for > the finobt (if we don't end up removing this reservation entirely) as > noted above. The finobt case is different to rmap and reflink. finobt should only require a per-operation reservation to ensure there is space in the AG to create the finobt record and btree blocks. We do not need a permanent, maximum sized tree reservation for this - we just need to ensure all the required space is available in the one AG rather than globally available before we start the allocation operation. If we can do that, then the operation should (in theory) never fail with ENOSPC... As for rmap and refcountbt reservations, they have to have space to allow rmap and CoW operations to succeed when no user data is modified, and to allow metadata allocations to run without needing to update every transaction reservation to take into account all the rmapbt updates that are necessary. These can be many and span multiple AGs (think badly fragmented directory blocks) and so the worst case reservation is /huge/, which made upfront worst-case reservations for rmap/reflink DOA. So we avoided this entire problem by ensuring we always have space for the rmap/refcount metadata; using 1-2% of disk space permanently was considered a valid trade off for the simplicity of implementation. That's what the per-ag reservations implement and we even added on-disk metadata in the AGF to make this reservation process low overhead. This was all "it seems like the best compromise" design. We based it on the existing reserve pool behaviour because it was easy to do. Now that I'm trying to use these filesystems in anger, I'm tripping over the problems as a result of this choice to base the per ag metadata reservations on the reserve pool behaviour. > > I've written the patches to do this, but I haven't tested it other > > than checking falloc triggers ENOSPC when it's supposed to. I'm just > > finishing off the repair support so I can run it through xfstests. > > That will be interesting.
:P > > > > FWIW, I think there is a good case for storing the metadata > > reservation on disk in the AGF and removing it from user visible > > global free space. We already account for free space, rmap and > > refcount btree block usage in the AGF, so we already have the > > mechanisms for tracking the necessary per-ag metadata usage outside > > of the global free space counters. Hence there doesn't appear to me > > to be any reason why we can't do the per-ag metadata > > reservation/usage accounting in the AGF and get rid of the in-memory > > reservation stuff. > > > > Sounds interesting, that might very well be a cleaner implementation of > reservations. The current reservation tracking tends to confuse me more > often than not. ;) In hindsight, I think we should have baked the reservation space fully into the on-disk format rather than tried to make it dynamic and backwards compatible. i.e. make it completely hidden from the user and always there for filesystems with those features enabled. > > If we do that, users will end up with exactly the same amount of > > free space, but the metadata reservations are no longer accounted as > > user visible used space. i.e. the users never need to see the > > internal space reservations we need to make the filesystem work > > reliably. This would work identically for normal filesystems and > > thin filesystems without needing to play special games for thin > > filesystems.... > > > > Indeed, though this seems more like a usability enhancement. Couldn't we > accomplish this part by just subtracting the reservations from the total > free space up front (along with whatever accounting changes need to > happen to support that)? Yes, I had a crazy thought last night that I might be able to do some in-memory mods to sb_dblocks and sb_fdblocks at mount time to adjust how available space and reservations are accounted. I'll have a bit of a think and a play over the next few days and see what I come up with.
The testing I've been doing with thin filesystems backs this up - they are behaving sanely at ENOSPC without accounting for the metadata reservations in the user visible free space. I'm still using the metadata reservations to ensure operations have space in each AG to complete successfully, it's just not consuming user accounted free space.... Cheers, Dave. -- Dave Chinner david@fromorbit.com
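The "crazy thought" about in-memory mods to sb_dblocks and sb_fdblocks might look something like the following sketch. This is toy code with invented names — the real patches were unpublished at this point in the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of adjusting the in-memory superblock counters at mount
 * so reservations never reach statfs(). Names are hypothetical. */
struct toy_mount {
    uint64_t m_dblocks;   /* in-memory copy of sb_dblocks */
    uint64_t m_fdblocks;  /* in-memory copy of sb_fdblocks */
};

static void toy_mount_hide_resv(struct toy_mount *mp, uint64_t resv)
{
    /* The on-disk superblock is untouched; only the values userspace
     * sees via statfs()/df shrink by the reserved blocks. Shrinking
     * both counters equally keeps an empty fs reporting zero used. */
    mp->m_dblocks -= resv;
    mp->m_fdblocks -= resv;
}
```

The design point is that the adjustment is purely an in-memory, mount-time transformation, so no on-disk format change is needed to hide the reservation from users.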
* Re: Some questions about per-ag metadata space reservations... 2017-09-09 0:25 ` Dave Chinner @ 2017-09-11 13:26 ` Brian Foster 2017-09-15 1:03 ` Dave Chinner 0 siblings, 1 reply; 7+ messages in thread From: Brian Foster @ 2017-09-11 13:26 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, Christoph Hellwig On Sat, Sep 09, 2017 at 10:25:43AM +1000, Dave Chinner wrote: > On Fri, Sep 08, 2017 at 09:33:54AM -0400, Brian Foster wrote: ... > > The filesystem cannot do anything about the size/alignment of blocks > in the thin device. It gets a hint through stripe alignment, but > other than that we can only track the space the filesystem uses in > the filesystem. In practice XFS tends to pack used space fairly well > over time (unlike ext4) so I'm really not too concerned about this > right now. > > If it becomes a problem, then we can analyse it where the problem > lies and work out how to mitigate it. But until we see such problems > that can't be solved with "allow X% size margins" guidelines, I'm > not going to worry about it. > Ok, fair enough. > > I get that the filesystem may return ENOSPC before the pool shuts down > > more often than not, but that is still workload dependent. If it's not > > important, perhaps I'm just not following what the > > objectives/requirements are for this feature. > > What I'm trying to do is move the first point of ENOSPC in a thin > environment up into the filesystem. ie. you don't manage user space > requirements by thin device sizing - you way, way over commit that > with the devices and instead use the filesystem "thin size" to limit > what the filesystem can draw from the pool. > > That way users know exactly how much space they have available and > can plan appropriately, as opposed to the current case where the > first warning they get of the underlying storage running out of > space when they have heaps of free space is "things suddenly stop > working". > Ok, that's what I suspected. 
FWIW, this reminded me of the thin space reservation thing I was hacking on a year or two ago to accomplish a similar objective. That's where the whole size/alignment question came up. > If you overcommit the filesystem thin sizes, then it's no different > to overcommiting the thin pool with large devices - the device pool > is going to ENOSPC first. If you don't leave some amount of margin > in the thin fs sizing, then you're going to ENOSPC the device pool. > If you don't leave margin for snapshots, you're going to ENOSPC the > device pool. > > IOWs, using the filesystem to control thin space allocation has > exactly the same admin pitfalls as using dm-thinp to manage the pool > space. The only difference is that when the sizes/margins are set > properly then the fs layer ENOSPCs before the thin device pool > ENOSPCs and so we remove that clusterfuck completely from the > picture. > I still think some of that is non-deterministic, but I suppose if you have a worst case slop/margin delta between usable space in the fs and what is truly available from the underlying storage, it might not be a problem in practice. I still have some questions, but it's probably not worth reasoning about until code is available. > > ... > > > Yes, that's what it does to ensure users get ENOSPC for data > > > allocation before we run out of metadata reservation space, even if > > > we don't need the metadata reservation space. It's size is > > > physically bound by the AG size so we can calculate it any time we > > > know what the AG size is. > > > > > > > Right. So I got the impression that the problem was enforcement of the > > reservation. Is that not the case? Rather, is the problem the > > calculation of the reservation requirement due to the basis on AG size > > (which is no longer valid due to the thin nature)? > > No, the physical metadata reservation space is still required. It > just should not be *accounted* to the logical free space. 
> Ok, I think we're talking about the same things and just thinking about it differently. On the presumption that we (continue to) use a worst case reservation, it makes sense to account it against the physical free space in the AG rather than the (more limited) logical free space. My point was to explore whether we could adjust the actual reservation requirements to be dynamic such that it would (continue to) not matter that the reservations are accounted out of logical free space. Indeed, this hasn't been a problem in situations where we know the reservation is only 1-2% of truly available space. Thinking about it from another angle, the old thin reservation rfc I referenced above would probably ENOSPC on mount in the current scheme of things because there simply isn't that much space available to reserve out of the volume. It worked fine at the time because we only had the capped size global reserve pool. Hence, we'd have to either change how the reservations are made so they wouldn't reserve out of the volume (as you suggest) or somehow or another base them on the logical size of the volume. > > IOW, the reservations > > restrict far too much space and cause the fs to return ENOSPC too > > early? > > Yes, the initial problem is that the fixed reservations are > dynamically accounted as used space. > > > E.g., re-reading your original example.. you have a 32TB fs backed by > > 1TB of physical allocation to the volume. You mount the fs and see 1TB > > "available" space, but ~600GB if that is already consumed by > > reservation so you end up at ENOSPC after 300-400GB of real usage. Hm? > > Yup, that's a visible *symptom*. Another user visible symptom is df > on an empty filesystem reports hundreds of GB (TB even!) of used > space on a completely empty filesystem. > > > If that is the case, then it does seem that dynamic reservation based on > > current usage could be a solution in-theory. 
I.e., basing the > > reservation on usage effectively bases it against "real" space, whether > > the underlying volume is thin or fully allocated. That seems do-able for > > the finobt (if we don't end up removing this reservation entirely) as > > noted above. > > The finobt case is different to rmap and reflink. finobt should only > require a per-operation reservation to ensure there is space in the > AG to create the finobt record and btree blocks. We do not need a > permanent, maximum sized tree reservation for this - we just need to > ensure all the required space is available in the one AG rather than > globally available before we start the allocation operation. If we > can do that, then the operation should (in theory) never fail with > ENOSPC... > I'm not familiar with the workload that motivated the finobt perag reservation stuff, but I suspect it's something that pushes an fs (or AG) with a ton of inodes to near ENOSPC with a very small finobt, and then runs a bunch of operations that populate the finobt without freeing up enough space in the particular AG. I suppose that could be due to having zero sized files (which seems pointless in practice), sparsely freeing inodes such that inode chunks are never freed, using the ikeep mount option, and/or otherwise freeing a bunch of small files that only free up space in other AGs before the finobt allocation demand is made. The larger point is that we don't really know much of anything to try and at least reason about what the original problem could have been, but it seems plausible to create the ENOSPC condition if one tried hard enough. > As for rmap and refcountbt reservations, they have to have space to > allow rmap and CoW operations to succeed when no user data is > modified, and to allow metadata allocations to run without needing > to update every transaction reservation to take into account all the > rmapbt updates that are necessary. 
These can be many and span > multiple AGs (think badly fragmented directory blocks) and so the > worst case reservation is /huge/ and made upfront worst-case > reservations for rmap/reflink DOA. > > So we avoided this entire problem by ensuring we always have space for > the rmap/refcount metadata; using 1-2% of disk space permanently > was considered a valid trade off for the simplicity of > implementation. That's what the per-ag reservations implement and > we even added on-disk metadata in the AGF to make this reservation > process low overhead. > > This was all "it seems like the best compromise" design. We > based it on the existing reserve pool behaviour because it was easy > to do. Now that I'm trying to use these filesystems in anger, I'm > tripping over the problems as a result of this choice to base the > per ag metadata reservations on the reserve pool behaviour. > Got it. FWIW, what I was handwaving about sounds like more of a compromise between what we do now (worst case res, user visible) and what it sounds like you're working towards (worst case res, user invisible). By that I mean that I've been thinking about the problem more from the angle of whether we can avoid the worst case reservation. The reservation itself could still be made visible or not either way. Of course, it sounds like changing the reservation requirement for things like the rmapbt would be significantly more complicated than for the finobt, so "hiding" the reservation might be the next best tradeoff. Brian > > > I've written the patches to do this, but I haven't tested it other > > > than checking falloc triggers ENOSPC when it's supposed to. I'm just > > > finishing off the repair support so I can run it through xfstests. > > > That will be interesting. :P > > > > > > FWIW, I think there is a good case for storing the metadata > > > reservation on disk in the AGF and removing it from user visible > > > global free space. 
We already account for free space, rmap and > > > refcount btree block usage in the AGF, so we already have the > > > mechanisms for tracking the necessary per-ag metadata usage outside > > > of the global free space counters. Hence there doesn't appear to me > > > to be any reason why why we can't do the per-ag metadata > > > reservation/usage accounting in the AGF and get rid of the in-memory > > > reservation stuff. > > > > > > > Sounds interesting, that might very well be a cleaner implementation of > > reservations. The current reservation tracking tends to confuse me more > > often than not. ;) > > In hindsight, I think we should have baked the reservation space > fully into the on-disk format rather than tried to make it dynamic > and backwards compatible. i.e. make it completely hidden from the > user and always there for filesystems with those features enabled. > > > > If we do that, users will end up with exactly the same amount of > > > free space, but the metadata reservations are no longer accounted as > > > user visible used space. i.e. the users never need to see the > > > internal space reservations we need to make the filesystem work > > > reliably. This would work identically for normal filesystems and > > > thin filesystems without needing to play special games for thin > > > filesystems.... > > > > > > > Indeed, though this seems more like a usability enhancement. Couldn't we > > accomplish this part by just subtracting the reservations from the total > > free space up front (along with whatever accounting changes need to > > happen to support that)? > > Yes, I had a crazy thought last night that I might be able to do > some in-memory mods to sb_dblocks and sb_fdblocks at mount time to > to adjust how available space and reservations are accounted. I'll > have a bit of a think and a play over the next few days and see what > I come up with. 
> > The testing I've been doing with thin filesystems backs this up - > they are behaving sanely at ENOSPC without accounting for the > metadata reservations in the user visible free space. I'm still > using the metadata reservations to ensure operations have space in > each AG to complete successfully, it's just not consuming user > accounted free space.... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
* Re: Some questions about per-ag metadata space reservations... 2017-09-11 13:26 ` Brian Foster @ 2017-09-15 1:03 ` Dave Chinner 0 siblings, 0 replies; 7+ messages in thread From: Dave Chinner @ 2017-09-15 1:03 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs, Christoph Hellwig On Mon, Sep 11, 2017 at 09:26:08AM -0400, Brian Foster wrote: > On Sat, Sep 09, 2017 at 10:25:43AM +1000, Dave Chinner wrote: > > On Fri, Sep 08, 2017 at 09:33:54AM -0400, Brian Foster wrote: > > > If that is the case, then it does seem that dynamic reservation based on > > > current usage could be a solution in-theory. I.e., basing the > > > reservation on usage effectively bases it against "real" space, whether > > > the underlying volume is thin or fully allocated. That seems do-able for > > > the finobt (if we don't end up removing this reservation entirely) as > > > noted above. > > > > The finobt case is different to rmap and reflink. finobt should only > > require a per-operation reservation to ensure there is space in the > > AG to create the finobt record and btree blocks. We do not need a > > permanent, maximum sized tree reservation for this - we just need to > > ensure all the required space is available in the one AG rather than > > globally available before we start the allocation operation. If we > > can do that, then the operation should (in theory) never fail with > > ENOSPC... > > > > I'm not familiar with the workload that motivated the finobt perag > reservation stuff, but I suspect it's something that pushes an fs (or > AG) with a ton of inodes to near ENOSPC with a very small finobt, and > then runs a bunch of operations that populate the finobt without freeing > up enough space in the particular AG. That's a characteristic of a hardlink backup farm. And, in new-skool terms, that's what a reflink- or dedupe- based backup farm will look like, too. i.e. 
old backups get removed freeing up inodes, but no data gets freed so the only new free blocks are the directory blocks that are no longer in use... > I suppose that could be due to > having zero sized files (which seems pointless in practice), sparsely > freeing inodes such that inode chunks are never freed, using the ikeep > mount option, and/or otherwise freeing a bunch of small files that only > free up space in other AGs before the finobt allocation demand is made. Yup, all of those are potential issues.... > The larger point is that we don't really know much of anything to try > and at least reason about what the original problem could have been, but > it seems plausible to create the ENOSPC condition if one tried hard > enough. *nod*. i.e. if you're not freeing data, then unlinking dataless inodes may not succeed at ENOSPC. I think we can do better than what we currently do, though. e.g. we can simply dump them on the unlinked list and process them when there is free space to create the necessary finobt btree blocks to index them rather than as soon as the last VFS reference goes away (i.e. background inode freeing). > > As for rmap and refcountbt reservations, they have to have space to > > allow rmap and CoW operations to succeed when no user data is > > modified, and to allow metadata allocations to run without needing > > to update every transaction reservation to take into account all the > > rmapbt updates that are necessary. These can be many and span > > multiple AGs (think badly fragmented directory blocks) and so the > > worst case reservation is /huge/ and made upfront worst-case > > reservations for rmap/reflink DOA. > > > > So we avoided this entire problem by ensuring we always have space for > > the rmap/refcount metadata; using 1-2% of disk space permanently > > was considered a valid trade off for the simplicity of > > implementation. 
That's what the per-ag reservations implement and > > we even added on-disk metadata in the AGF to make this reservation > > process low overhead. > > > > This was all "it seems like the best compromise" design. We > > based it on the existing reserve pool behaviour because it was easy > > to do. Now that I'm trying to use these filesystems in anger, I'm > > tripping over the problems as a result of this choice to base the > > per ag metadata reservations on the reserve pool behaviour. > > > > Got it. FWIW, what I was handwaving about sounds like more of a > compromise between what we do now (worst case res, user visible) and > what it sounds like you're working towards (worst case res, user > invisible). By that I mean that I've been thinking about the problem > more from the angle of whether we can avoid the worst case reservation. > The reservation itself could still be made visible or not either way. Of > course, it sounds like changing the reservation requirement for things > like the rmapbt would be significantly more complicated than for the > finobt, so "hiding" the reservation might be the next best tradeoff. Yeah, and having done that I'm tripping over the next issue: it's possible for the log to be larger than the thin space, so I think I'm going to have to cut that out of visible used space, too.... Cheers, Dave. -- Dave Chinner david@fromorbit.com
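The closing point about the log follows the same arithmetic as the reservations: on a small thin size, a fixed-size log charged to visible space can dwarf (or even exceed) what the user thinks they have. A hypothetical sketch of the two ways of counting, with invented names:

```c
#include <assert.h>
#include <stdint.h>

/* Toy arithmetic for user-visible "used" space on a thin fs.
 * Values in blocks; none of this is the eventual XFS patch. */

/* If internal overheads stay visible, an empty fs already reports
 * log + reservation as used space - and on a small thin size the
 * log alone can swamp the visible accounting. */
static uint64_t df_used_visible(uint64_t data_used, uint64_t log_blocks,
                                uint64_t resv_blocks)
{
    return data_used + log_blocks + resv_blocks;
}

/* With the log and reservations cut out of the accounting, used
 * space reflects only what the user actually stored. */
static uint64_t df_used_hidden(uint64_t data_used)
{
    return data_used;
}
```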
end of thread, other threads:[~2017-09-15 1:04 UTC | newest]

Thread overview: 7+ messages
2017-09-06 10:30 Some questions about per-ag metadata space reservations Dave Chinner
2017-09-07 13:44 ` Brian Foster
2017-09-07 23:11 ` Dave Chinner
2017-09-08 13:33 ` Brian Foster
2017-09-09 0:25 ` Dave Chinner
2017-09-11 13:26 ` Brian Foster
2017-09-15 1:03 ` Dave Chinner