From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems
Date: Tue, 31 Oct 2017 08:09:41 +1100 [thread overview]
Message-ID: <20171030210941.GA5858@dastard> (raw)
In-Reply-To: <20171030133117.GA3535@bfoster.bfoster>
On Mon, Oct 30, 2017 at 09:31:17AM -0400, Brian Foster wrote:
> On Thu, Oct 26, 2017 at 07:33:08PM +1100, Dave Chinner wrote:
> > This patchset is aimed at filesystems that are installed on sparse
> > block devices, a.k.a thin provisioned devices. The aim of the
> > patchset is to bring the space management aspect of the storage
> > stack up into the filesystem rather than keeping it below the
> > filesystem where users and the filesystem have no clue they are
> > about to run out of space.
> >
>
> Just a few high level comments/thoughts after a very quick pass..
>
> ...
> > The patchset has several parts to it. It is built on a 4.14-rc5
> > kernel with for-next and Darrick's scrub tree from a couple of days
> > ago merged into it.
> >
> > The first part of teh series is a growfs refactoring. This can
> > probably stand alone, and the idea is to move the refactored
> > infrastructure into libxfs so it can be shared with mkfs. This also
> > cleans up a lot of the cruft in growfs and so makes it much easier
> > to add the changes later in the series.
> >
>
> It'd be nice to see this as a separate patchset. It does indeed seem
> like it could be applied before all of the other bits are worked out.
Yes, i plan to do that, and with it separated also get it into shape
where it can be shared with mkfs via libxfs.
> > The second part of the patchset moves the functionality of
> > sb_dblocks into the struct xfs_mount. This provides the separation
> > of address space checks and capacty related calculations that the
> > thinspace mods require. This also fixes the problem of freshly made,
> > empty filesystems reporting 2% of the space as used.
> >
>
> This all seems mostly sane to me. FWIW, the in-core fields (particularly
> ->m_LBA_size) strike me as oddly named. "LBA size" strikes me as the
> addressable range of the underlying device, regardless of fs size (but I
> understand that's not how it's used here). If we have ->m_usable_blocks,
> why not name the other something like ->m_total_blocks or just
> ->m_blocks? That suggests to me that the fields are related, mutable,
> accounted in units of blocks, etc. Just a nit, though.
Yeah, I couldn't think of a better pair of names - the point I was
trying to get across is that what we call "dblocks" is actually an
assumption about the structure of the block device address space
we are working on. i.e. that it's a contiguous, linear address space
from 0 to "dblocks".
I get that "total_blocks" sounds better, but to me that's a capacity
measurement, not an indication of the size of the underlying address
space the block device has provided. m_usable_blocks is obviously a
capacity measurement, but I was trying to convey that m_LBA_size is
not a capacity measurement but an externally imposed addressing
limit.
<shrug>
I guess if I document it well enough m_total_blocks will work.
> That aside, it also seems like this part could technically stand as an
> independent change.
*nod*
> > The XFS_IOC_FSGEOMETRY ioctl needed to be bumped to a new version
> > because the structure needed growing.
> >
> > Finally, there's the patches that provide thinspace support and the
> > growfs mods needed to grow and shrink.
> >
>
> I still think there's a gap related to fs and bdev block size that needs
> to be addressed with regard to preventing pool depletion, but I think
> that can ultimately be handled with documentation. I also wonder a bit
> about consistency between online modifications to usable blocks and fs
> blocks that might have been allocated but not yet written (i.e.,
> unwritten extents and the log on freshly created filesystems). IOW,
> shrinking the filesystem in such cases may not limit pool allocation in
> the way a user might expect.
>
> Did we ever end up with support for bdev fallocate?
Yes, we do have that.
> If so, I wonder if
> for example it would be useful to fallocate the physical log at mount or
> mkfs time for thin enabled fs'.
We zero the log and stamp it with cycle headers, so it will already
be fully allocated in th eunderlying device after running mkfs.
> The unwritten extent case may just be a
> matter of administration sanity since the filesystem shrink will
> consider those as allocated blocks and thus imply that the fs expects
> the ability to write them.
Yes, that is correct. To make things work as expected, we'll need
to propagating fallocate calls to the block device from the
filesystem when mounted. There's a bunch of stuff there that we can
add once the core "this is a thin filesystem" awareness has been
added.
> Finally, I tend to agree with Amir's comment with regard to
> shrink/growfs... at least infosar as I understand his concern. If we do
> support physical shrink in the future, what do we expect the interface
> to look like in light of this change?
I don't expect it to look any different. It's exactly the same as
growfs - thinspace filesystem will simply do a logical grow/shrink,
fat filesystems will need to do a physical grow/shrink
adding/removing AGs.
I suspect Amir is worried about the fact that I put "LBA_size"
in geom.datablocks instead of "usable_space" for thin-aware
filesystems (i.e. I just screwed up writing the new patch). Like I
said, I haven't updated the userspace stuff yet, so the thinspace
side of that hasn't been tested yet. If I screwed up xfs_growfs (and
I have because some of the tests are reporting incorrect
post-grow sizes on fat filesytsems) I tend to find out as soon as I
run it.
Right now I think using m_LBA_size and m_usable_space in the geom
structure was a mistake - they should remain the superblock values
because otherwise the hidden metadata reservations can affect what
is reported to userspace, and that's where I think the test failures
are coming from....
> FWIW, I also think there's an
> element of design/interface consistency to the argument (aside from the
> concern over the future of the physical shrink api). We've separated
> total blocks from usable blocks in other userspace interfaces
> (geometry). Not doing so for growfs is somewhat inconsistent, and also
> creates confusion over the meaning of newblocks in different contexts.
It has to be done for the geometry info so that xfs_info can report
the thinspace size of the filesytem in addition to the physical
size. It does not need to be done to make growfs work correctly.
> Given that this already requires on-disk changes and the addition of a
> feature bit, it seems prudent to me to update the growfs API
> accordingly. Isn't a growfs new_usable_blocks field or some such all we
> really need to address that concern?
I really do not see any reason for changing the growfs interface
right now. If there's a problem in future that physical shrink
introduces, we can rev the interface when the problem arises.
Cheers
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2017-10-30 21:10 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-26 8:33 [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems Dave Chinner
2017-10-26 8:33 ` [PATCH 01/14] xfs: factor out AG header initialisation from growfs core Dave Chinner
2017-10-26 8:33 ` [PATCH 02/14] xfs: convert growfs AG header init to use buffer lists Dave Chinner
2017-10-26 8:33 ` [PATCH 03/14] xfs: factor ag btree reoot block initialisation Dave Chinner
2017-10-26 8:33 ` [PATCH 04/14] xfs: turn ag header initialisation into a table driven operation Dave Chinner
2017-10-26 8:33 ` [PATCH 05/14] xfs: make imaxpct changes in growfs separate Dave Chinner
2017-10-26 8:33 ` [PATCH 06/14] xfs: separate secondary sb update in growfs Dave Chinner
2017-10-26 8:33 ` [PATCH 07/14] xfs: rework secondary superblock updates " Dave Chinner
2017-10-26 8:33 ` [PATCH 08/14] xfs: move various type verifiers to common file Dave Chinner
2017-10-26 8:33 ` [PATCH 09/14] xfs: split usable space from block device size Dave Chinner
2017-10-26 8:33 ` [PATCH 10/14] xfs: hide reserved metadata space from users Dave Chinner
2017-10-26 8:33 ` [PATCH 11/14] xfs: bump XFS_IOC_FSGEOMETRY to v5 structures Dave Chinner
2017-10-26 8:33 ` [PATCH 12/14] xfs: convert remaingin xfs_sb_version_... checks to bool Dave Chinner
2017-10-26 16:03 ` Darrick J. Wong
2017-10-26 8:33 ` [PATCH 13/14] xfs: add suport for "thin space" filesystems Dave Chinner
2017-10-26 8:33 ` [PATCH 14/14] xfs: add growfs support for changing usable blocks Dave Chinner
2017-10-26 11:30 ` Amir Goldstein
2017-10-26 12:48 ` Dave Chinner
2017-10-26 13:32 ` Amir Goldstein
2017-10-27 10:26 ` Amir Goldstein
2017-10-26 11:09 ` [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems Amir Goldstein
2017-10-26 12:35 ` Dave Chinner
2017-11-01 22:31 ` Darrick J. Wong
2017-10-30 13:31 ` Brian Foster
2017-10-30 21:09 ` Dave Chinner [this message]
2017-10-31 4:49 ` Amir Goldstein
2017-10-31 22:40 ` Dave Chinner
2017-10-31 11:24 ` Brian Foster
2017-11-01 0:45 ` Dave Chinner
2017-11-01 14:17 ` Brian Foster
2017-11-01 23:53 ` Dave Chinner
2017-11-02 11:25 ` Brian Foster
2017-11-02 23:30 ` Dave Chinner
2017-11-03 2:47 ` Darrick J. Wong
2017-11-03 11:36 ` Brian Foster
2017-11-05 22:50 ` Dave Chinner
2017-11-06 13:01 ` Brian Foster
2017-11-06 21:20 ` Dave Chinner
2017-11-07 11:28 ` Brian Foster
2017-11-03 11:26 ` Brian Foster
2017-11-03 12:19 ` Amir Goldstein
2017-11-06 1:16 ` Dave Chinner
2017-11-06 9:48 ` Amir Goldstein
2017-11-06 21:46 ` Dave Chinner
2017-11-07 5:30 ` Amir Goldstein
2017-11-05 23:51 ` Dave Chinner
2017-11-06 13:07 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171030210941.GA5858@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).