linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Amir Goldstein <amir73il@gmail.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems
Date: Wed, 1 Nov 2017 15:31:15 -0700
Message-ID: <20171101223115.GL4911@magnolia>
In-Reply-To: <20171026123548.GA3666@dastard>

On Thu, Oct 26, 2017 at 11:35:48PM +1100, Dave Chinner wrote:
> On Thu, Oct 26, 2017 at 02:09:26PM +0300, Amir Goldstein wrote:
> > On Thu, Oct 26, 2017 at 11:33 AM, Dave Chinner <david@fromorbit.com> wrote:
> > > This patchset is aimed at filesystems that are installed on sparse
> > > block devices, a.k.a thin provisioned devices. The aim of the
> > > patchset is to bring the space management aspect of the storage
> > > stack up into the filesystem rather than keeping it below the
> > > filesystem where users and the filesystem have no clue they are
> > > about to run out of space.
> .....
> > > I've smoke tested the non-thinspace code paths (running auto tests
> > > on a scrub enabled kernel+userspace right now) as I haven't updated
> > > the userspace code to exercise the thinp code paths yet. I know the
> > > concept works, but my userspace code has an older on-disk format
> > > from the prototype so it will take me a couple of days to update and
> > > work out how to get fstests to integrate it reliably. So this is
> > > mainly a heads-up RFC patchset....
> > >
> > > Comments, thoughts, flames all welcome....
> > >
> > 
> > This proposal is very interesting outside the scope of xfs, so I hope you
> > don't mind I've CC'ed fsdevel.
> > 
> > I am thinking how a slightly similar approach could be used to online shrink
> > the physical size for filesystems that are not on thin provisioned devices:
> > 
> > - Set/get a geometry variable of "agsoftlimit" (better names are welcome)
> >   which is <= agcount.
> > - agsoftlimit < agcount means that free space of AG > agsoftlimit is zero,
> >   so total disk space usage will not show this space as available user space.
> > - inode and block allocators will avoid dipping into the high AG pool,
> >   except for metadata blocks needed for freeing high AG inodes/blocks.
> > - A variant of xfs_fsr (or e4defrag for that matter) could "migrate" inodes
> >   and/or blocks from high to low AGs.
> > - Migrating directories is quite different than migrating files, but doable.
> > - Finally, on XFS_IOC_FSGROWFSDATA, if shrinking filesystem size and
> >   high AG usage counters are zero, then physical size can be shrunk
> >   down as far as agsoftlimit instead of reducing usable_blocks.
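
To make the accounting in the list above concrete, here is a rough
sketch. It is illustrative only: the AG class and both helper names are
made up, and it assumes "high" AGs are those with agno >= agsoftlimit,
which the proposal does not spell out.

```python
from dataclasses import dataclass


@dataclass
class AG:
    agno: int
    free_blocks: int
    used_blocks: int


def visible_free_blocks(ags, agsoftlimit):
    """Free space shown to users; AGs at or above the soft limit count as zero."""
    return sum(ag.free_blocks for ag in ags if ag.agno < agsoftlimit)


def can_physically_shrink(ags, agsoftlimit):
    """A shrink down to agsoftlimit is only possible once the high AGs are empty."""
    return all(ag.used_blocks == 0 for ag in ags if ag.agno >= agsoftlimit)


# Hypothetical four-AG filesystem, 1000 blocks per AG.
ags = [AG(0, 100, 900), AG(1, 400, 600), AG(2, 700, 300), AG(3, 1000, 0)]
```

With agsoftlimit = 2, only the free space in AG 0 and AG 1 is reported,
and the shrink stays blocked until AG 2 has been migrated empty.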
> 
> Yup, you've just described all the craziness that a physical shrink
> requires on XFS. Lots of new user APIs, new tools to move data
> around, new code to transparently migrate directories and other
> metadata (like xattrs), etc.
> 
> Also, the log is placed half way through the XFS filesystem, so
> unless we add code to allocate and switch to a new journal (in a
> crash safe and recoverable way!) we can't shrink by more than 50%.
> 
> Also, none of the growfs code touches existing AGs - they'll have to
> be scanned to determine they really are empty before they get
> removed from the filesystem, and then there's the other issues like
> we can't shrink to less than 2 AGs, which puts a significant minimum
> shrink size on filesystems (again there's that "shrink more than 50%
> requires a lot more work" problem for filesystems < 4TB).
> 
> And to do it efficiently, we really need rmap support in filesystems
> so the fs can tell us what files and metadata need to be moved,
> rather than having to do brute force scans to work out what needs
> moving. Especially as the brute force scans can't find all the
> metadata that we might need to relocate before we've emptied the
> space we need to stop using.
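
A toy model of the rmap point, not the real on-disk rmap btree: the
record layout, owner strings, and function name below are all invented
for illustration. Given reverse-mapping records, a single range query
says who owns blocks in the space to be vacated, metadata included.

```python
from collections import namedtuple

# One reverse-mapping record: which owner uses blocks [start, start + length).
Rmap = namedtuple("Rmap", "start length owner")


def owners_in_range(rmaps, lo, hi):
    """Return the owners that have any block inside [lo, hi)."""
    return sorted(
        {r.owner for r in rmaps if r.start < hi and r.start + r.length > lo}
    )


# Hypothetical records for one AG.
rmaps = [
    Rmap(0, 8, "sb+ag headers"),
    Rmap(8, 100, "inode 131"),
    Rmap(500, 50, "inode 260"),
    Rmap(900, 20, "refcount btree"),
    Rmap(950, 30, "inode 131"),
]
```

Querying blocks 800-1000 returns both a file and a metadata owner; a
scan that only walks files would never report the latter.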
> 
> IOWs, it's a *lot* of work, and IMO there's more work in
> verification and proving that everything is crash safe, recoverable
> and restartable. We've known how much work it is for years - why do
> you think it hasn't been implemented? See:
> 
> http://xfs.org/index.php/Shrinking_Support
> 
> And:
> 
> http://xfs.org/index.php/Unfinished_work#The_xfs_reno_tool
> 
> And specifically follow the reference to a discussion in 2007:
> 
> https://marc.info/?l=linux-xfs&m=119131697224361&w=2
> 
> > With this, xfs can gain physical shrink support and ext4 can gain online
> > (and safe) shrink support.
> 
> Yes, I estimate it'll probably take about a man-year's worth of work
> to get xfs shrink to production ready from all the pieces we have
> sitting around today.

Ewww, physical shrink.  Maybe that becomes feasible after parent pointer
support lands, both from a "making the directory rewrite easier" and a
"do the reviewers have time for this?" perspective. :)

I've worked on bashing resize2fs into better shape for shrink support;
the things you have to do (even on ext4, which doesn't share extents) to
the fs are pretty awful.  Ideally you'd move whole extents (or just
defrag the file into the space that will be left) but once reflink comes
into play you /have/ to have a strategy for maintaining the sharedness
across the migration or else you run the risk of blowing up the space
usage.
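
A back-of-the-envelope sketch of that blow-up. It is purely
illustrative: the file names, extent ids, and function are made up, and
neither resize2fs nor XFS works on a structure like this.

```python
def space_after_migration(files, preserve_sharing):
    """files maps a file name to the (extent_id, blocks) pairs it references."""
    if not preserve_sharing:
        # Naive copy: each file ends up with a private copy of every extent.
        return sum(blocks for extents in files.values() for _, blocks in extents)
    # Sharing-aware: each distinct extent is moved once and remapped for all owners.
    unique = {}
    for extents in files.values():
        for extent_id, blocks in extents:
            unique[extent_id] = blocks
    return sum(unique.values())


# Hypothetical image file with two reflinked clones.
files = {
    "base.img": [("e1", 1000), ("e2", 1000)],
    "clone1.img": [("e1", 1000), ("e2", 1000)],
    "clone2.img": [("e1", 1000), ("e3", 200)],
}
```

Here the data occupies 2200 blocks while shared, but a migration that
copies file by file needs 5200, which may not fit in the smaller fs.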

That's a lot to review, even if the strategy is "bail out with ENOSPC
having potentially done a ton of work and/or fragmented the fs".

--D

> > Assuming that this idea is not shot down on sight, the only implication
> > I can think of w.r.t your current patches is leaving enough room in new APIs
> > to accommodate this prospective functionality.
> 
> I'm not introducing any new APIs. XFS_IOC_FSGROWFSDATA already
> supports shrinking and resizing/moving the log, they just aren't
> implemented.
> 
> > You have already reserved 15 u64 in geometry V5 ioctl struct, so that's good.
> > You have not changed XFS_IOC_FSGROWFSDATA at all, so going forward
> > the ambiguity of physical shrink vs. virtual shrink could either be determined
> > by heuristics
> 
> No heuristics at all. Filesystems on thin devices will have a
> feature bit in the superblock indicating they are thin filesystems.
> If the "thinspace" bit is set, shrink is just an accounting
> operation. If it's not set, then it needs to physically change the
> geometry of the filesystem....
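
Roughly, the dispatch described above might look like the sketch below.
This is not the patchset's code: the dict fields and the function name
are hypothetical, and the ENOSPC check against used blocks is an
assumption added for the example.

```python
import errno


def shrink(fs, new_usable_blocks):
    """Shrink a toy filesystem described by a dict of counters."""
    if fs["thinspace"]:
        # Thin filesystem: only the usable-space accounting changes.
        if new_usable_blocks < fs["used_blocks"]:
            raise OSError(errno.ENOSPC, "not enough free space to shrink")
        fs["usable_blocks"] = new_usable_blocks
        return "accounting"
    # No feature bit: the physical geometry would have to change.
    raise NotImplementedError("physical shrink is not implemented")


# Hypothetical thin filesystem: 1000 physical blocks, 300 in use.
thin = {"thinspace": True, "dblocks": 1000, "usable_blocks": 1000,
        "used_blocks": 300}
```

Calling shrink(thin, 500) lowers usable_blocks and leaves dblocks alone;
the same call on a filesystem without the bit has to take the physical
path.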
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
