From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems
Date: Mon, 6 Nov 2017 08:07:08 -0500
Message-ID: <20171106130708.GB30884@bfoster.bfoster>
In-Reply-To: <20171105235104.GF4094@dastard>
On Mon, Nov 06, 2017 at 10:51:04AM +1100, Dave Chinner wrote:
> On Fri, Nov 03, 2017 at 07:26:27AM -0400, Brian Foster wrote:
> > On Fri, Nov 03, 2017 at 10:30:17AM +1100, Dave Chinner wrote:
> > > On Thu, Nov 02, 2017 at 07:25:33AM -0400, Brian Foster wrote:
> > > > On Thu, Nov 02, 2017 at 10:53:00AM +1100, Dave Chinner wrote:
> > > > > On Wed, Nov 01, 2017 at 10:17:21AM -0400, Brian Foster wrote:
> > > > > > On Wed, Nov 01, 2017 at 11:45:13AM +1100, Dave Chinner wrote:
> > > > > > > On Tue, Oct 31, 2017 at 07:24:32AM -0400, Brian Foster wrote:
> > > > > > > > On Tue, Oct 31, 2017 at 08:09:41AM +1100, Dave Chinner wrote:
> > > > > > > > > On Mon, Oct 30, 2017 at 09:31:17AM -0400, Brian Foster wrote:
> > > > > > > > > > On Thu, Oct 26, 2017 at 07:33:08PM +1100, Dave Chinner wrote:
> > ...
> > > > > > BTW, was there ever any kind of solution to the metadata block
> > > > > > reservation issue in the thin case? We now hide metadata reservation
> > > > > > from the user via the m_usable_blocks accounting. If m_phys_blocks
> > > > > > represents a thin volume, how exactly do we prevent those metadata
> > > > > > allocations/writes from overrunning what the admin has specified as
> > > > > > "usable" with respect to the thin volume?
> > > > >
> > > > > The reserved metadata blocks are not accounted from free space when
> > > > > they are allocated - they are pulled from the reserved space that
> > > > > has already been removed from the free space.
> > > > >
> > > >
> > > > Ok, so the user can set a usable blocks value of something less than the
> > > > fs geometry, then the reservation is pulled from that, reducing the
> > > > reported "usable" value further. Hence, what ends up reported to the
> > > > user is actually something less than the value set by the user, which
> > > > means that the filesystem overall respects how much space the admin says
> > > > it can use in the underlying volume.
> > > >
> > > > For example, the user creates a 100T thin volume with 10T of usable
> > > > space. The fs reserves a further 2T out of that for metadata, so then
> > > > what the user sees is 8T of writeable space. The filesystem itself
> > > > cannot use more than 10T out of the volume, as instructed. Am I
> > > > following that correctly? If so, that sounds reasonable to me from the
> > > > "don't overflow my thin volume" perspective.
> > >
> > > No, that's not what happens. For thick filesystems, the 100TB volume
> > > gets 2TB pulled from it so it appears as a 98TB filesystem. This is
> > > done by modifying the free block counts and m_usable_space when the
> > > reservations are made.
> > >
> >
> > Ok..
> >
> > > For thin filesystems, we've already got 90TB of space "reserved",
> > > and so the metadata reservations and allocations come from that.
> > > i.e. we skip the modification of free block counts and m_usable_space
> > > in the case of a thinspace filesystem, and so the user still
> > > sees 10TB of usable space that they asked to have.
> > >
> >
> > Hmm.. so then I'm slightly confused about the thin use case with
> > respect to preventing pool depletion. The usable blocks value that the
> > user settles on is likely based on how much space the filesystem should
> > use to safely avoid pool depletion.
>
> I did say up front that the user data thinspace accounting would not
> be an exact reflection of underlying storage pool usage. Things like
> partially written blocks in the underlying storage pool mean write
> amplification factors would need to be considered, but that's
> something the admin already has to deal with in thinly provisioned
> storage.
>
Ok, I recall this coming up one way or another. For some reason I
thought something might have changed in the implementation since then
and/or managed to confuse myself over the current behavior.
> > If a usable value of 10T means the
> > filesystem can write to the usable 10T + some amount of metadata
> > reservation, how does the user determine a sane usable value based on
> > the current pool geometry?
>
> From an admin POV it's damn easy to document in admin guides that
> actual space usage of a thinspace filesystem is going to be on the
> order of 2% greater than the space given to the filesystem for user
> data. Use an overhead of 2-5% for internal management and the "small
> amount of extra space for internal metadata" issue can be ignored.
>
It's easy to document whatever we want. :) I'm not convinced that is as
effective as a hard limit based on the fs features, but the latter is
more complex and may be overkill in most cases. So, documentation works
for me until/unless testing or real usage shows otherwise.
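To put rough numbers on that guidance: taking the 2-5% figure above at
face value, an admin sizing a thinspace fs for 10T of user data would
plan on the filesystem consuming somewhere between 10T * 1.02 = 10.2T
and 10T * 1.05 = 10.5T of actual pool space, and provision accordingly.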
If it does come up, perhaps a script or userspace tool that somehow
presents the current internal reservation calculations (combined with
whatever geometry information is relevant) as something consumable for
the user (whether it be a simple dump of the active reservations, the
worst case consumption of a thin fs, etc.) might be a nice compromise.
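Something as simple as the sketch below might be a starting point. To be
clear, this is just an illustration of the idea: it only diffs statfs(2)
output against the raw device size, so on a thin fs the delta includes
the unprovisioned headroom as well as internal reservations. A real tool
would want to pull the actual reservation breakdown out of the (v5)
XFS_IOC_FSGEOMETRY geometry info instead:

	/* resv_estimate.c: show space held back from statfs() output */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <sys/statfs.h>
	#include <unistd.h>
	#include <linux/fs.h>		/* BLKGETSIZE64 */

	int main(int argc, char **argv)
	{
		struct statfs sfs;
		unsigned long long dev_bytes, fs_bytes;
		int fd;

		if (argc != 3) {
			fprintf(stderr, "usage: %s <mntpt> <blockdev>\n", argv[0]);
			return 1;
		}
		if (statfs(argv[1], &sfs) < 0) {
			perror("statfs");
			return 1;
		}
		fd = open(argv[2], O_RDONLY);
		if (fd < 0 || ioctl(fd, BLKGETSIZE64, &dev_bytes) < 0) {
			perror(argv[2]);
			return 1;
		}
		close(fd);

		fs_bytes = (unsigned long long)sfs.f_blocks * sfs.f_bsize;
		printf("device size: %llu bytes\n", dev_bytes);
		printf("fs reports:  %llu bytes\n", fs_bytes);
		printf("held back:   %llu bytes\n", dev_bytes - fs_bytes);
		return 0;
	}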
> > > > The best I can read into the response here is that you think physical
> > > > shrink is unlikely enough that we needn't care much about the kind of
> > > > interface confusion that could result from having to rev the current
> > > > growfs interface to support physical shrink on thin filesystems in the
> > > > future. Is that a fair assessment..?
> > >
> > > Not really. I understand just how complex a physical shrink
> > > implementation is going to be, and have a fair idea of the sorts of
> > > craziness we'll need to add to xfs_growfs to support/co-ordinate a
> > > physical shrink operation. From that perspective, I don't see a
> > > physical shrink working with an unchanged growfs interface. The
> > > discussion about whether or not we should physically shrink
> > > thinspace filesystems is almost completely irrelevant to the
> > > interface requirements of a physical shrink....
> >
> > So it's not so much about the likelihood of realizing physical shrink,
> > but rather the likelihood that physical shrink would require revving
> > the growfs structure anyway (regardless of this feature).
>
> Yup, pretty much.
>
Ok. I don't agree, but at least I understand your perspective. ;)
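For reference, part of why the interface question matters: the current
growfs data payload (from xfs_fs.h) is minimal, and any shrink/thinspace
aware rev would presumably need at least a usable-blocks field and some
flags. The v2 struct below is purely hypothetical, just to illustrate
the shape of the change:

	/* today's growfs ioctl payload */
	typedef struct xfs_growfs_data {
		__u64	newblocks;	/* new data subvol size, fsblocks */
		__u32	imaxpct;	/* new inode space percentage limit */
	} xfs_growfs_data_t;

	/* hypothetical revved version, for illustration only */
	struct xfs_growfs_data_v2 {
		__u64	newblocks;	/* new physical size, fsblocks */
		__u64	usableblocks;	/* new usable (thin) size, fsblocks */
		__u32	imaxpct;
		__u32	flags;		/* e.g. grow vs. shrink (made up) */
	};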
Brian
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com