From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems
Date: Mon, 6 Nov 2017 10:51:04 +1100
Message-ID: <20171105235104.GF4094@dastard>
In-Reply-To: <20171103112626.GA19974@bfoster.bfoster>

On Fri, Nov 03, 2017 at 07:26:27AM -0400, Brian Foster wrote:
> On Fri, Nov 03, 2017 at 10:30:17AM +1100, Dave Chinner wrote:
> > On Thu, Nov 02, 2017 at 07:25:33AM -0400, Brian Foster wrote:
> > > On Thu, Nov 02, 2017 at 10:53:00AM +1100, Dave Chinner wrote:
> > > > On Wed, Nov 01, 2017 at 10:17:21AM -0400, Brian Foster wrote:
> > > > > On Wed, Nov 01, 2017 at 11:45:13AM +1100, Dave Chinner wrote:
> > > > > > On Tue, Oct 31, 2017 at 07:24:32AM -0400, Brian Foster wrote:
> > > > > > > On Tue, Oct 31, 2017 at 08:09:41AM +1100, Dave Chinner wrote:
> > > > > > > > On Mon, Oct 30, 2017 at 09:31:17AM -0400, Brian Foster wrote:
> > > > > > > > > On Thu, Oct 26, 2017 at 07:33:08PM +1100, Dave Chinner wrote:
> ...
> > > > > BTW, was there ever any kind of solution to the metadata block
> > > > > reservation issue in the thin case? We now hide metadata reservation
> > > > > from the user via the m_usable_blocks account. If m_phys_blocks
> > > > > represents a thin volume, how exactly do we prevent those metadata
> > > > > allocations/writes from overrunning what the admin has specified as
> > > > > "usable" with respect to the thin volume?
> > > >
> > > > The reserved metadata blocks are not accounted from free space when
> > > > they are allocated - they are pulled from the reserved space that
> > > > has already been removed from the free space.
> > > >
> > >
> > > Ok, so the user can set a usable blocks value of something less than the
> > > fs geometry, then the reservation is pulled from that, reducing the
> > > reported "usable" value further. Hence, what ends up reported to the
> > > user is actually something less than the value set by the user, which
> > > means that the filesystem overall respects how much space the admin says
> > > it can use in the underlying volume.
> > >
> > > For example, the user creates a 100T thin volume with 10T of usable
> > > space. The fs reserves a further 2T out of that for metadata, so then
> > > what the user sees is 8T of writeable space. The filesystem itself
> > > cannot use more than 10T out of the volume, as instructed. Am I
> > > following that correctly? If so, that sounds reasonable to me from the
> > > "don't overflow my thin volume" perspective.
> >
> > No, that's not what happens. For thick filesystems, the 100TB volume
> > gets 2TB pulled from it so it appears as a 98TB filesystem. This is
> > done by modifying the free block counts and m_usable_space when the
> > reservations are made.
> >
>
> Ok..
>
> > For thin filesystems, we've already got 90TB of space "reserved",
> > and so the metadata reservations and allocations come from that.
> > i.e. we skip the modification of free block counts and m_usable
> > space in the case of a thinspace filesystem, and so the user still
> > sees 10TB of usable space that they asked to have.
> >
>
> Hmm.. so then I'm slightly confused regarding the thin use case
> regarding prevention of pool depletion. The usable blocks value that the
> user settles on is likely based on how much space the filesystem should
> use to safely avoid pool depletion.

I did say up front that the user data thinspace accounting would not
be an exact reflection of underlying storage pool usage. Things like
partially written blocks in the underlying storage pool mean write
amplification factors would need to be considered, but that's
something the admin already has to deal with in thinly provisioned
storage.
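
A rough illustration of the accounting described in the quoted text above, written as a Python sketch rather than kernel code. The function name and its parameters are hypothetical and do not come from the patchset; sizes are in whole TB to match the example in the thread.

```python
def user_visible_tb(phys_tb, metadata_resv_tb, usable_tb=None):
    """Space reported to the user, in TB (illustrative model only).

    usable_tb=None models a thick filesystem; any other value models a
    thinspace filesystem that was given that much usable space.
    """
    if usable_tb is None:
        # Thick: the reservation is subtracted from what the user sees.
        return phys_tb - metadata_resv_tb
    # Thin: the reservation comes out of the space that is already
    # hidden from the user (phys_tb - usable_tb), so the visible size
    # stays at what the user asked for.
    return usable_tb


print(user_visible_tb(100, 2))                # thick: 98
print(user_visible_tb(100, 2, usable_tb=10))  # thin: 10
```

The sketch only models what the user sees. It says nothing about how much of the underlying pool actually gets consumed.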
> If a usable value of 10T means the
> filesystem can write to the usable 10T + some amount of metadata
> reservation, how does the user determine a sane usable value based on
> the current pool geometry?

From an admin POV it's damn easy to document in admin guides that
the actual space usage of a thinspace filesystem is going to be on
the order of 2% greater than the space given to the filesystem for
user data. Use an overhead of 2-5% for internal management and the
"small amount of extra space for internal metadata" issue can be
ignored.
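
As a worked example of that rule of thumb, here is a small Python sketch that turns a usable size into the pool space an admin might budget for it. The helper name is hypothetical, and the only inputs are the 2-5% figures above. It does not model write amplification in the pool, which still has to be considered separately.

```python
def pool_budget_gib(usable_gib, overhead_pct):
    """Usable space plus a percentage overhead, rounded up to whole GiB."""
    total = usable_gib * (100 + overhead_pct)
    return (total + 99) // 100  # integer ceiling of total / 100


usable = 10 * 1024  # a 10 TiB thinspace filesystem, in GiB
print(pool_budget_gib(usable, 2))  # 10445
print(pool_budget_gib(usable, 5))  # 10752
```

So a filesystem given 10 TiB of usable space would be budgeted roughly 10.2 to 10.5 TiB of pool space under these assumptions.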
> > > The best I can read into the response here is that you think physical
> > > shrink is unlikely enough to not need to care very much what kind of
> > > interface confusion could result from needing to rev the current growfs
> > > interface to support physical shrink on thin filesystems in the future.
> > > Is that a fair assessment..?
> >
> > Not really. I understand just how complex a physical shrink
> > implementation is going to be, and have a fair idea of the sorts of
> > craziness we'll need to add to xfs_growfs to support/co-ordinate a
> > physical shrink operation. From that perspective, I don't see a
> > physical shrink working with an unchanged growfs interface. The
> > discussion about whether or not we should physically shrink
> > thinspace filesystems is almost completely irrelevant to the
> > interface requirements of a physical shrink....
>
> So it's not so much about the likelihood of realizing physical shrink,
> but rather the likelihood that physical shrink would require to rev the
> growfs structure anyways (regardless of this feature).

Yup, pretty much.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com