From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH] xfs: skip discard of unwritten extents
Date: Tue, 1 May 2018 08:00:20 +1000 [thread overview]
Message-ID: <20180430220020.GE13766@dastard> (raw)
In-Reply-To: <20180426180624.6134-1-bfoster@redhat.com>
On Thu, Apr 26, 2018 at 02:06:24PM -0400, Brian Foster wrote:
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>
> Hi all,
>
> What do folks think of something like this?
Definitely sounds like something we need to address.
> The motivation here is that
> the VDO (dedup) devs had reported seeing online discards during
> write-only workloads. These turn out to be related to trimming post-eof
> preallocation blocks after large file copies. To my knowledge, this
> isn't really a prevalent or serious issue, but I think that technically
> these discards are unnecessary and so I was looking into how we could
> avoid them.
We simply trucate post-eof extents, right? So we know in
xfs_itruncate_extents() if the inode size is changing, not to
mention we know if the extent is beyond EOF? e.g. all calls to
xfs_itruncate_extents() other than xfs_free_eofblocks() change the
inode size and so directly indicate they are removing written
blocks. Anything where the inode size is not changing is doing a
post-eof removal, and so we can assume no data has been written?
So rather than converting everything to unwritten extents, the "skip
discard flag" is simply triggered via extents being freed sitting
beyond the current EOF (not the new EOF) and/or being unwritten?
> This behavior is of course not directly related to unwritten extents,
> but the immediate/obvious solution to bubble up a bmapi flag of some
> kind to xfs_free_eofblocks() seemed rather crude. From there, I figured
> that we technically don't need to discard any unwritten extents (within
> or beyond EOF) because they haven't been written to since being
> allocated. In fact, I'm not sure we have to even busy them, but it's
> roughly equivalent logic either way and I'm trying to avoid getting too
> clever.
I think we still need to busy them to avoid re-allocating them in
the same checkpoint, as data extent free/realloc in the same
checkpoint could result in a failed recovery (i.e. partial
checkpoint replay) leaving the extent linked into two separate
files.
> I also recall that we've discussed using unwritten extents for delalloc
> -> real conversion to avoid the small stale data exposure window that
> exists in writeback. Without getting too deep into the reason we don't
> currently do an initial unwritten allocation [1], I don't think there's
> anything blocking us from converting any post-eof blocks that happen to
> be part of the resulting normal allocation. As it is, the imap is
> already trimmed to EOF by the writeback code for coherency reasons. If
> we were to convert post-eof blocks (not part of this patch) along with
> something like this patch, then we'd indirectly prevent discards for
> eofblocks trims.
I think we should leave that as a separate problem, as writeback
currently has issues with the way we manage bufferhead state.
i.e. things don't work right if we put unwritten extents under
delalloc buffers and vice versa. [ I have patches to address that
I'm working on.] And there's also the issue that we need to change
the delalloc reservations to take into account block allocations
required by unwritten extent conversion needed by delalloc.
Hence I think we should address that as a separate problem, not as
the solution to avoiding discard of post-eof extents.
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 4bcc095fe44a..942c90ec6747 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2954,13 +2954,15 @@ xfs_free_extent(
> xfs_fsblock_t bno, /* starting block number of extent */
> xfs_extlen_t len, /* length of extent */
> struct xfs_owner_info *oinfo, /* extent owner */
> - enum xfs_ag_resv_type type) /* block reservation type */
> + enum xfs_ag_resv_type type, /* block reservation type */
> + bool skip_discard)
> {
> struct xfs_mount *mp = tp->t_mountp;
> struct xfs_buf *agbp;
> xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, bno);
> xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, bno);
> int error;
> + unsigned int busy_flags = 0;
>
> ASSERT(len != 0);
> ASSERT(type != XFS_AG_RESV_AGFL);
> @@ -2984,7 +2986,9 @@ xfs_free_extent(
> if (error)
> goto err;
>
> - xfs_extent_busy_insert(tp, agno, agbno, len, 0);
> + if (skip_discard)
> + busy_flags |= XFS_EXTENT_BUSY_SKIP_DISCARD;
> + xfs_extent_busy_insert(tp, agno, agbno, len, busy_flags);
> return 0;
Rather than changing xfs_free_extent(), how about adding a
xfs_free_extent_nodiscard() wrapper, and only call it when
processing an extent that doesn't need discard? This means none of
the other code that frees extents needs to be changed. Similarly
add a xfs_bmap_add_free_nodiscard() wrapper. That will cut down on
the code churn caused by passing new parameters everywhere....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2018-04-30 22:00 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-26 18:06 [RFC PATCH] xfs: skip discard of unwritten extents Brian Foster
2018-04-30 9:06 ` Carlos Maiolino
2018-04-30 12:17 ` Brian Foster
2018-04-30 13:26 ` Carlos Maiolino
2018-04-30 22:00 ` Dave Chinner [this message]
2018-05-01 17:38 ` Brian Foster
2018-05-01 22:39 ` Dave Chinner
2018-05-02 11:18 ` Brian Foster
2018-05-03 0:48 ` Dave Chinner
2018-05-03 12:07 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180430220020.GE13766@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).