From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: reduce log recovery transaction block reservations
Date: Fri, 24 Apr 2020 10:04:08 -0400
Message-ID: <20200424140408.GE53690@bfoster>
In-Reply-To: <158752130035.2142108.11825776210575708747.stgit@magnolia>
On Tue, Apr 21, 2020 at 07:08:20PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> On filesystems that support them, bmap intent log items can be used to
> change mappings in inode data or attr forks. However, if the bmbt must
> expand, the enormous block reservations that we make for finishing
> chains of deferred log items become a liability because the bmbt block
> allocator sets minleft to the transaction reservation and there probably
> aren't any AGs in the filesystem that have that much free space.
>
> Whereas previously we would reserve 93% of the free blocks in the
> filesystem, now we only want to reserve 7/8ths of the free space in the
> least full AG, and no more than half of the usable blocks in an AG. In
> theory we shouldn't run out of space because (prior to the unclean
> shutdown) all of the in-progress transactions successfully reserved the
> worst case number of disk blocks.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> fs/xfs/xfs_log_recover.c | 55 ++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 43 insertions(+), 12 deletions(-)
>
>
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index e9b3e901d009..a416b028b320 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2669,6 +2669,44 @@ xlog_recover_process_data(
> return 0;
> }
>
> +/*
> + * Estimate a block reservation for a log recovery transaction. Since we run
> + * separate transactions for each chain of deferred ops that get created as a
> + * result of recovering unfinished log intent items, we must be careful not to
> + * reserve so many blocks that block allocations fail because we can't satisfy
> + * the minleft requirements (e.g. for bmbt blocks).
> + */
> +static int
> +xlog_estimate_recovery_resblks(
> + struct xfs_mount *mp,
> + unsigned int *resblks)
> +{
> + struct xfs_perag *pag;
> + xfs_agnumber_t agno;
> + unsigned int free = 0;
> + int error;
> +
> + /* Don't use more than 7/8th of the free space in the least full AG. */
> + for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> + unsigned int ag_free;
> +
> + error = xfs_alloc_pagf_init(mp, NULL, agno, 0);
> + if (error)
> + return error;
> + pag = xfs_perag_get(mp, agno);
> + ag_free = pag->pagf_freeblks + pag->pagf_flcount;
> + free = max(free, (ag_free * 7) / 8);
> + xfs_perag_put(pag);
> + }
> +
It's somewhat unfortunate that we have to iterate over all AGs for each
chain. I'm wondering whether that has any effect on a large recovery on
filesystems with an inordinate AG count. Have you tested under those
particular conditions? I suppose it's possible that recovery is slow
enough that this won't matter...
Also, perhaps not caused by this patch, but does this
outsized/manufactured reservation have the effect of artificially
steering allocations to a particular AG if one happens to be notably
larger than the rest?
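To make the arithmetic under discussion concrete, here is a rough
userspace model of the old and new estimates. It is only a sketch, not
kernel code: struct model_ag, model_old_resblks() and
model_new_resblks() are made-up names, ag_max_usable stands in for
whatever xfs_alloc_ag_max_usable() returns, and 64-bit intermediates
are used for simplicity rather than to mirror the patch's types.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-AG counters the patch reads. */
struct model_ag {
	uint32_t freeblks;	/* models pag->pagf_freeblks */
	uint32_t flcount;	/* models pag->pagf_flcount */
};

/* Old scheme: 15/16 of the filesystem-wide free block count. */
uint64_t model_old_resblks(uint64_t fs_freeblks)
{
	uint64_t resblks = fs_freeblks;

	/* The removed code clamps to UINT_MAX before scaling. */
	if (resblks > UINT32_MAX)
		resblks = UINT32_MAX;
	return (resblks * 15) >> 4;
}

/*
 * New scheme: 7/8 of the free space in the AG with the most free
 * blocks, capped at half of the usable blocks in an AG. The loop
 * visits every AG, so each call costs O(agcount).
 */
int model_new_resblks(const struct model_ag *ags, size_t agcount,
		      uint32_t ag_max_usable, uint32_t *resblks)
{
	uint64_t best = 0;
	size_t i;

	for (i = 0; i < agcount; i++) {
		uint64_t ag_free = (uint64_t)ags[i].freeblks + ags[i].flcount;
		uint64_t candidate = (ag_free * 7) / 8;

		if (candidate > best)
			best = candidate;
	}

	/* Don't reserve more than half the usable AG blocks. */
	if (best > ag_max_usable / 2)
		best = ag_max_usable / 2;
	if (best == 0)
		return -ENOSPC;

	*resblks = (uint32_t)best;
	return 0;
}
```

As an assumed example, take four AGs with 300,000 free blocks each and
1,000,000 usable blocks per AG. The old estimate is 1,125,000 blocks,
which is more than any single AG has free, while the new estimate is
262,500 blocks.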
Brian
> + /* Don't try to reserve more than half the usable AG blocks. */
> + *resblks = min(free, xfs_alloc_ag_max_usable(mp) / 2);
> + if (*resblks == 0)
> + return -ENOSPC;
> +
> + return 0;
> +}
> +
> /* Take all the collected deferred ops and finish them in order. */
> static int
> xlog_finish_defer_ops(
> @@ -2677,27 +2715,20 @@ xlog_finish_defer_ops(
> {
> struct xfs_defer_freezer *dff, *next;
> struct xfs_trans *tp;
> - int64_t freeblks;
> uint resblks;
> int error = 0;
>
> list_for_each_entry_safe(dff, next, dfops_freezers, dff_list) {
> + error = xlog_estimate_recovery_resblks(mp, &resblks);
> + if (error)
> + break;
> +
> /*
> * We're finishing the defer_ops that accumulated as a result
> * of recovering unfinished intent items during log recovery.
> * We reserve an itruncate transaction because it is the
> - * largest permanent transaction type. Since we're the only
> - * user of the fs right now, take 93% (15/16) of the available
> - * free blocks. Use weird math to avoid a 64-bit division.
> + * largest permanent transaction type.
> */
> - freeblks = percpu_counter_sum(&mp->m_fdblocks);
> - if (freeblks <= 0) {
> - error = -ENOSPC;
> - break;
> - }
> -
> - resblks = min_t(int64_t, UINT_MAX, freeblks);
> - resblks = (resblks * 15) >> 4;
> error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> 0, XFS_TRANS_RESERVE, &tp);
> if (error)
>