From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 0/3] xfs: reduce AGF hold times during fstrim operations
Date: Thu, 21 Sep 2023 11:39:42 +1000
Message-ID: <20230921013945.559634-1-david@fromorbit.com>

A recent log space overflow and recovery failure was root-caused to
a long-running truncate blocking on the AGF and ending up pinning
the tail of the log. The filesystem then hung, the machine was
rebooted, and log recovery then refused to run because there wasn't
enough space in the log for the EFI transaction reservation.

The reason the long-running truncate got blocked on the AGF for so
long was that an fstrim was being run. The underlying block device
was large and very slow (a 10TB ceph rbd volume), so discarding all
the free space in the AG took a really long time.

The current fstrim implementation holds the AGF across the entire
operation - both the free space scan and the issuing of all the
discards. The discards are synchronous and issued one at a time
(a queue depth of one), so if there are millions of free extents, we
hold the AGF lock across millions of discard operations.

It doesn't really need to be said that this is a Bad Thing.

This series reworks the fstrim discard path to use the same
mechanisms as online discard. This allows discards to be issued
asynchronously without holding the AGF locked, enabling higher
discard queue depths (much faster on fast devices) and only
requiring the AGF lock to be held whilst we are scanning free space.

To do this, we make use of busy extents: we lock the AGF, mark all
the extents we want to discard as "busy under discard" so that
nothing will be allowed to allocate them, and then drop the AGF
lock. We then issue discards on the gathered busy extents and, on
discard completion, remove them from the busy list.

This drops fstrim's AGF lock hold times to a few milliseconds for
each batch of free extents we scan, so the hours-long hold times
that can currently occur on large, slow, badly fragmented devices
no longer occur.

This passes fstests with '-o discard' enabled, and has run the
'-g trim' group many, many times without any reported regressions.
-----
Version 2:
- fix various typos and formatting things
- move online discard code to fs/xfs/xfs_discard.c and make it
generic (new patch)
- use xfs_alloc_rec_incore() as the iteration cursor
- remove hacky "keep gathering until size changes" batching code now
that cursor can restart at an exact extent
- rework fstrim iteration to use new shared discard code
- added fstrim-vs-suspend holdoff fix (new patch)
RFC:
- https://lore.kernel.org/linux-xfs/20230829065710.938039-1-david@fromorbit.com/

Thread overview: 9+ messages
2023-09-21  1:39 Dave Chinner (this message)
2023-09-21  1:39 [PATCH 1/3] xfs: move log discard work to xfs_discard.c Dave Chinner
2023-09-21 15:52   Darrick J. Wong
2023-09-22  1:04   Dave Chinner
2023-09-25 15:13   Darrick J. Wong
2023-09-21  1:39 [PATCH 2/3] xfs: reduce AGF hold times during fstrim operations Dave Chinner
2023-09-21 15:41   Darrick J. Wong
2023-09-21  1:39 [PATCH 3/3] xfs: abort fstrim if kernel is suspending Dave Chinner
2023-09-21 15:33   Darrick J. Wong