public inbox for linux-xfs@vger.kernel.org
From: Krister Johansen <kjlx@templeofstupid.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <dchinner@redhat.com>, Gao Xiang <xiang@kernel.org>,
	linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] bringing back the AGFL reserve
Date: Mon, 17 Jun 2024 15:25:27 -0700
Message-ID: <20240617222527.GA2044@templeofstupid.com>
In-Reply-To: <ZmuSsYn/ma9ejCoP@dread.disaster.area>

On Fri, Jun 14, 2024 at 10:45:37AM +1000, Dave Chinner wrote:
> On Thu, Jun 13, 2024 at 01:27:09PM -0700, Krister Johansen wrote:
> > I managed to work out a reproducer for the problem.  Debugging that, the
> > steps Gao outlined turned out to be essentially what was necessary to
> > get the problem to happen repeatably.
> > 
> > 1. Allocate almost all of the space in an AG
> > 2. Free and reallocate that space to fragment it so the freespace
> > b-trees are just about to split.
> > 3. Allocate blocks in a file such that the next extent allocated for
> > that file will cause its bmbt to get converted from an inline extent to
> > a b-tree.
> > 4. Free space such that the free-space btrees have a contiguous extent
> > with a busy portion on either end
> > 5. Allocate the portion in the middle, splitting the extent and
> > triggering a b-tree split.
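
Steps 4 and 5 in the list above can be illustrated with a toy model.  This
is a minimal sketch, not XFS code: the leaf capacity, the extent values and
the helper names (allocate_from_extent, needs_split) are all hypothetical.
It only shows why carving the middle out of a free extent adds a record to
the free-space index, which pushes an already-full leaf over its limit,
while allocating from the edge of the same extent does not.

```python
# Toy model only; none of these names or numbers come from the XFS code.
LEAF_CAPACITY = 4  # hypothetical maximum number of records in one leaf


def allocate_from_extent(records, start, length):
    """Remove [start, start + length) from a list of free extents.

    Each record is a (start_block, block_count) tuple. The range must lie
    inside a single free extent. Returns a new list, still sorted by start.
    """
    end = start + length
    out = []
    found = False
    for free_start, free_len in records:
        free_end = free_start + free_len
        if not found and free_start <= start and end <= free_end:
            found = True
            if free_start < start:
                out.append((free_start, start - free_start))  # left remainder
            if end < free_end:
                out.append((end, free_end - end))  # right remainder
        else:
            out.append((free_start, free_len))
    if not found:
        raise ValueError("range is not inside a single free extent")
    return out


def needs_split(records):
    """True when the leaf holds more records than it has room for."""
    return len(records) > LEAF_CAPACITY


# A leaf that is exactly full, as after step 2.
full_leaf = [(0, 8), (20, 4), (40, 30), (100, 16)]

# Step 5: allocate the middle of the (40, 30) extent.
middle = allocate_from_extent(full_leaf, 50, 10)
print(middle, needs_split(middle))

# For contrast: allocating from the left edge keeps the record count flat.
edge = allocate_from_extent(full_leaf, 40, 10)
print(edge, needs_split(edge))
```

In the real filesystem the split needs a new block from the AGFL, which is
where the shortage discussed in this thread comes from.
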
> 
> Do you have a script that sets up this precondition reliably?
> It sounds like it can be done from a known filesystem config. If you
> do have a script, can you share it? Or maybe even better, turn it
> into an fstest?

I do have a script that reproduces the problem.  At the moment it is in
a pretty embarrassing state.  I'm happy to clean it up a bit and share
it, or try to turn it into an fstest, or both.  The script currently
creates small loop devices to generate a filesystem layout that's a
little easier to work with.  Is it considered acceptable to have an
fstest create a filesystem with a particular geometry?  (And would you
consider taking a patch to let mkfs.xfs --unsupported take both size and
agsize arguments so the overall filesystem size and the per-AG size
could be set by a test?)

> > On older kernels this is all it takes.  After the AG-aware allocator
> > changes I also need to start the allocation in the highest numbered AG
> > available while inducing lock contention in the lower numbered AGs.
> 
> Ah, so you have to perform a DOS on the lower AGFs so that the
> attempts made by the xfs_alloc_vextent_start_ag() to trylock the
> lower AGFs once it finds it cannot allocate in the highest AG
> anymore also fail.
> 
> That was one of the changes made in the perag aware allocator
> rework; it added full-range AG iteration when XFS_ALLOC_FLAG_TRYLOCK
> is set because we can't deadlock on reverse order AGF locking when
> using trylocks.
> 
> However, if the trylock iteration fails, it then sets the restart AG
> to the minimum AG we can wait for without deadlocking, removes the
> trylock and restarts the iteration. Hence you've had to create AGF
> lock contention to force the allocator back to being restricted by
> the AGF locking orders.
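
The two-pass behaviour described above can be sketched as a toy model.
This is an illustration under stated assumptions, not the kernel
implementation: pick_ag and its parameters are hypothetical, lock
contention is modelled as a fixed set of AG numbers, and min_ag stands for
the lowest AG the caller may block on without violating AGF lock ordering.

```python
def pick_ag(ag_count, start_ag, min_ag, contended, has_space):
    """Toy model of a trylock pass followed by a blocking pass.

    contended: set of AG numbers whose AGF trylock would fail.
    has_space: set of AG numbers able to satisfy the allocation.
    min_ag:    lowest AG we may wait on (assumed lock-ordering limit).
    Returns (ag_number, pass_name), or (None, None) if nothing fits.
    """
    # Pass 1: trylock only, so every AG may be visited, wrapping around
    # from start_ag. Contended AGs are skipped rather than waited on.
    for i in range(ag_count):
        ag = (start_ag + i) % ag_count
        if ag in contended:
            continue
        if ag in has_space:
            return ag, "trylock"

    # Pass 2: blocking locks. Waiting is only safe at or above min_ag,
    # so lower-numbered AGs are out of reach even if they have space.
    for ag in range(min_ag, ag_count):
        if ag in has_space:
            return ag, "blocking"

    return None, None


# Allocation starts in the highest AG (3), which is out of space.
# Without contention the trylock pass wraps around and finds AG 0.
print(pick_ag(4, 3, 3, contended=set(), has_space={0, 1}))

# With the lower AGFs contended, pass 1 fails and pass 2 may only look
# at AG 3, so the allocation cannot be satisfied.
print(pick_ag(4, 3, 3, contended={0, 1, 2}, has_space={0, 1}))
```

The second call corresponds to the reproducer's situation: contention on
the lower AGFs defeats the trylock pass, and the blocking pass is again
limited by lock ordering.
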

The other thing that I really appreciated here is that the patchset
cleaned up a bunch of the different allocation functions and made
everything easier to read and follow.  Thanks for that as well.

> Is this new behaviour sufficient to mitigate the problem being seen
> with this database workload? Has it been tested with kernels that
> have those changes, and if so did it have any impact on the
> frequency of the issue occurring?

I don't have a good answer for this yet.  The team is planning to start
migrating later in the year and this will probably run through to next
year.  I'll have that information eventually and will share it when I
do, but don't know yet.  Aside from the script, other synthetic
load-tests have not been successful in reproducing the problem.  That
may be the result of the databases that are spun up for load testing not
having filesystems that are as full and fragmented as the production ones.

> > In order to ensure that AGs have enough space to complete transactions
> > with multiple allocations, I've taken a stab at implementing an AGFL
> > reserve pool.
> 
> OK. I'll comment directly on the code from here, hopefully I'll
> address your other questions in those comments.

Thanks, Dave.  I appreciate you spending the time to review and provide
feedback.

-K

Thread overview (14+ messages):
2024-06-13 20:27 [RFC PATCH 0/4] bringing back the AGFL reserve Krister Johansen
2024-06-13 20:27 ` [RFC PATCH 1/4] xfs: resurrect the AGFL reservation Krister Johansen
2024-06-14  2:59   ` Dave Chinner
2024-06-17 23:46     ` Krister Johansen
2024-07-09  2:20       ` Dave Chinner
2024-07-23  6:51         ` Krister Johansen
2024-08-01  0:53           ` Dave Chinner
2024-09-16 23:01             ` Krister Johansen
2024-06-13 20:27 ` [RFC PATCH 2/4] xfs: modify xfs_alloc_min_freelist to take an increment Krister Johansen
2024-06-13 20:27 ` [RFC PATCH 3/4] xfs: let allocations tap the AGFL reserve Krister Johansen
2024-06-13 20:27 ` [RFC PATCH 4/4] xfs: refuse allocations without agfl refill space Krister Johansen
2024-06-14  0:45 ` [RFC PATCH 0/4] bringing back the AGFL reserve Dave Chinner
2024-06-17 22:25   ` Krister Johansen [this message]
2024-06-17 23:54     ` Dave Chinner
