From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: AIL needs asynchronous CIL forcing
Date: Thu, 25 Feb 2021 08:12:12 -0500 [thread overview]
Message-ID: <YDeiLOdFhIMJegWZ@bfoster> (raw)
In-Reply-To: <20210224211058.GA4662@dread.disaster.area>
On Thu, Feb 25, 2021 at 08:10:58AM +1100, Dave Chinner wrote:
> On Tue, Feb 23, 2021 at 04:32:11PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > The AIL pushing is stalling on log forces when it comes across
> > pinned items. This is happening on removal workloads where the AIL
> > is dominated by stale items that are removed from AIL when the
> > checkpoint that marks the items stale is committed to the journal.
> > This results is relatively few items in the AIL, but those that are
> > are often pinned as directories items are being removed from are
> > still being logged.
> >
> > As a result, many push cycles through the CIL will first issue a
> > blocking log force to unpin the items. This can take some time to
> > complete, with tracing regularly showing push delays of half a
> > second and sometimes up into the range of several seconds. Sequences
> > like this aren't uncommon:
> >
> > ....
> > 399.829437: xfsaild: last lsn 0x11002dd000 count 101 stuck 101 flushing 0 tout 20
> > <wanted 20ms, got 270ms delay>
> > 400.099622: xfsaild: target 0x11002f3600, prev 0x11002f3600, last lsn 0x0
> > 400.099623: xfsaild: first lsn 0x11002f3600
> > 400.099679: xfsaild: last lsn 0x1100305000 count 16 stuck 11 flushing 0 tout 50
> > <wanted 50ms, got 500ms delay>
> > 400.589348: xfsaild: target 0x110032e600, prev 0x11002f3600, last lsn 0x0
> > 400.589349: xfsaild: first lsn 0x1100305000
> > 400.589595: xfsaild: last lsn 0x110032e600 count 156 stuck 101 flushing 30 tout 50
> > <wanted 50ms, got 460ms delay>
> > 400.950341: xfsaild: target 0x1100353000, prev 0x110032e600, last lsn 0x0
> > 400.950343: xfsaild: first lsn 0x1100317c00
> > 400.950436: xfsaild: last lsn 0x110033d200 count 105 stuck 101 flushing 0 tout 20
> > <wanted 20ms, got 200ms delay>
> > 401.142333: xfsaild: target 0x1100361600, prev 0x1100353000, last lsn 0x0
> > 401.142334: xfsaild: first lsn 0x110032e600
> > 401.142535: xfsaild: last lsn 0x1100353000 count 122 stuck 101 flushing 8 tout 10
> > <wanted 10ms, got 10ms delay>
> > 401.154323: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x1100353000
> > 401.154328: xfsaild: first lsn 0x1100353000
> > 401.154389: xfsaild: last lsn 0x1100353000 count 101 stuck 101 flushing 0 tout 20
> > <wanted 20ms, got 300ms delay>
> > 401.451525: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0
> > 401.451526: xfsaild: first lsn 0x1100353000
> > 401.451804: xfsaild: last lsn 0x1100377200 count 170 stuck 22 flushing 122 tout 50
> > <wanted 50ms, got 500ms delay>
> > 401.933581: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0
> > ....
> >
> > In each of these cases, every AIL pass saw 101 log items stuck on
> > the AIL (pinned) with very few other items being found. Each pass, a
> > log force was issued, and delay between last/first is the sleep time
> > + the sync log force time.
> >
> > Some of these 101 items pinned the tail of the log. The tail of the
> > log does slowly creep forward (first lsn), but the problem is that
> > the log is actually out of reservation space because it's been
> > running so many transactions that stale items that never reach the
> > AIL but consume log space. Hence we have a largely empty AIL, with
> > long term pins on items that pin the tail of the log that don't get
> > pushed frequently enough to keep log space available.
> >
> > The problem is the hundreds of milliseconds that we block in the log
> > force pushing the CIL out to disk. The AIL should not be stalled
> > like this - it needs to run and flush items that are at the tail of
> > the log with minimal latency. What we really need to do is trigger a
> > log flush, but then not wait for it at all - we've already done our
> > waiting for stuff to complete when we backed off prior to the log
> > force being issued.
> >
> > Even if we remove the XFS_LOG_SYNC from the xfs_log_force() call, we
> > still do a blocking flush of the CIL and that is what is causing the
> > issue. Hence we need a new interface for the CIL to trigger an
> > immediate background push of the CIL to get it moving faster but not
> > to wait on that to occur. While the CIL is pushing, the AIL can also
> > be pushing.
> >
> > We already have an internal interface to do this -
> > xlog_cil_push_now() - but we need a wrapper for it to be used
> > externally. xlog_cil_force_seq() can easily be extended to do what
> > we need as it already implements the synchronous CIL push via
> > xlog_cil_push_now(). Add the necessary flags and "push current
> > sequence" semantics to xlog_cil_force_seq() and convert the AIL
> > pushing to use it.
>
> Overnight testing indicates generic/530 hangs on small logs with
> this patch. It essentially runs out of log space because there are
> inode cluster buffers permanently pinned by this workload.
>
> That is, as inodes are unlinked, they repeatedly relog the inode
> cluster buffer, and so while the CIL is flushing to unpin the
> buffer, a new unlink relogs it and adds a new pin to the buffer.
> Hence the log force that the AIL does to unpin pinned items doesn't
> actually unpin buffers that are being relogged.
>
> These buffers only get unpinned when the log actually runs out of
> space because they pin the tail of the log. Then the modifications
> that are relogging the buffer stop because they fail to get a log
> reservation, and the tail of the log does not move forward until the
> AIL issues a log force that finally unpins the buffer and writes it
> back.
>
> Essentially, relogged buffers cannot be flushed by the AIL until the
> log completely stalls.
>
> The problem being tripped over here is that we no longer force the
> final iclogs in a CIL push to disk - we leave the iclog with the
> commit record in it in ACTIVE state, and by not waiting and forcing
> all the iclogs to disk, the buffer never gets unpinned because there
> isn't any more log pressure to force it out because everything is
> blocked on reservation space.
>
> The solution to this is to have the CIL push change the state of the
> commit iclog to WANT_SYNC before it is released. This means the CIL
> push will always flush the iclog to disk and the checkpoint will
> complete and unpin the buffers.
>
> Right now, we really only want to do this state switch for these
> async pushes - for small sync transactions and fsync we really want
> the iclog aggregation that we have now to optimise iclogbuf usage,
> so I'll have to pass a new flag through the push code and back into
> xlog_write(). That will make this patch behave the same as we
> currently do.
>
Unfortunately I've not yet caught up to reviewing your most recently
posted set of log patches so I can easily be missing some context, but
when passing through some of the feedback/updates so far this has me
rather confused. We discussed this pre-existing CIL behavior in the
previous version, I suggested some similar potential behavior change
where we would opportunistically send off checkpoint iclogs for I/O a
bit earlier than normal and you argued [1] against it. Now it sounds
like not only are we implementing that, but it's actually necessary to
fix a log hang problem..? What am I missing?
The updated iclog behavior does sound more friendly to me than what we
currently do (obviously, based on my previous comments), but I am
slightly skeptical of how such a change fixes the root cause of a hang.
Is this a stall/perf issue or an actual log deadlock? If the latter,
what prevents this deadlock on current upstream?
Brian
[1] https://lore.kernel.org/linux-xfs/20210129222559.GT4662@dread.disaster.area/
> In the longer term, we need to change how the AIL deals with pinned
> buffers, as the current method is definitely sub-optimal. It also
> explains the "everything stops for a second" performance anomalies
> I've been seeing for a while in testing. But fixing that is outside
> the scope of this patchset, so in the mean time I'll fix this one up
> and repost it in a little while.
>
> FWIW, this is also the likely cause of the "CIL workqueue too deep"
> hangs I was seeing with the next patch in the series, too.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2021-02-25 13:13 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-23 5:32 [PATCH 0/3] xfs: CIL improvements Dave Chinner
2021-02-23 5:32 ` [PATCH 1/3] xfs: xfs_log_force_lsn isn't passed a LSN Dave Chinner
2021-02-24 21:42 ` Darrick J. Wong
2021-02-24 22:19 ` Dave Chinner
2021-02-25 19:01 ` Christoph Hellwig
2021-03-02 18:12 ` Brian Foster
2021-02-23 5:32 ` [PATCH 2/3] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-02-24 21:10 ` Dave Chinner
2021-02-24 23:26 ` [PATCH 2/3 v2] " Dave Chinner
2021-02-25 2:15 ` Dave Chinner
2021-03-02 21:44 ` Brian Foster
2021-03-03 0:57 ` Dave Chinner
2021-03-03 17:32 ` Brian Foster
2021-03-04 1:59 ` Dave Chinner
2021-03-04 13:13 ` Brian Foster
2021-03-04 22:48 ` Dave Chinner
2021-03-05 14:58 ` Brian Foster
2021-03-09 0:44 ` Dave Chinner
2021-03-09 4:35 ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing\ Darrick J. Wong
2021-03-10 2:10 ` Brian Foster
2021-03-10 22:00 ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-03-10 15:13 ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing\ Brian Foster
2021-03-11 12:41 ` Christoph Hellwig
2021-03-10 14:49 ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Brian Foster
2021-02-25 13:12 ` Brian Foster [this message]
2021-02-25 22:03 ` [PATCH 2/3] " Dave Chinner
2021-02-27 16:25 ` Brian Foster
2021-03-01 4:54 ` Dave Chinner
2021-03-01 13:32 ` Brian Foster
2021-03-03 1:23 ` Dave Chinner
2021-03-03 17:20 ` Brian Foster
2021-03-04 2:01 ` Dave Chinner
2021-02-23 5:32 ` [PATCH 3/3] xfs: CIL work is serialised, not pipelined Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YDeiLOdFhIMJegWZ@bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).