linux-xfs.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: AIL needs asynchronous CIL forcing
Date: Thu, 25 Feb 2021 08:12:12 -0500	[thread overview]
Message-ID: <YDeiLOdFhIMJegWZ@bfoster> (raw)
In-Reply-To: <20210224211058.GA4662@dread.disaster.area>

On Thu, Feb 25, 2021 at 08:10:58AM +1100, Dave Chinner wrote:
> On Tue, Feb 23, 2021 at 04:32:11PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The AIL pushing is stalling on log forces when it comes across
> > pinned items. This is happening on removal workloads where the AIL
> > is dominated by stale items that are removed from AIL when the
> > checkpoint that marks the items stale is committed to the journal.
> > This results in relatively few items in the AIL, but those that
> > remain are often pinned, as the directories the items are being
> > removed from are still being logged.
> > 
> > As a result, many push cycles through the CIL will first issue a
> > blocking log force to unpin the items. This can take some time to
> > complete, with tracing regularly showing push delays of half a
> > second and sometimes up into the range of several seconds. Sequences
> > like this aren't uncommon:
> > 
> > ....
> >  399.829437:  xfsaild: last lsn 0x11002dd000 count 101 stuck 101 flushing 0 tout 20
> > <wanted 20ms, got 270ms delay>
> >  400.099622:  xfsaild: target 0x11002f3600, prev 0x11002f3600, last lsn 0x0
> >  400.099623:  xfsaild: first lsn 0x11002f3600
> >  400.099679:  xfsaild: last lsn 0x1100305000 count 16 stuck 11 flushing 0 tout 50
> > <wanted 50ms, got 500ms delay>
> >  400.589348:  xfsaild: target 0x110032e600, prev 0x11002f3600, last lsn 0x0
> >  400.589349:  xfsaild: first lsn 0x1100305000
> >  400.589595:  xfsaild: last lsn 0x110032e600 count 156 stuck 101 flushing 30 tout 50
> > <wanted 50ms, got 460ms delay>
> >  400.950341:  xfsaild: target 0x1100353000, prev 0x110032e600, last lsn 0x0
> >  400.950343:  xfsaild: first lsn 0x1100317c00
> >  400.950436:  xfsaild: last lsn 0x110033d200 count 105 stuck 101 flushing 0 tout 20
> > <wanted 20ms, got 200ms delay>
> >  401.142333:  xfsaild: target 0x1100361600, prev 0x1100353000, last lsn 0x0
> >  401.142334:  xfsaild: first lsn 0x110032e600
> >  401.142535:  xfsaild: last lsn 0x1100353000 count 122 stuck 101 flushing 8 tout 10
> > <wanted 10ms, got 10ms delay>
> >  401.154323:  xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x1100353000
> >  401.154328:  xfsaild: first lsn 0x1100353000
> >  401.154389:  xfsaild: last lsn 0x1100353000 count 101 stuck 101 flushing 0 tout 20
> > <wanted 20ms, got 300ms delay>
> >  401.451525:  xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0
> >  401.451526:  xfsaild: first lsn 0x1100353000
> >  401.451804:  xfsaild: last lsn 0x1100377200 count 170 stuck 22 flushing 122 tout 50
> > <wanted 50ms, got 500ms delay>
> >  401.933581:  xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0
> > ....
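[The annotated delays above come straight from the trace timestamps; for
example, the first one ("wanted 20ms, got 270ms") is just the gap between
the last line of one xfsaild pass and the first line of the next:]

```python
# Delay between two xfsaild passes, computed from the trace timestamps
# above (seconds since trace start).
last_of_pass = 399.829437    # "last lsn ... tout 20"
start_of_next = 400.099622   # "target 0x11002f3600 ..."

delay_ms = round((start_of_next - last_of_pass) * 1000)
print(delay_ms)  # -> 270, versus the 20ms timeout the AIL asked for
```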
> > 
> > In each of these cases, every AIL pass saw 101 log items stuck on
> > the AIL (pinned) with very few other items being found. Each pass, a
> > log force was issued, and the delay between last/first is the sleep
> > time plus the sync log force time.
> > 
> > Some of these 101 items pinned the tail of the log. The tail of the
> > log does slowly creep forward (first lsn), but the problem is that
> > the log is actually out of reservation space because it's been
> > running so many transactions that create stale items that never
> > reach the AIL but still consume log space. Hence we have a largely
> > empty AIL, with long-term pins on items that pin the tail of the
> > log and don't get pushed frequently enough to keep log space
> > available.
> > 
> > The problem is the hundreds of milliseconds that we block in the log
> > force pushing the CIL out to disk. The AIL should not be stalled
> > like this - it needs to run and flush items that are at the tail of
> > the log with minimal latency. What we really need to do is trigger a
> > log flush, but then not wait for it at all - we've already done our
> > waiting for stuff to complete when we backed off prior to the log
> > force being issued.
> > 
> > Even if we remove the XFS_LOG_SYNC from the xfs_log_force() call, we
> > still do a blocking flush of the CIL and that is what is causing the
> > issue. Hence we need a new interface for the CIL to trigger an
> > immediate background push of the CIL to get it moving faster but not
> > to wait on that to occur. While the CIL is pushing, the AIL can also
> > be pushing.
> > 
> > We already have an internal interface to do this -
> > xlog_cil_push_now() - but we need a wrapper for it to be used
> > externally. xlog_cil_force_seq() can easily be extended to do what
> > we need as it already implements the synchronous CIL push via
> > xlog_cil_push_now(). Add the necessary flags and "push current
> > sequence" semantics to xlog_cil_force_seq() and convert the AIL
> > pushing to use it.
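[To make the distinction concrete, here is a toy model of the difference
between the current blocking log force and the proposed "push now, don't
wait" CIL force. All names and timings are illustrative assumptions for
this sketch, not XFS code:]

```python
# Toy model: a synchronous CIL force stalls the caller (e.g. xfsaild)
# for the whole flush; an async force kicks the flush off and returns,
# letting the AIL keep pushing while the CIL writes out. Illustrative
# only -- not the actual XFS interfaces.
import threading
import time

class ToyCIL:
    def __init__(self, flush_seconds):
        self.flush_seconds = flush_seconds
        self.flushed = threading.Event()

    def _flush(self):
        # Stand-in for writing the CIL checkpoint to the journal.
        time.sleep(self.flush_seconds)
        self.flushed.set()

    def force_sync(self):
        # Current behaviour: caller blocks for the entire flush.
        self._flush()

    def force_async(self):
        # Proposed behaviour: start the flush in the background and
        # return immediately; the caller does not wait.
        threading.Thread(target=self._flush, daemon=True).start()

cil = ToyCIL(flush_seconds=0.2)
t0 = time.monotonic()
cil.force_sync()
sync_elapsed = time.monotonic() - t0    # ~0.2s: caller stalled

cil = ToyCIL(flush_seconds=0.2)
t0 = time.monotonic()
cil.force_async()
async_elapsed = time.monotonic() - t0   # near zero: caller keeps running
cil.flushed.wait()                      # the checkpoint still completes
print(sync_elapsed >= 0.2, async_elapsed < 0.1)
```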
> 
> Overnight testing indicates generic/530 hangs on small logs with
> this patch. It essentially runs out of log space because there are
> inode cluster buffers permanently pinned by this workload.
> 
> That is, as inodes are unlinked, they repeatedly relog the inode
> cluster buffer, and so while the CIL is flushing to unpin the
> buffer, a new unlink relogs it and adds a new pin to the buffer.
> Hence the log force that the AIL does to unpin pinned items doesn't
> actually unpin buffers that are being relogged.
> 
> These buffers only get unpinned when the log actually runs out of
> space because they pin the tail of the log. Then the modifications
> that are relogging the buffer stop because they fail to get a log
> reservation, and the tail of the log does not move forward until the
> AIL issues a log force that finally unpins the buffer and writes it
> back.
> 
> Essentially, relogged buffers cannot be flushed by the AIL until the
> log completely stalls.
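[The relogging problem described above can be sketched as a toy pin-count
model: while unlinks keep running, each one re-pins the inode cluster
buffer before the CIL flush that would unpin it completes, so the pin
count never reaches zero until reservation runs out. Purely illustrative,
not XFS code:]

```python
# Toy model of a relogged buffer's pin count. Start pinned by the first
# unlink; each cycle a new unlink adds a pin before the CIL flush drops
# the old one, so the count never falls below 1 while relogging runs.
def simulate(relog_cycles):
    pins = 1                       # pinned by the first unlink
    min_pins = pins
    for _ in range(relog_cycles):
        pins += 1                  # new unlink relogs the buffer...
        pins -= 1                  # ...before the flush drops the old pin
        min_pins = min(min_pins, pins)
    # Log reservation runs out: modifications stop, no new relogs, and
    # the next log force finally unpins the buffer.
    pins -= 1
    return min_pins, pins

min_pins, final_pins = simulate(1000)
print(min_pins, final_pins)  # -> 1 0: never unpinned until the stall
```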
> 
> The problem being tripped over here is that we no longer force the
> final iclogs in a CIL push to disk - we leave the iclog with the
> commit record in it in ACTIVE state, and by not waiting and forcing
> all the iclogs to disk, the buffer never gets unpinned because there
> isn't any more log pressure to force it out because everything is
> blocked on reservation space.
> 
> The solution to this is to have the CIL push change the state of the
> commit iclog to WANT_SYNC before it is released. This means the CIL
> push will always flush the iclog to disk and the checkpoint will
> complete and unpin the buffers.
> 
> Right now, we really only want to do this state switch for these
> async pushes - for small sync transactions and fsync we really want
> the iclog aggregation that we have now to optimise iclogbuf usage,
> so I'll have to pass a new flag through the push code and back into
> xlog_write(). That will make this patch behave the same as we
> currently do.
> 
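[For illustration only, the state switch described above can be sketched
as a toy model. The state names mirror the iclog states mentioned in the
mail, but the logic is a simplified assumption, not the actual XFS
implementation:]

```python
# Toy sketch: after an *async* CIL push, switch the commit-record iclog
# from ACTIVE to WANT_SYNC so it is written out even with no further log
# pressure; sync pushes keep the existing ACTIVE aggregation behaviour.
ACTIVE, WANT_SYNC, SYNCED = "ACTIVE", "WANT_SYNC", "SYNCED"

def finish_cil_push(commit_iclog_state, async_push):
    # Sync pushes leave the iclog ACTIVE so small transactions/fsyncs
    # can still aggregate into it (the existing iclogbuf optimisation).
    if async_push:
        commit_iclog_state = WANT_SYNC
    return commit_iclog_state

def idle_log_tick(state):
    # With everything blocked on reservation space, only WANT_SYNC
    # iclogs get submitted; ACTIVE ones just sit there.
    return SYNCED if state == WANT_SYNC else state

# Current behaviour: iclog stays ACTIVE, checkpoint never completes.
print(idle_log_tick(finish_cil_push(ACTIVE, async_push=False)))  # ACTIVE
# Proposed: async push marks WANT_SYNC, iclog gets written, and the
# checkpoint completes, unpinning its buffers.
print(idle_log_tick(finish_cil_push(ACTIVE, async_push=True)))   # SYNCED
```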

Unfortunately I've not yet caught up on reviewing your most recently
posted set of log patches, so I can easily be missing some context, but
in passing through some of the feedback/updates so far, this has me
rather confused. We discussed this pre-existing CIL behavior in the
previous version: I suggested a similar potential behavior change where
we would opportunistically send off checkpoint iclogs for I/O a bit
earlier than normal, and you argued [1] against it. Now it sounds like
not only are we implementing that, but it's actually necessary to fix a
log hang problem..? What am I missing?

The updated iclog behavior does sound more friendly to me than what we
currently do (obviously, based on my previous comments), but I am
slightly skeptical of how such a change fixes the root cause of a hang.
Is this a stall/perf issue or an actual log deadlock? If the latter,
what prevents this deadlock on current upstream?

Brian

[1] https://lore.kernel.org/linux-xfs/20210129222559.GT4662@dread.disaster.area/

> In the longer term, we need to change how the AIL deals with pinned
> buffers, as the current method is definitely sub-optimal. It also
> explains the "everything stops for a second" performance anomalies
> I've been seeing for a while in testing. But fixing that is outside
> the scope of this patchset, so in the mean time I'll fix this one up
> and repost it in a little while.
> 
> FWIW, this is also the likely cause of the "CIL workqueue too deep"
> hangs I was seeing with the next patch in the series, too.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 



Thread overview: 33+ messages
2021-02-23  5:32 [PATCH 0/3] xfs: CIL improvements Dave Chinner
2021-02-23  5:32 ` [PATCH 1/3] xfs: xfs_log_force_lsn isn't passed a LSN Dave Chinner
2021-02-24 21:42   ` Darrick J. Wong
2021-02-24 22:19     ` Dave Chinner
2021-02-25 19:01     ` Christoph Hellwig
2021-03-02 18:12   ` Brian Foster
2021-02-23  5:32 ` [PATCH 2/3] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-02-24 21:10   ` Dave Chinner
2021-02-24 23:26     ` [PATCH 2/3 v2] " Dave Chinner
2021-02-25  2:15       ` Dave Chinner
2021-03-02 21:44       ` Brian Foster
2021-03-03  0:57         ` Dave Chinner
2021-03-03 17:32           ` Brian Foster
2021-03-04  1:59             ` Dave Chinner
2021-03-04 13:13               ` Brian Foster
2021-03-04 22:48                 ` Dave Chinner
2021-03-05 14:58                   ` Brian Foster
2021-03-09  0:44                     ` Dave Chinner
2021-03-09  4:35                       ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Darrick J. Wong
2021-03-10  2:10                         ` Brian Foster
2021-03-10 22:00                           ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-03-10 15:13                         ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Brian Foster
2021-03-11 12:41                         ` Christoph Hellwig
2021-03-10 14:49                       ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Brian Foster
2021-02-25 13:12     ` Brian Foster [this message]
2021-02-25 22:03       ` [PATCH 2/3] " Dave Chinner
2021-02-27 16:25         ` Brian Foster
2021-03-01  4:54           ` Dave Chinner
2021-03-01 13:32             ` Brian Foster
2021-03-03  1:23               ` Dave Chinner
2021-03-03 17:20                 ` Brian Foster
2021-03-04  2:01                   ` Dave Chinner
2021-02-23  5:32 ` [PATCH 3/3] xfs: CIL work is serialised, not pipelined Dave Chinner
