Re: [PATCH 2/2] xfs: Throttle commits on delayed background CIL push

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: Throttle commits on delayed background CIL push
Date: Tue, 1 Oct 2019 09:13:36 -0400	[thread overview]
Message-ID: <20191001131336.GB62428@bfoster> (raw)
In-Reply-To: <20191001034207.GS16973@dread.disaster.area>

On Tue, Oct 01, 2019 at 01:42:07PM +1000, Dave Chinner wrote:
> On Tue, Oct 01, 2019 at 07:53:36AM +1000, Dave Chinner wrote:
> > On Mon, Sep 30, 2019 at 01:03:58PM -0400, Brian Foster wrote:
> > > On Mon, Sep 30, 2019 at 04:03:44PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > In certain situations the background CIL push can be indefinitely
> > > > delayed. While we have workarounds from the obvious cases now, it
> > > > doesn't solve the underlying issue. This issue is that there is no
> > > > upper limit on the CIL where we will either force or wait for
> > > > a background push to start, hence allowing the CIL to grow without
> > > > bound until it consumes all log space.
> > > > 
> > > > To fix this, add a new wait queue to the CIL which allows background
> > > > pushes to wait for the CIL context to be switched out. This happens
> > > > when the push starts, so it will allow us to block incoming
> > > > transaction commit completion until the push has started. This will
> > > > only affect processes that are running modifications, and only when
> > > > the CIL threshold has been significantly overrun.
> > > > 
> > > > This has no apparent impact on performance, and doesn't even trigger
> > > > until over 45 million inodes had been created in a 16-way fsmark
> > > > test on a 2GB log. That was limiting at 64MB of log space used, so
> > > > the active CIL size is only about 3% of the total log in that case.
> > > > The concurrent removal of those files did not trigger the background
> > > > sleep at all.
> > > > 
> > > 
> > > Have you done similar testing for small/minimum sized logs?
> > 
> > Yes. I've had the tracepoint active during xfstests runs on test
> > filesystems using default log sizes on 5-15GB filesystems. The only
> > test in all of xfstests that has triggered it is generic/017, and it
> > only triggered once.
> > 
> > e.g.
> > 
> > # trace-cmd start -e xfs_log_cil_wait
> > <run xfstests>
> > # trace-cmd show
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 1/1   #P:4
> > #
> > #                              _-----=> irqs-off
> > #                             / _----=> need-resched
> > #                            | / _---=> hardirq/softirq
> > #                            || / _--=> preempt-depth
> > #                            ||| /     delay
> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> > #              | |       |   ||||       |         |
> >           xfs_io-2158  [001] ...1   309.285959: xfs_log_cil_wait: dev 8:96 t_ocnt 1 t_cnt 1 t_curr_res 67956 t_unit_res 67956 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 75 grant_reserve_bytes 12878480 grant_write_cycle 75 grant_write_bytes 12878480 curr_cycle 75 curr_block 10448 tail_cycle 75 tail_block 3560
> > #
> > 
> > And the timestamp matched the time that generic/017 was running.
> 
> SO I've run this on my typical 16-way fsmark workload with different
> size logs. It barely triggers on log sizes larger than 64MB, on 32MB
> logs I can see it capturing all 16 fsmark processes while waiting
> for the CIL context to switch. This will give you an idea of the
> log cycles the capture is occuring on, and the count of processes
> being captured:
> 
> $ sudo trace-cmd show | awk -e '/^ / {print $23}' | sort -n |uniq -c
>      16 251
>      32 475
>      16 494
>      32 870
>      15 1132
>      15 1166
>      14 1221
>       1 1222
>      16 1223
>       7 1307
>       8 1308
>      16 1315
>      16 1738
>      16 1832
>       9 2167
>       7 2168
>      16 2200
>      16 2375
>      16 2383
>      16 2700
>      16 2797
>      16 2798
>      16 2892
> $

Thanks. I assume I'm looking at cycle numbers and event counts here?

> 
> So typically groups of captures are hundreds of log cycles apart
> (100 cycles x 32MB = ~3GB of log writes), then there will be a
> stutter where the CIL dispatch is delayed, and then everything
> continues on. These all show the log is always around the 75% full
> (AIL tail pushing theshold) but the reservation grant wait lists are
> always empty so we're not running out of reservation space here.
> 

It's somewhat interesting that we manage to block every thread most of
the time before the CIL push task starts. I wonder a bit if that pattern
would hold for a system/workload with more CPUs (and if so, if there are
any odd side effects of stalling and waking hundreds of tasks at the
same time vs. our traditional queuing behavior).

> If I make the log even smaller - 16MB - the log is always full, the
> AIL is always tail pushing, and there is a constant stream of log
> forces (30-40/s) because tail pushing is hitting pinned items
> several thousand times a second.  IOWs, the frequency of the log
> forces means that CIL is almost never growing large enough to do a
> background push, let alone overrun the blocking threshold. Same
> trace for the same workload as above:
> 
> $ sudo trace-cmd show | awk -e '/^ / {print $23}' | sort -n |uniq -c
>      16 1400
>      16 5284
>      16 5624
>      16 5778
>      16 6159
>      10 6477
> $
> 
> So when we have lots of concurrency and modification, tiny logs
> appear to be less susceptible to CIL overruns than small logs
> because they are constantly tail pushing and issuing log forces that
> trigger trigger flushes of the CIL before an overruns could occur.
> 

Makes sense, thanks.

Brian

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

next prev parent reply	other threads:[~2019-10-01 13:13 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-30  6:03 [PATCH v2 0/2] xfs: limit CIL push sizes Dave Chinner
2019-09-30  6:03 ` [PATCH 1/2] xfs: Lower CIL flush limit for large logs Dave Chinner
2019-09-30 16:55   ` Brian Foster
2019-09-30  6:03 ` [PATCH 2/2] xfs: Throttle commits on delayed background CIL push Dave Chinner
2019-09-30 17:03   ` Brian Foster
2019-09-30 21:53     ` Dave Chinner
2019-10-01  3:42       ` Dave Chinner
2019-10-01 13:13         ` Brian Foster [this message]
2019-10-01 23:14           ` Dave Chinner
2019-10-02 12:41             ` Brian Foster
2019-10-03  1:25               ` Dave Chinner
2019-10-03 14:41                 ` Brian Foster
2019-10-04  2:27                   ` Dave Chinner
2019-10-04 11:50                     ` Brian Foster
2019-10-08  2:51                       ` Dave Chinner
2019-10-08 13:22                         ` Brian Foster
2019-10-08 17:34                           ` Brian Foster
2019-10-01 13:13       ` Brian Foster
2019-10-01 22:31         ` Dave Chinner
2019-10-02 12:40           ` Brian Foster
2019-10-03  0:53             ` Dave Chinner
2019-10-03 14:39               ` Brian Foster
2019-10-08  3:34                 ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191001131336.GB62428@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).