Re: [GIT PULL] xfs: xlog_write rework and CIL scalability

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [GIT PULL] xfs: xlog_write rework and CIL scalability
Date: Wed, 12 Jan 2022 15:56:04 -0800	[thread overview]
Message-ID: <20220112235604.GC19198@magnolia> (raw)
In-Reply-To: <20220111175828.GC656707@magnolia>

On Tue, Jan 11, 2022 at 09:58:29AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 11, 2022 at 04:04:37PM +1100, Dave Chinner wrote:
> > On Thu, Jan 06, 2022 at 01:40:33PM -0800, Darrick J. Wong wrote:
> > > On Fri, Dec 10, 2021 at 11:09:56AM +1100, Dave Chinner wrote:
> > > > Hi Darrick,
> > > > 
> > > > Can you please pull the following changes from the tag listed below
> > > > for the XFS dev tree?
> > > 
> > > Hi Dave,
> > > 
> > > I tried, but the regressions with generic/017 persist.  It trips the
> > > ticket reservation pretty consistently within 45-60 seconds of starting,
> > > at least on the OCI VM that I created.  /dev/sd[ab] are (software
> > > defined) disks that can sustain reads of ~50MB/s and ~5000iops; and
> > > writes of about half those numbers.
> > > 
> > >  run fstests generic/017 at 2022-01-06 13:18:59
> > >  XFS (sda4): Mounting V5 Filesystem
> > >  XFS (sda4): Ending clean mount
> > >  XFS (sda4): Quotacheck needed: Please wait.
> > >  XFS (sda4): Quotacheck: Done.
> > >  XFS (sda4): ctx ticket reservation ran out. Need to up reservation
> > >  XFS (sda4): ticket reservation summary:
> > >  XFS (sda4):   unit res    = 548636 bytes
> > >  XFS (sda4):   current res = -76116 bytes
> > >  XFS (sda4):   original count  = 1
> > >  XFS (sda4):   remaining count = 1
> > >  XFS (sda4): Log I/O Error (0x2) detected at xlog_write+0x5ee/0x660 [xfs] (fs/xfs/xfs_log.c:2499).  Shutting down filesystem.
> > >  XFS (sda4): Please unmount the filesystem and rectify the problem(s)
> > >  XFS (sda3): Unmounting Filesystem
> > >  XFS (sda4): Unmounting Filesystem
> > 
> > Ok, I *think* I've worked out what was going on here. The patch
> > below has run several hundred iterations of g/017 with an external
> > log on two different fs/log size configurations that typically
> > reproduced in within 10 cycles.
> > 
> > Essentially, the problem is largely caused by using
> > XLOG_CIL_BLOCKING_SPACE_LIMIT() instead of XLOG_CIL_SPACE_LIMIT()
> > when determining how much used space we can allow the percpu
> > counters to accumulate before aggregating them back into the global
> > counter. Using the hard limit meant that we could accumulate almost
> > the entire hard limit before we aggregate even a single percpu value
> > back into the global limit, resulting in failing to trigger either
> > condition for aggregation until we'd effectively blown through the
> > hard limit.
> > 
> > This then meant the extra reservations that need to be taken for
> > space used beyond the hard limit didn't get stolen for the ctx
> > ticket, and it then overruns.
> > 
> > It also means that we could overrun the hard limit substantially
> > before throttling kicked in. With the percpu aggregation threshold
> > brought back down to the (soft limit / num online cpus) we are
> > guaranteed to always start aggregation back into the global counter
> > before or at the point in time the soft limit should be hit, meaning
> > that we start updating the global counter much sooner and so are it
> > tracks actual space used once over the soft limit much more closely.
> > 
> > Darrick, can you rerun the branch with the patch below also included, and
> > see if it reproduces on your setup? If it does, can you grab a trace
> > of the trace_printk() calls I left in the patch?
> 
> Ok, I'll do that and report back.

...everything passes now, except for generic/650 on the same machine
that has a 128M external log:

[21310.267037] run fstests generic/650 at 2022-01-11 22:41:45
[21311.121539] XFS (sda3): Mounting V5 Filesystem
[21312.295609] XFS (sda3): Ending clean mount
[21314.160622] smpboot: CPU 2 is now offline
[21314.737842] smpboot: Booting Node 0 Processor 2 APIC 0x2
[21314.740726] kvm-clock: cpu 2, msr 43f608081, secondary cpu clock
[21314.787354] kvm-guest: stealtime: cpu 2, msr 43fd1b040
[21315.917154] smpboot: CPU 1 is now offline
[21317.993809] x86: Booting SMP configuration:
[21317.996484] smpboot: Booting Node 0 Processor 1 APIC 0x1
[21318.001940] kvm-clock: cpu 1, msr 43f608041, secondary cpu clock
[21318.020207] kvm-guest: stealtime: cpu 1, msr 43fc9b040
[21320.126617] smpboot: CPU 3 is now offline
[21320.127744] XFS (sda3): ctx ticket reservation ran out. Need to up reservation
[21320.153944] XFS (sda3): ticket reservation summary:
[21320.158868] XFS (sda3):   unit res    = 2100 bytes
[21320.163064] XFS (sda3):   current res = -40 bytes
[21320.167323] XFS (sda3):   original count  = 1
[21320.170436] XFS (sda3):   remaining count = 1
[21320.171742] XFS (sda3): Log I/O Error (0x2) detected at xlog_write+0x5f3/0x670 [xfs] (fs/xfs/xfs_log.c:2512).  Shutting down filesystem.
[21320.176445] XFS (sda3): Please unmount the filesystem and rectify the problem(s)
[21320.179719] potentially unexpected fatal signal 6.
[21320.182490] potentially unexpected fatal signal 6.
[21320.182632] potentially unexpected fatal signal 6.
[21320.183842] CPU: 0 PID: 3460987 Comm: fsstress Tainted: G        W         5.16.0-rc5-djwx #rc5 79050ab45c4cbd1b9fbe98125ec0eea3a2cdfa1d

--D

> > Note that this change does not make the algorithm fully correct - we
> > can still have accumulation on other CPUs that isn't folded back
> > into the global value. What I want is feedback on whether it makes
> > the problem largely go away on configs other than my own before
> > spending more time coming up with a better lockless aggregation
> > algorithm...
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> > 
> > ---
> >  fs/xfs/xfs_log_cil.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 366c0aaad640..47d46d6e15b3 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -96,6 +96,9 @@ xlog_cil_pcp_aggregate(
> >  		ctx->ticket->t_curr_res += cilpcp->space_reserved;
> >  		ctx->ticket->t_unit_res += cilpcp->space_reserved;
> >  		cilpcp->space_reserved = 0;
> > +	trace_printk("cilpcp space used %d, reserved %d unit-res %d cur-res %d",
> > +			cilpcp->space_used, cilpcp->space_reserved,
> > +			ctx->ticket->t_unit_res, ctx->ticket->t_curr_res);
> >  
> >  		if (!list_empty(&cilpcp->busy_extents)) {
> >  			list_splice_init(&cilpcp->busy_extents,
> > @@ -515,11 +518,16 @@ xlog_cil_insert_items(
> >  	 *
> >  	 * This can steal more than we need, but that's OK.
> >  	 */
> > -	space_used = atomic_read(&ctx->space_used);
> > +	space_used = atomic_read(&ctx->space_used) + len;
> >  	if (atomic_read(&cil->xc_iclog_hdrs) > 0 ||
> > -	    space_used + len >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> > +	    space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> >  		int	split_res = log->l_iclog_hsize +
> >  					sizeof(struct xlog_op_header);
> > +
> > +	trace_printk("space used %d, len %d iclog hdrs %d, slim %d, hlim %d",
> > +			space_used, len, atomic_read(&cil->xc_iclog_hdrs),
> > +			XLOG_CIL_SPACE_LIMIT(log),
> > +			XLOG_CIL_BLOCKING_SPACE_LIMIT(log));
> >  		if (ctx_res)
> >  			ctx_res += split_res * (tp->t_ticket->t_iclog_hdrs - 1);
> >  		else
> > @@ -540,8 +548,9 @@ xlog_cil_insert_items(
> >  	cilpcp->space_used += len;
> >  	if (space_used >= XLOG_CIL_SPACE_LIMIT(log) ||
> >  	    cilpcp->space_used >
> > -			((XLOG_CIL_BLOCKING_SPACE_LIMIT(log) - space_used) /
> > -					num_online_cpus())) {
> > +			(XLOG_CIL_SPACE_LIMIT(log) / num_online_cpus())) {
> > +	trace_printk("cilpcp space used %d, reserved %d ctxres %d",
> > +			cilpcp->space_used, cilpcp->space_reserved, ctx_res);
> >  		atomic_add(cilpcp->space_used, &ctx->space_used);
> >  		cilpcp->space_used = 0;
> >  	}
> > @@ -1331,6 +1340,7 @@ xlog_cil_push_background(
> >  
> >  	spin_lock(&cil->xc_push_lock);
> >  	if (cil->xc_push_seq < cil->xc_current_sequence) {
> > +		trace_printk("push sapce used %d", space_used);
> >  		cil->xc_push_seq = cil->xc_current_sequence;
> >  		queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work);
> >  	}

next prev parent reply	other threads:[~2022-01-12 23:56 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10  0:09 [GIT PULL] xfs: xlog_write rework and CIL scalability Dave Chinner
2022-01-06 21:40 ` Darrick J. Wong
2022-01-11  5:04   ` Dave Chinner
2022-01-11 17:58     ` Darrick J. Wong
2022-01-12 23:56       ` Darrick J. Wong [this message]
2022-01-13  3:46         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220112235604.GC19198@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.