From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 30/39] xfs: implement percpu cil space used calculation
Date: Wed, 2 Jun 2021 20:01:48 -0700 [thread overview]
Message-ID: <20210603030148.GT26380@locust> (raw)
In-Reply-To: <20210603022814.GM664593@dread.disaster.area>
On Thu, Jun 03, 2021 at 12:28:14PM +1000, Dave Chinner wrote:
> On Wed, Jun 02, 2021 at 06:26:09PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 03, 2021 at 09:47:47AM +1000, Dave Chinner wrote:
> > > On Thu, May 27, 2021 at 11:41:21AM -0700, Darrick J. Wong wrote:
> > > > On Wed, May 19, 2021 at 10:13:08PM +1000, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > >
> > > > > Now that we have the CIL percpu structures in place, implement the
> > > > > space used counter with a fast sum check similar to the
> > > > > percpu_counter infrastructure.
> > > > >
> > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > > ---
> > > > > fs/xfs/xfs_log_cil.c | 61 ++++++++++++++++++++++++++++++++++++++-----
> > > > > fs/xfs/xfs_log_priv.h | 2 +-
> > > > > 2 files changed, 55 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > > > > index ba1c6979a4c7..72693fba929b 100644
> > > > > --- a/fs/xfs/xfs_log_cil.c
> > > > > +++ b/fs/xfs/xfs_log_cil.c
> > > > > @@ -76,6 +76,24 @@ xlog_cil_ctx_alloc(void)
> > > > > return ctx;
> > > > > }
> > > > >
> > > > > +/*
> > > > > + * Aggregate the CIL per cpu structures into global counts, lists, etc and
> > > > > + * clear the percpu state ready for the next context to use.
> > > > > + */
> > > > > +static void
> > > > > +xlog_cil_pcp_aggregate(
> > > > > + struct xfs_cil *cil,
> > > > > + struct xfs_cil_ctx *ctx)
> > > > > +{
> > > > > + struct xlog_cil_pcp *cilpcp;
> > > > > + int cpu;
> > > > > +
> > > > > + for_each_online_cpu(cpu) {
> > > > > + cilpcp = per_cpu_ptr(cil->xc_pcp, cpu);
> > > > > + cilpcp->space_used = 0;
> > > >
> > > > How does this aggregate anything? All I see here is zeroing a counter?
> > >
> > > Yup, zeroing all the percpu counters is an aggregation function....
> > >
> > > By definition "aggregate != sum".
> > >
> > > An aggregate is formed by the collection of discrete units into a
> > > larger whole; the collective definition involves manipulating all
> > > discrete units as a single whole entity. e.g. a percpu counter is
> > > an aggregate of percpu variables that, via aggregation, can sum the
> > > discrete variables into a single value. IOWs, percpu_counter_sum()
> > > is an aggregation function that sums...
> > >
> > > > I see that we /can/ add the percpu space_used counter to the cil context
> > > > if we're over the space limits, but I don't actually see where...
> > >
> > > In this case, the global CIL space used counter is summed by the
> > > per-cpu counter update context and not an aggregation context. For
> > > it to work as a global counter since a distinct point in time, it
> > > needs an aggregation operation that zeros all the discrete units of
> > > the counter at a single point in time. IOWs, the aggregation
> > > function of this counter is a zeroing operation, not a summing
> > > operation. This is what xlog_cil_pcp_aggregate() is doing here.
> > >
> > > Put simply, an aggregation function is not a summing function, but a
> > > function that operates on all the discrete units of the
> > > aggregate so that it can operate correctly as a single unit....
> >
> > <nod> I grok what 'aggregate' means as a general term, though perhaps I
> > was too quick to associate it with 'sum' here.
> >
> > > I don't know of a better way of describing what this function does.
> > > At the end of the series, this function will zero some units. In
> > > other cases it will sum units. In some cases it will do both. Not to
> > > mention that it will merge discrete lists into a global list. And so
> > > on. The only common thing between these operations is that they are
> > > all aggregation functions that allow the CIL context to operate as a
> > > whole unit...
> >
> > *Oh* I think I realized where my understanding gap lies.
> >
> > space_used isn't part of some global space accounting scheme that has to
> > be kept accurate. It's a per-cil-context variable that we use to
> > throttle incoming commits when the current context starts to get full.
> > That's why _cil_insert_items only bothers to add the per-cpu space_used
> > to the ctx space_used if either (a) the current cpu has used up more
> > than its slice of space or (b) enough cpus have hit (a) to push the
> > ctx's space_used above the XLOG_CIL_SPACE_LIMIT. If no cpus have hit
> > (a) then the current cil context still has plenty of space and there
> > isn't any need to throttle the frontend.
> >
> > By the time we get to the aggregation step in _cil_push_work we've
> > already decided to install a new context and write the current context
> > to disk. We don't care about throttling the frontend on this (closed)
> > context and we hold xc_ctx_lock so nobody can see the new context.
> > That's why the aggregate function zeroes the per-cpu space_used.
>
> Exactly.
oh good!
> > > > > + /*
> > > > > + * Update the CIL percpu pointer. This updates the global counter when
> > > > > + * over the percpu batch size or when the CIL is over the space limit.
> > > > > + * This means low lock overhead for normal updates, and when over the
> > > > > + * limit the space used is immediately accounted. This makes enforcing
> > > > > + * the hard limit much more accurate. The per cpu fold threshold is
> > > > > + * based on how close we are to the hard limit.
> > > > > + */
> > > > > + cilpcp = get_cpu_ptr(cil->xc_pcp);
> > > > > + cilpcp->space_used += len;
> > > > > + if (space_used >= XLOG_CIL_SPACE_LIMIT(log) ||
> > > > > + cilpcp->space_used >
> > > > > + ((XLOG_CIL_BLOCKING_SPACE_LIMIT(log) - space_used) /
> > > > > + num_online_cpus())) {
> > > > > + atomic_add(cilpcp->space_used, &ctx->space_used);
> > > > > + cilpcp->space_used = 0;
> > > > > + }
> > > > > + put_cpu_ptr(cilpcp);
> > > > > +
> > > > > spin_lock(&cil->xc_cil_lock);
> > > > > - tp->t_ticket->t_curr_res -= ctx_res + len;
> > > > > ctx->ticket->t_unit_res += ctx_res;
> > > > > ctx->ticket->t_curr_res += ctx_res;
> > > > > - ctx->space_used += len;
> > > >
> > > > ...this update happens if we're not over the space limit?
> > >
> > > It's the second case in the above if statement. When the space used in
> > > the percpu pointer goes over its fraction of the remaining space
> > > limit (limit remaining / num_cpus_online), it adds the
> > > pcp counter back into the global counter. Essentially it is:
> > >
> > > if (over push threshold ||
> > > >>>>>> pcp->used > ((hard limit - ctx->space_used) / cpus)) {
> > > ctx->space_used += pcp->used;
> > > pcp->used = 0;
> > > }
> > >
> > > Hence, to begin with, the percpu counter is allowed to sum a large
> > > chunk of space before it trips the per CPU summing threshold. When
> > > summing occurs, the per-cpu threshold goes down, meaning the pcp
> > > counters will trip sooner in the next cycle.
> > >
> > > IOWs, the summing threshold gets closer to zero the closer the
> > > global count gets to the hard limit. Hence when there's lots of
> > > space available, we have little summing contention, but when we
> > > are close to the blocking limit we essentially update the global
> > > counter on every modification.
> >
> > Ok, I think I get it now. The statement "ctx->space_used = 0" is part
> > of clearing the percpu state and is not part of aggregating the CIL per
> > cpu structure into the context.
>
> If you mean the 'cilpcp->space_used = 0' statement in the
> _aggregate() function, then yes.
Yes. My brain is tired.
(Also it's really hot here.)
> We don't actually ever zero the ctx->space_used because it always is
> initialised to zero by allocation of a new context and switching to
> it...
>
> > So assuming that I grokked it all on the second try, maybe a comment is
> > in order for the aggregate function?
> >
> > /*
> > * We're in the middle of switching cil contexts. Reset the
> > * counter we use to detect when the current context is nearing
> > * full.
> > */
> > ctx->space_used = 0;
>
> Hmmmm - I'm not sure where you are asking I put this comment...
Sorry, I meant the comment to be placed above the
"cilpcp->space_used = 0;" in the _aggregate function.
> > > As such, we get scalability when the CIL is empty by trading off
> > > accuracy, but we get accuracy when it is nearing full by trading off
> > > scalability. We might need to tweak it for really large CPU counts
> > > (maybe use log2(num_online_cpus())), but fundamentally the algorithm
> > > is designed to scale according to how close we are to the push
> > > thresholds....
> >
> > <nod> I suppose if you had a very large number of CPUs and a very small
> > log then the slivers could be zero ... which just means we'd lose
> > performance due to overly-careful accounting. I wonder if you want to
> > leave a breadcrumb to warn if:
> >
> > XLOG_CIL_BLOCKING_SPACE_LIMIT() / num_online_cpus() == 0
> >
> > just in case someone ever wanders in with such a configuration?
>
> I don't think that can happen. The smallest blocking size limit will
> be a quarter of the smallest log size, which puts us around 800KB
> as the blocking limit. I can't see us supporting 800k CPUs any time
> soon, and even if you have a couple of hundred thousand CPUs, you're
> going to have lots of other basic problems trying to do concurrent
> operations on such a tiny log before we even consider this
> threshold...
Just think of what IPIs look like! :D
>
> I'm not too worried about scalability and performance on tiny logs -
> if they result in an atomic update every transaction, it'll still be
> sufficient to scale to the concurrency the transaction reservation
> code will let into the commit path in the first place...
Ok.
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com