From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 40/45] xfs: convert CIL to unordered per cpu lists
Date: Wed, 10 Mar 2021 17:15:05 -0800
Message-ID: <20210311011505.GN3419940@magnolia>
In-Reply-To: <20210305051143.182133-41-david@fromorbit.com>

On Fri, Mar 05, 2021 at 04:11:38PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> So that we can remove the cil_lock which is a global serialisation
> point. We've already got ordering sorted, so all we need to do is
> treat the CIL list like the busy extent list and reconstruct it
> before the push starts.
> 
> This is what we're trying to avoid:
> 
>  -   75.35%     1.83%  [kernel]            [k] xfs_log_commit_cil
>     - 46.35% xfs_log_commit_cil
>        - 41.54% _raw_spin_lock
>           - 67.30% do_raw_spin_lock
>                66.96% __pv_queued_spin_lock_slowpath
> 
> Which happens on a 32p system when running a 32-way 'rm -rf'
> workload. After this patch:
> 
> -   20.90%     3.23%  [kernel]               [k] xfs_log_commit_cil
>    - 17.67% xfs_log_commit_cil
>       - 6.51% xfs_log_ticket_ungrant
>            1.40% xfs_log_space_wake
>         2.32% memcpy_erms
>       - 2.18% xfs_buf_item_committing
>          - 2.12% xfs_buf_item_release
>             - 1.03% xfs_buf_unlock
>                  0.96% up
>               0.72% xfs_buf_rele
>         1.33% xfs_inode_item_format
>         1.19% down_read
>         0.91% up_read
>         0.76% xfs_buf_item_format
>       - 0.68% kmem_alloc_large
>          - 0.67% kmem_alloc
>               0.64% __kmalloc
>         0.50% xfs_buf_item_size
> 
> It kinda looks like the workload is running out of log space all
> the time. But all the spinlock contention is gone and the
> transaction commit rate has gone from 800k/s to 1.3M/s so the amount
> of real work being done has gone up a *lot*.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c  | 61 ++++++++++++++++++++-----------------------
>  fs/xfs/xfs_log_priv.h |  2 --
>  2 files changed, 29 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 7420389f4cee..3d43a5088154 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -448,10 +448,9 @@ xlog_cil_insert_items(
>  	/*
>  	 * We need to take the CIL checkpoint unit reservation on the first
>  	 * commit into the CIL. Test the XLOG_CIL_EMPTY bit first so we don't
> -	 * unnecessarily do an atomic op in the fast path here. We don't need to
> -	 * hold the xc_cil_lock here to clear the XLOG_CIL_EMPTY bit as we are
> -	 * under the xc_ctx_lock here and that needs to be held exclusively to
> -	 * reset the XLOG_CIL_EMPTY bit.
> +	 * unnecessarily do an atomic op in the fast path here. We can clear the
> +	 * XLOG_CIL_EMPTY bit as we are under the xc_ctx_lock here and that
> +	 * needs to be held exclusively to reset the XLOG_CIL_EMPTY bit.
>  	 */
>  	if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags) &&
>  	    test_and_clear_bit(XLOG_CIL_EMPTY, &cil->xc_flags))
> @@ -505,24 +504,6 @@ xlog_cil_insert_items(
>  	/* attach the transaction to the CIL if it has any busy extents */
>  	if (!list_empty(&tp->t_busy))
>  		list_splice_init(&tp->t_busy, &cilpcp->busy_extents);
> -	put_cpu_ptr(cilpcp);
> -
> -	/*
> -	 * If we've overrun the reservation, dump the tx details before we move
> -	 * the log items. Shutdown is imminent...
> -	 */
> -	tp->t_ticket->t_curr_res -= ctx_res + len;
> -	if (WARN_ON(tp->t_ticket->t_curr_res < 0)) {
> -		xfs_warn(log->l_mp, "Transaction log reservation overrun:");
> -		xfs_warn(log->l_mp,
> -			 "  log items: %d bytes (iov hdrs: %d bytes)",
> -			 len, iovhdr_res);
> -		xfs_warn(log->l_mp, "  split region headers: %d bytes",
> -			 split_res);
> -		xfs_warn(log->l_mp, "  ctx ticket: %d bytes", ctx_res);
> -		xlog_print_trans(tp);
> -	}
> -
>  	/*
>  	 * Now update the order of everything modified in the transaction
>  	 * and insert items into the CIL if they aren't already there.
> @@ -530,7 +511,6 @@ xlog_cil_insert_items(
>  	 * the transaction commit.
>  	 */
>  	order = atomic_inc_return(&ctx->order_id);
> -	spin_lock(&cil->xc_cil_lock);
>  	list_for_each_entry(lip, &tp->t_items, li_trans) {
>  
>  		/* Skip items which aren't dirty in this transaction. */
> @@ -540,10 +520,26 @@ xlog_cil_insert_items(
>  		lip->li_order_id = order;
>  		if (!list_empty(&lip->li_cil))
>  			continue;
> -		list_add(&lip->li_cil, &cil->xc_cil);
> +		list_add(&lip->li_cil, &cilpcp->log_items);

Ok, so if I understand this correctly -- every time a transaction
commits, it marks every dirty log item with a monotonically increasing
counter.  If the log item isn't already on another CPU's CIL list, it
gets added to the current CPU's CIL list...

> +	}
> +	put_cpu_ptr(cilpcp);
> +
> +	/*
> +	 * If we've overrun the reservation, dump the tx details before we move
> +	 * the log items. Shutdown is imminent...
> +	 */
> +	tp->t_ticket->t_curr_res -= ctx_res + len;
> +	if (WARN_ON(tp->t_ticket->t_curr_res < 0)) {
> +		xfs_warn(log->l_mp, "Transaction log reservation overrun:");
> +		xfs_warn(log->l_mp,
> +			 "  log items: %d bytes (iov hdrs: %d bytes)",
> +			 len, iovhdr_res);
> +		xfs_warn(log->l_mp, "  split region headers: %d bytes",
> +			 split_res);
> +		xfs_warn(log->l_mp, "  ctx ticket: %d bytes", ctx_res);
> +		xlog_print_trans(tp);
>  	}
>  
> -	spin_unlock(&cil->xc_cil_lock);
>  
>  	if (tp->t_ticket->t_curr_res < 0)
>  		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> @@ -806,6 +802,7 @@ xlog_cil_push_work(
>  	bool			commit_iclog_sync = false;
>  	int			cpu;
>  	struct xlog_cil_pcp	*cilpcp;
> +	LIST_HEAD		(log_items);
>  
>  	new_ctx = xlog_cil_ctx_alloc();
>  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
> @@ -822,6 +819,9 @@ xlog_cil_push_work(
>  			list_splice_init(&cilpcp->busy_extents,
>  					&ctx->busy_extents);
>  		}
> +		if (!list_empty(&cilpcp->log_items)) {
> +			list_splice_init(&cilpcp->log_items, &log_items);

...and then at CIL push time, we splice each per-CPU list into a big
list, sort the dirty log items by counter number, and process them.
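
A minimal userspace sketch of that flow, to make the mechanics concrete.
Everything here is a stand-in and an assumption: struct item, commit(),
push(), NR_CPUS and the singly linked lists are hypothetical, and the real
code uses struct xfs_log_item, list_head, get_cpu_ptr() and list_sort().
The sketch is single-threaded, so it only illustrates the ordering logic and
not the per-CPU concurrency.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Stand-in for a log item: only the fields the sketch needs. */
struct item {
	unsigned int	order_id;	/* id of the last commit that dirtied it */
	int		on_list;	/* already on some CPU's list? */
	struct item	*next;
};

static struct item *pcp_head[NR_CPUS];	/* one unordered list per CPU */
static unsigned int order_counter;	/* stand-in for ctx->order_id */

/* Commit side: stamp every dirty item, add it only if not already listed. */
static void commit(int cpu, struct item **dirty, size_t n)
{
	unsigned int order = ++order_counter;

	for (size_t i = 0; i < n; i++) {
		dirty[i]->order_id = order;
		if (dirty[i]->on_list)
			continue;	/* stays on whichever list holds it */
		dirty[i]->on_list = 1;
		dirty[i]->next = pcp_head[cpu];
		pcp_head[cpu] = dirty[i];
	}
}

/* Push side: gather every per-CPU list, then sort by order_id. */
static struct item *push(void)
{
	struct item *all = NULL, *sorted = NULL;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		while (pcp_head[cpu]) {
			struct item *it = pcp_head[cpu];

			pcp_head[cpu] = it->next;
			it->next = all;
			all = it;
		}
	}

	/* Insertion sort keeps the sketch short; the patch uses list_sort(). */
	while (all) {
		struct item *it = all, **pp = &sorted;

		all = it->next;
		while (*pp && (*pp)->order_id <= it->order_id)
			pp = &(*pp)->next;
		it->next = *pp;
		*pp = it;
		it->on_list = 0;
	}
	return sorted;
}
```

With three commits, {a, b} on CPU 0, then {c, a} on CPU 1, then {b} on CPU 2,
items a and b never leave CPU 0's list, yet push() returns them after the
sort with order ids 2, 2, 3.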

The first thought I had was that it's a darn shame that _insert_items
can't steal a log item from another CPU's CIL list, because you could
then mergesort the per-CPU CIL lists into @log_items.  Unfortunately, I
don't think there's a safe way to steal items from a per-CPU list
without involving locks.

The second thought I had was that we have the xfs_pwork mechanism for
launching a bunch of worker threads.  A pwork workqueue is (probably)
too costly when the item list is short or there aren't that many CPUs,
but once list_sort starts getting painful, would it be faster to launch
a bunch of threads in push_work to sort each per-CPU list and then merge
sort them into the final list?
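
A rough userspace sketch of that alternative, assuming hypothetical names
throughout (struct item, merge2(), sort_one(), sort_and_merge()). Here the
per-list sorts run one after another; in the idea above each sort_one() call
would be handed to a separate worker and only the final merge would be
serial. Whether that beats a single list_sort() is exactly the open question,
and this sketch does not answer it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a log item carrying its commit order id. */
struct item {
	unsigned int	order_id;
	struct item	*next;
};

/* Merge two lists that are each already sorted by ascending order_id. */
static struct item *merge2(struct item *a, struct item *b)
{
	struct item *head = NULL, **tail = &head;

	while (a && b) {
		if (b->order_id < a->order_id) {
			*tail = b;
			b = b->next;
		} else {
			*tail = a;
			a = a->next;
		}
		tail = &(*tail)->next;
	}
	*tail = a ? a : b;
	return head;
}

/* Top-down merge sort of one per-CPU list (the part a worker could own). */
static struct item *sort_one(struct item *list)
{
	struct item *slow = list, *fast, *second;

	if (!list || !list->next)
		return list;

	/* Find the midpoint with slow/fast pointers and split there. */
	fast = list->next;
	while (fast && fast->next) {
		slow = slow->next;
		fast = fast->next->next;
	}
	second = slow->next;
	slow->next = NULL;

	return merge2(sort_one(list), sort_one(second));
}

/* Sort each list independently, then fold the sorted lists together. */
static struct item *sort_and_merge(struct item **lists, size_t nr)
{
	struct item *out = NULL;

	for (size_t i = 0; i < nr; i++)
		out = merge2(out, sort_one(lists[i]));
	return out;
}
```

The serial fold in sort_and_merge() costs one pass over the accumulated list
per CPU, so with many CPUs a pairwise (tree) merge would likely be the better
shape.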

FWIW at least mechanically, the last two patches look reasonable to me.

--D

> +		}
>  	}
>  
>  	spin_lock(&cil->xc_push_lock);
> @@ -907,12 +907,12 @@ xlog_cil_push_work(
>  	 * needed on the transaction commit side which is currently locked out
>  	 * by the flush lock.
>  	 */
> -	list_sort(NULL, &cil->xc_cil, xlog_cil_order_cmp);
> +	list_sort(NULL, &log_items, xlog_cil_order_cmp);
>  	lv = NULL;
> -	while (!list_empty(&cil->xc_cil)) {
> +	while (!list_empty(&log_items)) {
>  		struct xfs_log_item	*item;
>  
> -		item = list_first_entry(&cil->xc_cil,
> +		item = list_first_entry(&log_items,
>  					struct xfs_log_item, li_cil);
>  		list_del_init(&item->li_cil);
>  		item->li_order_id = 0;
> @@ -1099,7 +1099,6 @@ xlog_cil_push_background(
>  	 * The cil won't be empty because we are called while holding the
>  	 * context lock so whatever we added to the CIL will still be there.
>  	 */
> -	ASSERT(!list_empty(&cil->xc_cil));
>  	ASSERT(!test_bit(XLOG_CIL_EMPTY, &cil->xc_flags));
>  
>  	/*
> @@ -1491,6 +1490,7 @@ xlog_cil_pcp_alloc(
>  	for_each_possible_cpu(cpu) {
>  		cilpcp = per_cpu_ptr(pcptr, cpu);
>  		INIT_LIST_HEAD(&cilpcp->busy_extents);
> +		INIT_LIST_HEAD(&cilpcp->log_items);
>  	}
>  
>  	if (xlog_cil_pcp_hpadd(cil) < 0) {
> @@ -1531,9 +1531,7 @@ xlog_cil_init(
>  		return -ENOMEM;
>  	}
>  
> -	INIT_LIST_HEAD(&cil->xc_cil);
>  	INIT_LIST_HEAD(&cil->xc_committing);
> -	spin_lock_init(&cil->xc_cil_lock);
>  	spin_lock_init(&cil->xc_push_lock);
>  	init_waitqueue_head(&cil->xc_push_wait);
>  	init_rwsem(&cil->xc_ctx_lock);
> @@ -1559,7 +1557,6 @@ xlog_cil_destroy(
>  		kmem_free(cil->xc_ctx);
>  	}
>  
> -	ASSERT(list_empty(&cil->xc_cil));
>  	ASSERT(test_bit(XLOG_CIL_EMPTY, &cil->xc_flags));
>  	xlog_cil_pcp_free(cil, cil->xc_pcp);
>  	kmem_free(cil);
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 92d9e1a03a07..12a1a36eef7e 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -262,8 +262,6 @@ struct xfs_cil {
>  	struct xlog		*xc_log;
>  	unsigned long		xc_flags;
>  	atomic_t		xc_iclog_hdrs;
> -	struct list_head	xc_cil;
> -	spinlock_t		xc_cil_lock;
>  
>  	struct rw_semaphore	xc_ctx_lock ____cacheline_aligned_in_smp;
>  	struct xfs_cil_ctx	*xc_ctx;
> -- 
> 2.28.0
> 
