From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 2/8] xfs: reduce buffer log item shadow allocations
Date: Wed, 17 Mar 2021 15:57:00 +1100
Message-ID: <20210317045706.651306-3-david@fromorbit.com>
In-Reply-To: <20210317045706.651306-1-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
When we modify btrees repeatedly, we regularly increase the size of
the logged region by a single chunk at a time (per transaction
commit). This results in the CIL formatting code having to
reallocate the log vector buffer every time the buffer dirty region
grows. Hence over a typical 4kB btree buffer, we might grow the log
vector 4096/128 = 32x over a short period where we repeatedly add
or remove records to/from the buffer over a series of
transactions. This means we are doing 32 memory allocations and frees
over this time in a performance-critical path in the journal.
The amount of space tracked in the CIL for the object is calculated
during the ->iop_format() call for the buffer log item, but the
buffer memory allocated for it is calculated by the ->iop_size()
call. The size callout determines the size of the buffer; the format
callout determines the space used within it.
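As a rough sketch of that split (illustrative only, trimmed down to
the two callouts discussed here; the member signatures follow the
kernel's struct xfs_item_ops, the comments are mine):

	/*
	 * Trimmed, standalone sketch of the two log item callouts,
	 * not the full kernel definition.
	 */
	struct xfs_log_item;
	struct xfs_log_vec;

	struct xfs_item_ops {
		/* commit time: report an upper bound on the buffer
		 * space the item's changes may need */
		void (*iop_size)(struct xfs_log_item *lip, int *nvecs,
				 int *nbytes);
		/* CIL formatting: write the dirty regions into the
		 * log vectors, accounting only the space used */
		void (*iop_format)(struct xfs_log_item *lip,
				   struct xfs_log_vec *lv);
	};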
Hence we can oversize the buffer space required in the size
calculation without impacting the amount of space used and accounted
to the CIL for the changes being logged. This allows us to reduce
the number of allocations by rounding up the buffer size to allow
for future growth. This can save a substantial amount of CPU time in
this path:
-   46.52%     2.02%  [kernel]            [k] xfs_log_commit_cil
   - 44.49% xfs_log_commit_cil
      - 30.78% _raw_spin_lock
         - 30.75% do_raw_spin_lock
              30.27% __pv_queued_spin_lock_slowpath

(oh, ouch!)

....
      - 1.05% kmem_alloc_large
         - 1.02% kmem_alloc
              0.94% __kmalloc
This overhead is what this patch is aimed at. After:
      - 0.76% kmem_alloc_large
         - 0.75% kmem_alloc
              0.70% __kmalloc
The size of 512 bytes is based on the bitmap chunk size being 128
bytes and the observation that random directory entry updates almost
never require more than 3-4 128-byte regions to be logged in the
directory block.
The other observation is for per-ag btrees. When we are inserting
into a new btree block, we pack it from the front. Hence the
first few records land in the first 128 bytes, so we log only 128
bytes; the next 8-16 records land in the second region, so now we log
256 bytes; and so on. If we are doing random updates, we only
reallocate once for every four random 128-byte regions that are
dirtied instead of for every single one.
Anything larger than 512 bytes and I noticed an increase in memory
footprint in my scalability workloads; anything smaller and I didn't
see any significant benefit to CPU usage.
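To make the arithmetic concrete, here is a small standalone userspace
demonstration (illustrative only, not kernel code; the 128-byte chunk
matches the bitmap chunk size above, everything else is hypothetical)
of how rounding the size callout result to 512 bytes cuts the
reallocation count when a 4kB buffer is dirtied one chunk at a time:

	#include <stdio.h>

	#define CHUNK	128	/* dirty bitmap chunk size */
	#define ROUND	512	/* rounding granularity from this patch */

	/* same result as the kernel's round_up() for these values */
	#define round_up(x, y)	((((x) + (y) - 1) / (y)) * (y))

	int main(void)
	{
		int exact_allocs = 0, exact_size = 0;
		int rounded_allocs = 0, rounded_size = 0;

		/* dirty a 4kB buffer one chunk per transaction commit */
		for (int dirty = CHUNK; dirty <= 4096; dirty += CHUNK) {
			if (dirty > exact_size) {
				/* reallocates on every single growth */
				exact_size = dirty;
				exact_allocs++;
			}
			if (round_up(dirty, ROUND) > rounded_size) {
				/* reallocates on every 4th growth */
				rounded_size = round_up(dirty, ROUND);
				rounded_allocs++;
			}
		}
		printf("exact sizing:   %d allocations\n", exact_allocs);
		printf("rounded to %d: %d allocations\n", ROUND,
				rounded_allocs);
		return 0;
	}

This prints 32 allocations for exact sizing and 8 for the rounded
case, matching the 4x reduction the numbers above are based on.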
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf_item.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index dc0be2a639cc..cb8fd8afd140 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -143,6 +143,7 @@ xfs_buf_item_size(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	int			i;
+	int			bytes;
 
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
 	if (bip->bli_flags & XFS_BLI_STALE) {
@@ -174,7 +175,7 @@ xfs_buf_item_size(
 	}
 
 	/*
-	 * the vector count is based on the number of buffer vectors we have
+	 * The vector count is based on the number of buffer vectors we have
 	 * dirty bits in. This will only be greater than one when we have a
 	 * compound buffer with more than one segment dirty. Hence for compound
 	 * buffers we need to track which segment the dirty bits correspond to,
@@ -182,10 +183,18 @@ xfs_buf_item_size(
 	 * count for the extra buf log format structure that will need to be
 	 * written.
 	 */
+	bytes = 0;
 	for (i = 0; i < bip->bli_format_count; i++) {
 		xfs_buf_item_size_segment(bip, &bip->bli_formats[i],
-					  nvecs, nbytes);
+					  nvecs, &bytes);
 	}
+
+	/*
+	 * Round up the buffer size required to minimise the number of memory
+	 * allocations that need to be done as this item grows when relogged by
+	 * repeated modifications.
+	 */
+	*nbytes = round_up(bytes, 512);
 
 	trace_xfs_buf_item_size(bip);
 }
--
2.30.1