From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs: allow extent buffer helpers to skip cross-page handling
Date: Mon, 20 Nov 2023 18:00:15 +0100 [thread overview]
Message-ID: <20231120170015.GM11264@twin.jikos.cz> (raw)
In-Reply-To: <721bab821198fc9b49d2795b2028ed6c436ab886.1700111928.git.wqu@suse.com>
On Thu, Nov 16, 2023 at 03:49:06PM +1030, Qu Wenruo wrote:
> Currently btrfs extent buffer helpers are doing all the cross-page
> handling, as there is no guarantee that all those eb pages are
> contiguous.
>
> However on systems with enough memory, there is a very high chance the
> page cache for btree_inode are allocated with physically contiguous
> pages.
>
> In that case, we can skip all the complex cross-page handling, thus
> speeding up the code.
>
> This patch adds a new member, extent_buffer::addr, which is only set to
> non-NULL if all the extent buffer pages are physically contiguous.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Reason for RFC:
>
> This change would increase the code size for all extent buffer helpers,
> and since there one more branch introduced, it may even slow down the
> system if most ebs do not have physically contiguous pages.
>
> But I still believe this is worthy trying, as my previous attempt to
> use virtually contiguous pages are rejected due to possible slow down in
> vm_map() call.
>
> I don't have convincing benchmark yet, but so far no obvious performance
> drop observed either.
> ---
> fs/btrfs/disk-io.c | 9 +++++++-
> fs/btrfs/extent_io.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
> fs/btrfs/extent_io.h | 7 ++++++
> 3 files changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 5ac6789ca55f..7fc78171a262 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -80,8 +80,16 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
> char *kaddr;
> int i;
>
> + memset(result, 0, BTRFS_CSUM_SIZE);
> shash->tfm = fs_info->csum_shash;
> crypto_shash_init(shash);
> +
> + if (buf->addr) {
> + crypto_shash_digest(shash, buf->addr + offset_in_page(buf->start) + BTRFS_CSUM_SIZE,
> + buf->len - BTRFS_CSUM_SIZE, result);
> + return;
> + }
> +
> kaddr = page_address(buf->pages[0]) + offset_in_page(buf->start);
> crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
> first_page_part - BTRFS_CSUM_SIZE);
> @@ -90,7 +98,6 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
> kaddr = page_address(buf->pages[i]);
> crypto_shash_update(shash, kaddr, PAGE_SIZE);
> }
> - memset(result, 0, BTRFS_CSUM_SIZE);
This is not related to the contig pages but the result buffer for
checksum should be always cleared before storing the digest.
> crypto_shash_final(shash, result);
> }
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 03cef28d9e37..004b0ba6b1c7 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3476,6 +3476,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
> struct address_space *mapping = fs_info->btree_inode->i_mapping;
> struct btrfs_subpage *prealloc = NULL;
> u64 lockdep_owner = owner_root;
> + bool page_contig = true;
> int uptodate = 1;
> int ret;
>
> @@ -3562,6 +3563,14 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>
> WARN_ON(btrfs_page_test_dirty(fs_info, p, eb->start, eb->len));
> eb->pages[i] = p;
> +
> + /*
> + * Check if the current page is physically contiguous with previous eb
> + * page.
> + */
> + if (i && eb->pages[i - 1] + 1 != p)
> + page_contig = false;
This hasn't been fixed from last time, this has almost zero chance to
succeed once the system is up for some time. Page addresses returned
from allocator are random. What I was suggesting is to use alloc_page()
with the given order (16K pages are 2).
This works for all eb sizes we need, the prolematic one could be for 64K
because this is order 4 and PAGE_ALLOC_COSTLY_ORDER is 3, so this would
cost more on the MM side. But who uses 64K node size on x8_64.
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -77,6 +77,13 @@ struct extent_buffer {
> unsigned long len;
> unsigned long bflags;
> struct btrfs_fs_info *fs_info;
> +
> + /*
> + * The address where the eb can be accessed without any cross-page handling.
> + * This can be NULL if not possible.
> + */
> + void *addr;
So this is a remnant of the vm_map, we would not need to store the
address in case all the pages are contiguous, it would be the address of
pages[0]. That it's contiguous could be tracked as a bit in the flags.
> +
> spinlock_t refs_lock;
> atomic_t refs;
> int read_mirror;
> --
> 2.42.1
>
next prev parent reply other threads:[~2023-11-20 17:07 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-16 5:19 [PATCH RFC] btrfs: allow extent buffer helpers to skip cross-page handling Qu Wenruo
2023-11-20 17:00 ` David Sterba [this message]
2023-11-20 20:25 ` Qu Wenruo
2023-11-21 15:35 ` David Sterba
2023-11-21 20:37 ` Qu Wenruo
2023-11-21 21:14 ` David Sterba
2023-11-21 21:30 ` Qu Wenruo
2023-11-22 13:23 ` David Sterba
2023-11-22 13:34 ` David Sterba
2023-11-22 13:46 ` David Sterba
2023-11-22 20:01 ` Qu Wenruo
2023-11-22 22:05 ` David Sterba
2023-11-23 18:50 ` David Sterba
2023-11-23 20:51 ` Qu Wenruo
2023-11-24 16:03 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231120170015.GM11264@twin.jikos.cz \
--to=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox