public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs: allow extent buffer helpers to skip cross-page handling
Date: Mon, 20 Nov 2023 18:00:15 +0100	[thread overview]
Message-ID: <20231120170015.GM11264@twin.jikos.cz> (raw)
In-Reply-To: <721bab821198fc9b49d2795b2028ed6c436ab886.1700111928.git.wqu@suse.com>

On Thu, Nov 16, 2023 at 03:49:06PM +1030, Qu Wenruo wrote:
> Currently btrfs extent buffer helpers are doing all the cross-page
> handling, as there is no guarantee that all those eb pages are
> contiguous.
> 
> However on systems with enough memory, there is a very high chance the
> page cache for btree_inode are allocated with physically contiguous
> pages.
> 
> In that case, we can skip all the complex cross-page handling, thus
> speeding up the code.
> 
> This patch adds a new member, extent_buffer::addr, which is only set to
> non-NULL if all the extent buffer pages are physically contiguous.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Reason for RFC:
> 
> This change would increase the code size for all extent buffer helpers,
> and since there one more branch introduced, it may even slow down the
> system if most ebs do not have physically contiguous pages.
> 
> But I still believe this is worthy trying, as my previous attempt to
> use virtually contiguous pages are rejected due to possible slow down in
> vm_map() call.
> 
> I don't have convincing benchmark yet, but so far no obvious performance
> drop observed either.
> ---
>  fs/btrfs/disk-io.c   |  9 +++++++-
>  fs/btrfs/extent_io.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/extent_io.h |  7 ++++++
>  3 files changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 5ac6789ca55f..7fc78171a262 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -80,8 +80,16 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
>  	char *kaddr;
>  	int i;
>  
> +	memset(result, 0, BTRFS_CSUM_SIZE);
>  	shash->tfm = fs_info->csum_shash;
>  	crypto_shash_init(shash);
> +
> +	if (buf->addr) {
> +		crypto_shash_digest(shash, buf->addr + offset_in_page(buf->start) + BTRFS_CSUM_SIZE,
> +				    buf->len - BTRFS_CSUM_SIZE, result);
> +		return;
> +	}
> +
>  	kaddr = page_address(buf->pages[0]) + offset_in_page(buf->start);
>  	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
>  			    first_page_part - BTRFS_CSUM_SIZE);
> @@ -90,7 +98,6 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
>  		kaddr = page_address(buf->pages[i]);
>  		crypto_shash_update(shash, kaddr, PAGE_SIZE);
>  	}
> -	memset(result, 0, BTRFS_CSUM_SIZE);

This is not related to the contig pages but the result buffer for
checksum should be always cleared before storing the digest.

>  	crypto_shash_final(shash, result);
>  }
>  
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 03cef28d9e37..004b0ba6b1c7 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3476,6 +3476,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  	struct address_space *mapping = fs_info->btree_inode->i_mapping;
>  	struct btrfs_subpage *prealloc = NULL;
>  	u64 lockdep_owner = owner_root;
> +	bool page_contig = true;
>  	int uptodate = 1;
>  	int ret;
>  
> @@ -3562,6 +3563,14 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  
>  		WARN_ON(btrfs_page_test_dirty(fs_info, p, eb->start, eb->len));
>  		eb->pages[i] = p;
> +
> +		/*
> +		 * Check if the current page is physically contiguous with previous eb
> +		 * page.
> +		 */
> +		if (i && eb->pages[i - 1] + 1 != p)
> +			page_contig = false;

This hasn't been fixed from last time, this has almost zero chance to
succeed once the system is up for some time. Page addresses returned
from allocator are random. What I was suggesting is to use alloc_page()
with the given order (16K pages are 2).

This works for all eb sizes we need, the prolematic one could be for 64K
because this is order 4 and PAGE_ALLOC_COSTLY_ORDER is 3, so this would
cost more on the MM side. But who uses 64K node size on x8_64.

> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -77,6 +77,13 @@ struct extent_buffer {
>  	unsigned long len;
>  	unsigned long bflags;
>  	struct btrfs_fs_info *fs_info;
> +
> +	/*
> +	 * The address where the eb can be accessed without any cross-page handling.
> +	 * This can be NULL if not possible.
> +	 */
> +	void *addr;

So this is a remnant of the vm_map, we would not need to store the
address in case all the pages are contiguous, it would be the address of
pages[0]. That it's contiguous could be tracked as a bit in the flags.

> +
>  	spinlock_t refs_lock;
>  	atomic_t refs;
>  	int read_mirror;
> -- 
> 2.42.1
> 

  reply	other threads:[~2023-11-20 17:07 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-16  5:19 [PATCH RFC] btrfs: allow extent buffer helpers to skip cross-page handling Qu Wenruo
2023-11-20 17:00 ` David Sterba [this message]
2023-11-20 20:25   ` Qu Wenruo
2023-11-21 15:35     ` David Sterba
2023-11-21 20:37       ` Qu Wenruo
2023-11-21 21:14         ` David Sterba
2023-11-21 21:30           ` Qu Wenruo
2023-11-22 13:23             ` David Sterba
2023-11-22 13:34 ` David Sterba
2023-11-22 13:46 ` David Sterba
2023-11-22 20:01   ` Qu Wenruo
2023-11-22 22:05     ` David Sterba
2023-11-23 18:50     ` David Sterba
2023-11-23 20:51     ` Qu Wenruo
2023-11-24 16:03       ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231120170015.GM11264@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox