public inbox for linux-s390@vger.kernel.org
From: Zaslonko Mikhail <zaslonko@linux.ibm.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: linux-s390@vger.kernel.org, Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [PATCH] btrfs: zlib: do not do unnecessary page copying for compression
Date: Mon, 27 May 2024 18:25:48 +0200	[thread overview]
Message-ID: <08aca5cf-f259-4963-bb2a-356847317d94@linux.ibm.com> (raw)
In-Reply-To: <0a24cc8a48821e8cf3bd01263b453c4cbc22d832.1716801849.git.wqu@suse.com>

Hello Qu,

I remember implementing btrfs zlib changes related to s390 dfltcc compression support a while ago:
https://lwn.net/Articles/808809/

The workspace buffer size was indeed enlarged for performance reasons.

Please see my comments below.

On 27.05.2024 11:24, Qu Wenruo wrote:
> [BUG]
> In function zlib_compress_folios(), we handle the input by:
> 
> - If there are multiple pages left
>   We copy the page content into workspace->buf, and use workspace->buf
>   as input for compression.
> 
>   But on x86_64 (which doesn't support dfltcc), that buffer size is just
>   one page, so we're wasting our CPU time copying the page for no
>   benefit.
> 
> - If there is only one page left
>   We use the mapped page address as input for compression.
> 
> The problem is, this means we will copy the whole input range except the
> last page (can be as large as 124K), without much obvious benefit.
> 
> Meanwhile the cost is pretty obvious.

Actually, the behavior on kernels without dfltcc support (currently available on s390
only) should not be affected.
We copy input pages to the workspace->buf only if the buffer size is larger than 1 page.
At least it worked this way after my original btrfs zlib patch:
https://lwn.net/ml/linux-kernel/20200108105103.29028-1-zaslonko@linux.ibm.com/

Has this behavior somehow changed after your page->folio conversion for btrfs?
https://lore.kernel.org/all/cover.1706521511.git.wqu@suse.com/

Am I missing something?

> 
> [POSSIBLE REASON]
> The possible reason may be related to the support of S390 hardware zlib
> decompression acceleration.
> 
> As we allocate 4 pages (4 * 4K) as workspace input buffer just for s390.
> 
> [FIX]
> I checked the dfltcc code, there seems to be no requirement on the
> input buffer size.
> The function dfltcc_can_deflate() only checks:
> 
> - If the compression settings are supported
>   Only level/w_bits/strategy/level_mask is checked.
> 
> - If the hardware supports
> 
> No mention at all on the input buffer size, thus I believe there is no
> need to waste time doing the page copying.
> 
> Maybe the hardware acceleration is so good for s390 that they can offset
> the page copying cost, but it's definitely a penalty for non-s390
> systems.
> 
> So fix the problem by:
> 
> - Use the same buffer size
>   No matter if dfltcc support is enabled or not
> 
> - Always use page address as input
> 
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/zlib.c | 67 +++++++++++--------------------------------------
>  1 file changed, 14 insertions(+), 53 deletions(-)
> 
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index d9e5c88a0f85..9c88a841a060 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -65,21 +65,8 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>  			zlib_inflate_workspacesize());
>  	workspace->strm.workspace = kvzalloc(workspacesize, GFP_KERNEL | __GFP_NOWARN);
>  	workspace->level = level;
> -	workspace->buf = NULL;
> -	/*
> -	 * In case of s390 zlib hardware support, allocate lager workspace
> -	 * buffer. If allocator fails, fall back to a single page buffer.
> -	 */
> -	if (zlib_deflate_dfltcc_enabled()) {
> -		workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
> -					 __GFP_NOMEMALLOC | __GFP_NORETRY |
> -					 __GFP_NOWARN | GFP_NOIO);
> -		workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
> -	}
> -	if (!workspace->buf) {
> -		workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> -		workspace->buf_size = PAGE_SIZE;
> -	}
> +	workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	workspace->buf_size = PAGE_SIZE;
>  	if (!workspace->strm.workspace || !workspace->buf)
>  		goto fail;
>  
> @@ -103,7 +90,6 @@ int zlib_compress_folios(struct list_head *ws, struct address_space *mapping,
>  	struct folio *in_folio = NULL;
>  	struct folio *out_folio = NULL;
>  	unsigned long bytes_left;
> -	unsigned int in_buf_folios;
>  	unsigned long len = *total_out;
>  	unsigned long nr_dest_folios = *out_folios;
>  	const unsigned long max_out = nr_dest_folios * PAGE_SIZE;
> @@ -130,7 +116,6 @@ int zlib_compress_folios(struct list_head *ws, struct address_space *mapping,
>  	folios[0] = out_folio;
>  	nr_folios = 1;
>  
> -	workspace->strm.next_in = workspace->buf;
>  	workspace->strm.avail_in = 0;
>  	workspace->strm.next_out = cfolio_out;
>  	workspace->strm.avail_out = PAGE_SIZE;
> @@ -142,43 +127,19 @@ int zlib_compress_folios(struct list_head *ws, struct address_space *mapping,
>  		 */
>  		if (workspace->strm.avail_in == 0) {
>  			bytes_left = len - workspace->strm.total_in;
> -			in_buf_folios = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> -					    workspace->buf_size / PAGE_SIZE);

	Doesn't this always set *in_buf_folios* to 1 when there is no dfltcc support (i.e. a single-page workspace buffer)?

> -			if (in_buf_folios > 1) {
> -				int i;
> -
> -				for (i = 0; i < in_buf_folios; i++) {
> -					if (data_in) {
> -						kunmap_local(data_in);
> -						folio_put(in_folio);
> -						data_in = NULL;
> -					}
> -					ret = btrfs_compress_filemap_get_folio(mapping,
> -							start, &in_folio);
> -					if (ret < 0)
> -						goto out;
> -					data_in = kmap_local_folio(in_folio, 0);
> -					copy_page(workspace->buf + i * PAGE_SIZE,
> -						  data_in);
> -					start += PAGE_SIZE;
> -				}
> -				workspace->strm.next_in = workspace->buf;
> -			} else {
> -				if (data_in) {
> -					kunmap_local(data_in);
> -					folio_put(in_folio);
> -					data_in = NULL;
> -				}
> -				ret = btrfs_compress_filemap_get_folio(mapping,
> -						start, &in_folio);
> -				if (ret < 0)
> -					goto out;
> -				data_in = kmap_local_folio(in_folio, 0);
> -				start += PAGE_SIZE;
> -				workspace->strm.next_in = data_in;
> +			if (data_in) {
> +				kunmap_local(data_in);
> +				folio_put(in_folio);
> +				data_in = NULL;
>  			}
> -			workspace->strm.avail_in = min(bytes_left,
> -						       (unsigned long) workspace->buf_size);
> +			ret = btrfs_compress_filemap_get_folio(mapping,
> +					start, &in_folio);
> +			if (ret < 0)
> +				goto out;
> +			data_in = kmap_local_folio(in_folio, 0);
> +			start += PAGE_SIZE;
> +			workspace->strm.next_in = data_in;
> +			workspace->strm.avail_in = min(bytes_left, PAGE_SIZE);
>  		}
>  
>  		ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);

Thanks,
Mikhail

Thread overview: 6+ messages
2024-05-27  9:24 [PATCH] btrfs: zlib: do not do unnecessary page copying for compression Qu Wenruo
2024-05-27 16:25 ` Zaslonko Mikhail [this message]
2024-05-27 22:09   ` Qu Wenruo
2024-05-28 10:44     ` Zaslonko Mikhail
2024-05-28 21:43       ` David Sterba
2024-05-28 22:02         ` Qu Wenruo
