All of lore.kernel.org
 help / color / mirror / Atom feed
From: Omar Sandoval <osandov@osandov.com>
To: dsterba@suse.cz, Nick Terrell <terrelln@fb.com>,
	kernel-team@fb.com, Omar Sandoval <osandov@fb.com>,
	linux-btrfs@vger.kernel.org, Jennifer Liu <jenniferliu620@fb.com>
Subject: Re: [PATCH v2] btrfs: add zstd compression level support
Date: Mon, 19 Nov 2018 12:05:54 -0800	[thread overview]
Message-ID: <20181119200554.GB25682@vader> (raw)
In-Reply-To: <20181113003332.GV24115@twin.jikos.cz>

On Tue, Nov 13, 2018 at 01:33:32AM +0100, David Sterba wrote:
> On Wed, Oct 31, 2018 at 11:11:08AM -0700, Nick Terrell wrote:
> > From: Jennifer Liu <jenniferliu620@fb.com>
> > 
> > Adds zstd compression level support to btrfs. Zstd requires
> > different amounts of memory for each level, so the design had
> > to be modified to allow set_level() to allocate memory. We
> > preallocate one workspace of the maximum size to guarantee
> > forward progress. This feature is expected to be useful for
> > read-mostly filesystems, or when creating images.
> > 
> > Benchmarks run in qemu on Intel x86 with a single core.
> > The benchmark measures the time to copy the Silesia corpus [0] to
> > a btrfs filesystem 10 times, then read it back.
> > 
> > The two important things to note are:
> > - The decompression speed and memory remains constant.
> >   The memory required to decompress is the same as level 1.
> > - The compression speed and ratio will vary based on the source.
> > 
> > Level	Ratio	Compression	Decompression	Compression Memory
> > 1    	2.59 	153 MB/s   	112 MB/s     	0.8 MB
> > 2    	2.67 	136 MB/s   	113 MB/s     	1.0 MB
> > 3    	2.72 	106 MB/s   	115 MB/s     	1.3 MB
> > 4    	2.78 	86  MB/s   	109 MB/s     	0.9 MB
> > 5    	2.83 	69  MB/s   	109 MB/s     	1.4 MB
> > 6    	2.89 	53  MB/s   	110 MB/s     	1.5 MB
> > 7    	2.91 	40  MB/s   	112 MB/s     	1.4 MB
> > 8    	2.92 	34  MB/s   	110 MB/s     	1.8 MB
> > 9    	2.93 	27  MB/s   	109 MB/s     	1.8 MB
> > 10   	2.94 	22  MB/s   	109 MB/s     	1.8 MB
> > 11   	2.95 	17  MB/s   	114 MB/s     	1.8 MB
> > 12   	2.95 	13  MB/s   	113 MB/s     	1.8 MB
> > 13   	2.95 	10  MB/s   	111 MB/s     	2.3 MB
> > 14   	2.99 	7   MB/s   	110 MB/s     	2.6 MB
> > 15   	3.03 	6   MB/s   	110 MB/s     	2.6 MB
> > 
> > [0] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
> > 
> > Signed-off-by: Jennifer Liu <jenniferliu620@fb.com>
> > Signed-off-by: Nick Terrell <terrelln@fb.com>
> > Reviewed-by: Omar Sandoval <osandov@fb.com>
> > ---
> > v1 -> v2:
> > - Don't reflow the unchanged line.
> > 

[snip]

> > -static struct list_head *zstd_alloc_workspace(void)
> > +static bool zstd_set_level(struct list_head *ws, unsigned int level)
> > +{
> > +	struct workspace *workspace = list_entry(ws, struct workspace, list);
> > +	ZSTD_parameters params;
> > +	int size;
> > +
> > +	if (level > BTRFS_ZSTD_MAX_LEVEL)
> > +		level = BTRFS_ZSTD_MAX_LEVEL;
> > +
> > +	if (level == 0)
> > +		level = BTRFS_ZSTD_DEFAULT_LEVEL;
> > +
> > +	params = ZSTD_getParams(level, ZSTD_BTRFS_MAX_INPUT, 0);
> > +	size = max_t(size_t,
> > +			ZSTD_CStreamWorkspaceBound(params.cParams),
> > +			ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
> > +	if (size > workspace->size) {
> > +		if (!zstd_reallocate_mem(workspace, size))
> 
> This can allocate memory and this can appen on the writeout path, ie.
> one of the reasons for that might be that system needs more memory.
> 
> By the table above, the size can be up to 2.6MiB, which is a lot in
> terms of kernel memory as there must be either contiguous unmapped
> memory, the virtual mappings must be created. Both are scarce resource
> or should be treated as such.
> 
> Given that there's no logic that would try to minimize the usage for
> workspaces, this can allocate many workspaces of that size.
> 
> Currently the workspace allocations have been moved to the early module
> loading phase so that they don't happen later and we don't have to
> allocate memory nor handle the failures. Your patch brings that back.

Even before this patch, we may try to allocate a workspace. See
__find_workspace():

https://github.com/kdave/btrfs-devel/blob/fd0f5617a8a2ee92dd461d01cf9c5c37363ccc8d/fs/btrfs/compression.c#L897

We already limit it to one per CPU, and only allocate when needed.
Anything greater than that has to wait. Maybe we should improve that to
also include a limit on the total amount of memory allocated? That would
be more flexible than your approach below of making the > default case
special, and I like it more than Timofey's idea of falling back to a
lower level.

> The solution I'm currently thinking about can make the levels work but
> would be limited in throughput as a trade-off for the memory
> consumption.
> 
> - preallocate one workspace for level 15 per mounted filesystem, using
>   get_free_pages_exact or kvmalloc
> 
> - preallocate workspaces for the default level, the number same as for
>   lzo/zlib
> 
> - add logic to select the zstd:15 workspace last for other compressors,
>   ie. make it almost exclusive for zstd levels > default
> 
> Further refinement could allocate the workspaces on-demand and free if
> not used. Allocation failures would still be handled gracefully at the
> cost of waiting for the remainig worspaces, at least one would be always
> available.

      parent reply	other threads:[~2018-11-19 20:05 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-31 18:11 [PATCH v2] btrfs: add zstd compression level support Nick Terrell
2018-11-01  9:57 ` Timofey Titovets
2018-11-01 15:40   ` Nick Terrell
2018-11-13  0:01 ` David Sterba
2018-11-13  0:33 ` David Sterba
2018-11-13  1:49   ` Nick Terrell
2018-11-13 13:29     ` Timofey Titovets
2018-11-16  1:42       ` Nick Terrell
2018-11-19 19:57       ` Omar Sandoval
2018-11-19 20:05   ` Omar Sandoval [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181119200554.GB25682@vader \
    --to=osandov@osandov.com \
    --cc=dsterba@suse.cz \
    --cc=jenniferliu620@fb.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=osandov@fb.com \
    --cc=terrelln@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.