From: Theodore Ts'o <tytso@mit.edu>
To: Tom Marshall <tom@cyngn.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>, linux-fsdevel@vger.kernel.org
Subject: Re: fs compression
Date: Wed, 20 May 2015 17:36:41 -0400
Message-ID: <20150520213641.GM2871@thunk.org>
In-Reply-To: <20150520174635.GA17651@eden.sea.cyngn.com>
On Wed, May 20, 2015 at 10:46:35AM -0700, Tom Marshall wrote:
> So I've been playing around a bit and I have a basic strategy laid out.
> Please let me know if I'm on the right track.
>
> Compressed file attributes
> ==========================
>
> The filesystem is responsible for detecting whether a file is compressed and
> hooking into the compression lib. This may be done with an inode flag,
> xattr, or any other applicable method. No other special attributes are
> necessary.
So I assume what you are implementing is read-only compression; that
is, once the file is written and the attribute is set indicating that
this is a compressed file, it is immutable.
> Compressed file format
> ======================
>
> Compressed files shall have header, block map, and data sections.
>
> Header:
>
> byte[4] magic 'zzzz' (not strictly needed)
> byte param1 method and flags
> bits 0..3 = compression method (1=zlib, 2=lz4, etc.)
> bits 4..7 = flags (none defined yet)
> byte blocksize log2 of blocksize (max 31)
I suggest using the term "compression cluster" to distinguish this
from the file system block size.
> le48 orig_size original uncompressed file size
>
>
> Block map:
>
> Vector of le16 (if blocksize <= 16) or le32 (if blocksize > 16). Each entry
> is the compressed size of the block. Zero indicates that the block is
> stored uncompressed, in case compression expanded the block.
What I would store instead is a list of 32 or 64-bit offsets, where the
nth entry in the array indicates the starting offset of the nth
compression cluster.
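To illustrate the offset table, here is a rough userspace sketch of the
lookup. The struct and function names are made up for this example, and
the extra final entry marking the end of the data is my own assumption
so that the length of the last cluster can be computed; none of this is
an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the offset table (illustrative names). */
struct cluster_map {
	const uint64_t *offsets;   /* nr_clusters + 1 entries (assumed sentinel at the end) */
	size_t nr_clusters;
	unsigned int cluster_bits; /* log2 of the compression cluster size */
};

/* Byte range in the compressed file that holds one cluster. */
struct extent {
	uint64_t start;
	uint64_t len;
};

/* Find the compressed bytes backing uncompressed file position pos. */
static struct extent cluster_extent(const struct cluster_map *map, uint64_t pos)
{
	size_t n = (size_t)(pos >> map->cluster_bits);
	struct extent e;

	assert(n < map->nr_clusters);
	e.start = map->offsets[n];
	e.len = map->offsets[n + 1] - map->offsets[n];
	return e;
}
```

With offsets, finding the nth cluster is a single array index. With a
vector of compressed sizes, you would have to sum all of the preceding
entries first.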
> Questions and issues
> ====================
>
> Should there be any padding for the data blocks? For example, if writing is
> to be supported, padding the compressed data to the filesystem block size
> would allow for easy rewriting of individual blocks without disturbing the
> surrounding blocks. Perhaps padding could be indicated by a flag.
If you add padding then you defeat the whole point of adding
compression. What if the initial contents of a 64k cluster was all
zeros, so it trivially compresses down to a few dozen bytes; but then
it gets replaced by completely incompressible data? If you add 64k
worth of padding to each block, then you're not saving any space, so
what's the point?
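To put numbers on that, here is a small sketch of the arithmetic. The
helper names are hypothetical, and it assumes the file system block
size is a power of two and that incompressible clusters are stored raw.

```c
#include <assert.h>
#include <stdint.h>

/* Round a compressed length up to a multiple of the fs block size. */
static uint64_t pad_to_block(uint64_t len, uint64_t block_size)
{
	/* Assumption: block_size is a nonzero power of two. */
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	return (len + block_size - 1) & ~(block_size - 1);
}

/*
 * Space that must be reserved so that any later rewrite of the cluster
 * fits in place.  Incompressible data is stored raw, so the worst case
 * is the whole cluster.
 */
static uint64_t reserve_for_rewrite(uint64_t cluster_size)
{
	return cluster_size;
}
```

A 64k cluster of zeros that compresses to 40 bytes occupies 4096 bytes
once padded to a 4k block, but 4096 bytes cannot hold a later rewrite
with incompressible data. Only a 65536 byte reservation can, and at
that point nothing has been saved.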
> The compression code must be able to read pages from the underlying
> filesystem. This involves using the pagecache. But the uncompressed data
> is what ultimately should end up in the pagecache. This is where I'm
> currently stuck. How do I implement the code such that the underlying
> compressed data may be read (using the pagecache or not) while not
> disturbing the pagecache for the uncompressed data? I'm wondering if I need
> to create an internal address_space to pass down into the underlying
> readpage? Or is there another way to do this?
So I would *not* reference the compressed data via the page cache. If
you do that, then you end up wasting space in the page cache, since
the page cache will contain both the compressed and decompressed data
--- and once the data has been decompressed, the compressed version is
completely useless. So it's better to have the file system supply the
physical location on disk, and then to read in the compressed data to
a scratch set of pages, which are freed immediately after you are done
decompressing things.
This is why compression is so very different from encryption. The
constraints make it quite different.
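The scratch-buffer flow described above might look roughly like this
in userspace terms. Everything here is a stand-in: malloc() plays the
role of the scratch pages, read_fn plays the role of the file system
reading from the physical location on disk, and copy_decompress() is a
trivial "stored" method rather than a real zlib or lz4 call. None of
these names are kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callbacks: read raw bytes from storage, then decompress them. */
typedef int (*read_fn)(void *ctx, uint64_t off, void *buf, size_t len);
typedef int (*decompress_fn)(const void *src, size_t src_len,
			     void *dst, size_t dst_len);

/* Stand-in for the on-disk data: a plain memory buffer. */
struct mem_src {
	const uint8_t *data;
	size_t size;
};

static int mem_read(void *ctx, uint64_t off, void *buf, size_t len)
{
	const struct mem_src *src = ctx;

	if (off > src->size || len > src->size - off)
		return -1;	/* read past end of backing store */
	memcpy(buf, src->data + off, len);
	return 0;
}

/* Trivial "stored" method: the cluster was never compressed, so just copy. */
static int copy_decompress(const void *src, size_t src_len,
			   void *dst, size_t dst_len)
{
	if (src_len > dst_len)
		return -1;
	memcpy(dst, src, src_len);
	return 0;
}

/*
 * Read one compressed cluster into a temporary scratch buffer, decompress
 * it into dst (which stands in for the page cache pages holding the
 * uncompressed data), and free the scratch buffer right away.
 */
static int fill_cluster(read_fn rd, void *ctx, decompress_fn dec,
			uint64_t off, size_t clen, void *dst, size_t dst_len)
{
	void *scratch;
	int ret;

	assert(clen > 0);
	scratch = malloc(clen);
	if (!scratch)
		return -1;
	ret = rd(ctx, off, scratch, clen);
	if (ret == 0)
		ret = dec(scratch, clen, dst, dst_len);
	free(scratch);	/* the compressed copy is useless after decompression */
	return ret;
}
```

The point of the sketch is only the lifetime of the buffers: the
compressed bytes live in scratch memory for the duration of one call,
and only the decompressed data is kept.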
Regards,
- Ted