linux-btrfs.vger.kernel.org archive mirror
From: David Sterba <dave@jikos.cz>
To: Li Zefan <lizf@cn.fujitsu.com>
Cc: uzytkownik2@gmail.com, linux-btrfs@vger.kernel.org
Subject: Re: Inefficient storing of ISO images with compress=lzo
Date: Tue, 20 Sep 2011 16:29:38 +0200
Message-ID: <20110920142937.GM22205@twin.jikos.cz>
In-Reply-To: <4E76AEB9.4080109@cn.fujitsu.com>

On Mon, Sep 19, 2011 at 10:53:45AM +0800, Li Zefan wrote:
> With compress option specified, btrfs will try to compress the file, at most
> 128K at one time, and if the compressed result is not smaller, the file will
> be marked as uncompressable.
> 
> I just tried with Fedora-14-i386-DVD.iso, and the first 896K is compressed,
> with a compress ratio about 71.7%, and the remaining data is not compressed.

I'm curious how you obtained that number, and whether it's a rough estimate
(i.e. with some rounding up to 4k or such) or the % comes from exact numbers.

AFAIK there are two possibilities to read compressed sizes:

rough:
* traverse the extents, look for compressed extents and sum up
  extent_map->block_len, or just extent_map->len for uncompressed ones

* block_len is rounded up to 4k
* the compressed inline size is not stored in any structure member; it is
  at most 4k
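
To make the rough method concrete, here is a minimal userspace sketch of the
summation. It is not kernel code: ExtentMap is a hypothetical stand-in for
the in-memory extent_map, and BLOCK_SIZE assumes the 4k rounding mentioned
above.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumption: the "4k" rounding unit mentioned above


@dataclass
class ExtentMap:
    """Hypothetical stand-in for the kernel's in-memory extent_map."""
    length: int       # models extent_map->len (uncompressed bytes covered)
    block_len: int    # models extent_map->block_len (bytes occupied on disk)
    compressed: bool  # whether the extent is stored compressed


def round_up(n: int, align: int = BLOCK_SIZE) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def rough_disk_size(extents) -> int:
    """Sum block_len for compressed extents and len for uncompressed ones."""
    return sum(e.block_len if e.compressed else e.length for e in extents)
```

For example, a 128K extent whose compressed payload is 90001 bytes would be
accounted as round_up(90001) == 90112 bytes, so the estimate can be off by
up to one block minus one byte per compressed extent.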


exact:
as you know, the only place where the exact size of compressed data is
stored is the first 4 bytes of every compressed extent, so counting the
exact size of a compressed extent means reading those bytes, naturally.
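
As a sketch of the exact method, the snippet below reads that 4-byte header
from a buffer holding the start of a compressed extent. It assumes the value
is an unsigned 32-bit little-endian integer; the function name is
hypothetical, and how the bytes are fetched from disk is left out.

```python
import struct


def exact_compressed_size(extent_data: bytes) -> int:
    """Return the size recorded in the first 4 bytes of a compressed extent.

    Assumption: the header is an unsigned 32-bit little-endian integer.
    Whether the value counts the header itself is not stated in the message,
    so the number is returned unmodified.
    """
    if len(extent_data) < 4:
        raise ValueError("need at least 4 bytes to read the size header")
    (size,) = struct.unpack_from("<I", extent_data, 0)
    return size
```

Note that this has to touch the data blocks of every compressed extent, which
is what makes the exact method slow.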


Touching non-metadata just to read the compressed size does not look nice. I
did some research in that area and my conclusion is that there's a
missing structure member "compressed_length" in extent_map (in-memory
structure, no problem to add it there), which would have to be filled from
struct btrfs_file_extent_item (on-disk structure, e.g. holding the
compression type) -- a disk format change :( The other members cannot be
used to calculate the compressed size, being either estimates by
definition (ram_size) or holding a size that depends on other data
(disk_num_bytes, which depends on the checksum size).

Although there are 2 bytes spare for other compression types, there are
none to hold the actual compression or encryption or whatever-encoding
length.
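
For reference, a sketch of the on-disk item layout as I recall it from
ctree.h; treat the field order and sizes as an assumption and check them
against the kernel headers. The field called ram_size above is named
ram_bytes here, and the 2 spare bytes are other_encoding. No field holds the
exact compressed payload length.

```python
import struct

# Assumed packed little-endian layout of struct btrfs_file_extent_item.
# Fixed part: generation, ram_bytes, compression, encryption,
#             other_encoding (the 2 spare bytes), type
FIXED_PART = struct.Struct("<QQBBHB")
# Regular/prealloc extents continue with:
#             disk_bytenr, disk_num_bytes, offset, num_bytes
REGULAR_PART = struct.Struct("<QQQQ")


def parse_file_extent_item(buf: bytes) -> dict:
    """Decode an item; inline extents carry only the fixed part plus data."""
    names = ("generation", "ram_bytes", "compression", "encryption",
             "other_encoding", "type")
    item = dict(zip(names, FIXED_PART.unpack_from(buf, 0)))
    if len(buf) >= FIXED_PART.size + REGULAR_PART.size:
        names = ("disk_bytenr", "disk_num_bytes", "offset", "num_bytes")
        item.update(zip(names, REGULAR_PART.unpack_from(buf, FIXED_PART.size)))
    return item
```

With this layout the fixed part is 21 bytes and a regular item is 53 bytes,
so adding a length field would change the item size on disk.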

So until there is a format change, there are these two ways, rough or
slow, to read the compressed size.  (Unless I've missed something
obvious etc.)

Looking forward to your input or patches :)


Thanks,
david


Thread overview: 6+ messages
2011-09-18 23:32 Inefficient storing of ISO images with compress=lzo Maciej Marcin Piechotka
2011-09-19  2:53 ` Li Zefan
2011-09-19  3:13   ` Li Zefan
2011-09-19 23:06   ` Maciej Marcin Piechotka
2011-09-20  2:19     ` Li Zefan
2011-09-20 14:29   ` David Sterba [this message]
