From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>,
ReiserFS Development mailing list
<reiserfs-devel@vger.kernel.org>
Subject: Re: reiser4 (ccreg40): very slow mount, poor unlink performance, questions about compression modes
Date: Thu, 11 Sep 2014 19:16:40 +0200
Message-ID: <5411D8F8.2070603@gmail.com>
In-Reply-To: <4194447.EgbZsqfSzA@intelfx-laptop>
On 09/10/2014 11:39 PM, Ivan Shapovalov wrote:
> On Wednesday 10 September 2014 at 22:17:15, Edward Shishkin wrote:
>> On 09/10/2014 09:00 PM, Ivan Shapovalov wrote:
>>> Hi!
>>>
>>> The preamble: recently I had to force-change my configuration (the old laptop
>>> was stolen). What I have now is a combination of a tiny 16 GiB SSD and a huge
>>> 1 TiB HDD.
>>>
>>> ...So I've placed my /home on HDD. Partition size is 800 GiB, formatting
>>> options are "create=ccreg40,compress=gzip1,compressMode=latt" and I have a few
>>> questions.
>>>
>>> 1. What is the recommended compression mode?
>> The default one (conv).
> OK, thanks.
>
>>> More specifically, what is the default "conv" mode? What is its purpose, why is
>>> it the default?
>> In this mode intelligent switches take place in 2 interfaces:
>> 1) in FILE interface (if the first 64K of the file are incompressible, then
>> management is passed to unix-file plugin forever);
>> 2) in COMPRESSION interface (turn on/off compression transform
>> on a dynamic lattice).
>>
>> In other compression modes switches take place only in COMPRESSION
>> interface.
>>
>>
>>> I'm asking, because I wasn't able to understand its purpose from code, and the
>>> code itself looks hackish in some places (hardcoded fallback to extent-only
>>> files,
>> Actually, this is implementation of a compression mode, not a hardcoded
>> fallback.
>>
>>
>>> hardcoded policy, hardcoded fallback to "latt" in many cases, etc).
>> ditto
> Yes, I understand that this is implementation and it doesn't have an obligation
> to be configurable in every aspect... but still it feels somewhat strange.
> E. g. why "extents only" formatting is forced when a file is decided to be
> incompressible?
"extents only" formatting policy was set to facilitate debugging process
when implementing the "conv" compression mode.
When "conv" is set, the cryptcompress plugin "sends a signal" to the upper
dispatcher to switch to the unix-file plugin, which, in turn, performs
switches in the ITEM interface if the "smart" formatting policy is
installed (this is the "classic" tail conversion: tails to extents if
file size >= 20K, and back).
Setting "extents only" or "tails only" disables these switches.
Why "extents only" instead of "tails only"? When "conv" makes a decision
about the switch, the file is 64K long, so extents are better than tails.
I think that now we can set "smart" instead of "extents only": those
switches won't step on each other.
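The formatting policies described above can be sketched roughly as follows. This is only an illustration, not reiser4 code: the function name, the policy strings used as arguments, and the reading of "20K" as 20 * 1024 bytes are all assumptions.

```python
# Illustrative sketch only; names and the exact threshold are assumptions.
TAIL_TO_EXTENT_THRESHOLD = 20 * 1024  # "20K" from the text, read as 20 KiB


def item_kind(policy: str, file_size: int) -> str:
    """Return which item type ("tails" or "extents") a unix-file body would use."""
    if policy == "extents only":
        return "extents"
    if policy == "tails only":
        return "tails"
    if policy == "smart":
        # Tail conversion: small files live in tails, larger ones in extents.
        return "extents" if file_size >= TAIL_TO_EXTENT_THRESHOLD else "tails"
    raise ValueError(f"unknown formatting policy: {policy!r}")
```

At the 64K point where "conv" makes its decision, item_kind("smart", 64 * 1024) and item_kind("extents only", 64 * 1024) give the same answer; in this sketch the two policies differ only if the file later shrinks below the threshold.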
> Why the heuristic in FILE interface check (compressible only if
> size can be reduced twice) is different from the one in COMPRESSION interface
> (compressible if size can be reduced at all)?
I wanted to increase the portion of unix-files on the partition. It showed
better performance than the heuristic that performs switches in the
COMPRESSION interface. I still don't have a satisfactory explanation of
this fact.
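The difference between the two checks discussed here can be sketched as below. This is a hedged illustration, not the reiser4 implementation: the function names are hypothetical, zlib at level 1 stands in for the "gzip1" transform, and "reduced twice" is read as "compressed size is at most half of the original".

```python
import zlib

PROBE_SIZE = 64 * 1024  # "first 64K of the file" from the text


def compressed_len(data: bytes) -> int:
    # zlib level 1 is an assumed stand-in for the "gzip1" transform.
    return len(zlib.compress(data, 1))


def file_level_compressible(head: bytes) -> bool:
    """FILE-interface check: the probe must shrink to half its size or less."""
    probe = head[:PROBE_SIZE]
    return 2 * compressed_len(probe) <= len(probe)


def cluster_level_compressible(cluster: bytes) -> bool:
    """COMPRESSION-interface check: any size reduction counts."""
    return compressed_len(cluster) < len(cluster)
```

Data that compresses only modestly passes the second check but fails the first, so under the stricter FILE-level rule more files would be handed over to the unix-file plugin.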
> (I'm sorry for too many questions. I'm just curious.)
>
>>> 2. The mount time of a 800-GiB partition is >20 seconds. And with
>>> dont_load_bitmap it's around 1-2 seconds. Why so much?
>> By default all bitmap blocks are loaded to memory at mount time.
>> Now calculate a number of bitmap blocks for 800-GiB partition that
>> should be read from disk.
> 25 MiB of bitmaps. 20 seconds still looks strange...
> Are the blocks specially processed? Don't see anything.
>
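The "25 MiB of bitmaps" figure can be reproduced with a short calculation. This is a sketch under assumptions the thread does not state: 4 KiB blocks, and each bitmap block tracking block_size * 8 filesystem blocks (any per-block checksum overhead is ignored). The function name is hypothetical.

```python
# Assumptions (not stated in the thread): 4 KiB blocks, and each bitmap
# block tracks block_size * 8 filesystem blocks.
GIB = 1024 ** 3
MIB = 1024 ** 2


def bitmap_footprint(partition_bytes: int, block_size: int = 4096):
    """Return (number of bitmap blocks, total bitmap bytes) for a partition."""
    total_blocks = partition_bytes // block_size
    bits_per_bitmap_block = block_size * 8
    # Ceiling division: a partially used last bitmap block still has to exist.
    bitmap_blocks = -(-total_blocks // bits_per_bitmap_block)
    return bitmap_blocks, bitmap_blocks * block_size


blocks, size = bitmap_footprint(800 * GIB)
print(blocks, size / MIB)  # 6400 25.0
```

Under these assumptions, 6400 bitmap blocks read in about 20 seconds works out to roughly 3 ms per block. That is closer to the per-request latency of a rotating disk than to its sequential throughput, but whether the bitmap reads are actually seek-bound is not established in this thread.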
>>> Why other filesystems
>>> have drastically less mount times? If they have an equivalent of
>>> dont_load_bitmap enabled by default, why don't we do it?
>> For historical reasons. I recommended to not use large partitions
>> for reiser4, so there wasn't any need in this option.
> OK...
>
>>> 3. Given a directory tree with ~20k files of total size around 20 GiB,
>>> its removal takes forever. From strace I see that a single unlink takes
>>> ~1 second. Again, why so much? Is it related to my choice of "latt" compression
>>> mode over the default "conv"?
>> Yes, in particular.
>> "latt" means that all file bodies are represented by fragments in
>> formatted nodes.
> So... are all cryptcompress files stored in formatted nodes, without
> any equivalent of extents?
Yes, cryptcompress files are composed of items of only one type, so-called
"ctails" (they resemble tails, but have a 1-byte header, which contains the
size of the file's logical cluster). Unlike the unix-file plugin, the
cryptcompress plugin doesn't perform switches in the ITEM interface.
>>> 3a. I can reproduce the "directory not empty" bug :) Interestingly, it is
>>> always the same directory under the aforementioned huge hierarchy. (I've
>>> done the unpack-remove cycle a few times.)
>> I've made a conclusion that this is caused by unexpected disappearing
>> of a record, which represents a directory entry in the directory item
>> (currently directory items are managed by cde ITEM plugin, aka "compound
>> directory entries"). In the error path (ENOENT) the size of the directory is
>> not decremented, which makes the directory undeletable. I still don't know
>> who kills the entries. Special debugging info is needed to find/fix it.
> What kind of information is needed?
We need to find all places where the records are created / killed
and insert a hook that prints such events for the entry that
unexpectedly disappears. This will give us a chance to find the culprit.
I have to say: this is not much fun...
Thanks,
Edward.