reiserfs-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>
Cc: reiserfs-devel@vger.kernel.org
Subject: Re: reiser4: discard support
Date: Fri, 02 May 2014 16:07:21 +0200	[thread overview]
Message-ID: <5363A699.90108@gmail.com> (raw)
In-Reply-To: <3402083.oAjQiz7CVk@intelfx-laptop>

On 05/02/2014 03:36 PM, Ivan Shapovalov wrote:
> On Friday 02 May 2014 at 13:48:28, Edward Shishkin wrote:	
>> On 05/02/2014 01:51 AM, Ivan Shapovalov wrote:
>>> On Wednesday 30 April 2014 at 19:51:03, Edward Shishkin wrote:
>>>> On 04/30/2014 05:04 PM, Ivan Shapovalov wrote:
>>>> [...]
>>>>
>>>>>> There is a ready implementation of rb-trees in the kernel (see
>>>>>> ./lib/rbtree.c,
>>>>>> ./lib/rbtree_test.c). Could you please take a look at this?
>>>>> I've checked it already. Well, in case of trees:
>>>>> - combined extent insertion+adjacent extent merge complexity is
>>>>> O(n*log(n))
>>>> Actually, this is:
>>>>
>>>> 1) Add n extents one-by-one to empty tree:
>>>> log(1) + log(2) + ... + log(n-1)
>>>>
>>>> 2) Merge 2 trees of n extents:
>>>> log(n) + log(n+1) + ... + log(2n-1)
>>>>
>>>> Total: log(1*2*...*2n-1) = log((2n-1)!)
>>>>
>>>> in accordance with Stirling's formula:
>>>>
>>>> log((2n)!) = log(((2n)^(1/2))*(2n/e)^n) =
>>>> = (1/2)*log(2n) + n*log(2n/e) < n + n*log(n)
>>> Hm.
>>>
>>> Consider we fill two data structures with N extents each, then merge these
>>> two data structures and prepare the resulting one for traversal (for
>>> lists it will be sorting, for trees that's no-op).
>>>
>>> For trees:
>>> 1) Adding
>>> 2*log((n-1)!)
>> Why "2*" ?
>> I suppose they are filled in parallel...
>> Total complexity (when everything is serialized) is not interesting ;)
> Oh why? Everything is still serialized by the physical CPU, for that matter.
> So less complexity -> less CPU time...


We can perfectly populate different "discard trees" in parallel on 
different CPUs.
As to sorting the list: I don't know how to perform it in parallel :)
Default assumptions, that everything is serialized, usually lead to various
bad-scalable solutions...


>
>>> 2) Merging
>>> log((2n-1)!/(n-1)!)
>>>
>>> 3) Sorting
>>> none
>>>
>>> Total:
>>> log((2n-1)!) + log((n-1)!)
>>>
>>> For lists:
>>> 1) Adding
>>> 2N
>>>
>>> 2) Merging
>>> constant
>>>
>>> 3) Sorting
>>> 2N*log(2N)
>>>
>>> Total:
>>> 2N+2N*log(2N)
>>>
>>> Quick testing in WolframAlpha shows that, if these estimations are
>>> correct,
>>> lists tend to have lesser complexity starting with approx. N=160.
>>>
>>>> Since atoms can be merged in parallel, we have that
>>>> trees work better than lists. Also tree of extents has
>>>> better memory consumption because of merging extents.
>>>>
>>>>>    against O(n+n*log(n)) in lists;
>>>>>
>>>>> - joining two trees is not less than O(log(n)) against O(1) in lists.
>>>>>
>>>>> So I can't see any benefit in using trees, and join performance is a
>>>>> downside of these. Am I missing something?
>>>> [...]
>>>>
>>>>> Why after reiser4_invalidate_list()? I thought that it should be
>>>>> called between reiser4_write_logs(), but before
>>>>> reiser4_invalidate_list().
>>>> OK, insert before. I think, it doesn't matter, since we don't look
>>>> at this list when issuing discard requests, no? :)
>>> I just don't understand what implications does that function have. It
>>> seems to mess with jnodes, and I can imagine that this leads to races
>>> with somebody modifying the bitmaps while issue_discard_requests() tries
>>> to read them to check discard extents...
>> "somebody modifying the bitmaps" will wait for commit_mutex.
>> Have you found where the bitmaps are modified? This is
>> pre_commit_hook_bitmap().
> Yes, but I was not sure this is the only place where the bitmaps are
> touched...
>
>>>    Again, for me, internals of reiser4 still mostly look
>>>
>>> like heavy wizardry. :)
>> This is because the technical level of reiser4 is very high.
>> Many very strong scientists contributed to reiser4.
>>
>> I also don't understand in details how some reiser4 subsystems work.
>> Once I need details, I open the respective code and read it with the
>> comments.
>> The most important thing is general understanding of how it works.
>> Nobody is able to grok this only by reading reiser4 code before
>> sleeping. You need to implement at least one feature by your hands.
>> Yes, it is not easy, and not for everyone. Like every high-level research
>> in the science...
>>
>> Edward.
> ...and nobody to this point had written an explanation of that, or some piece
> of documentation except the design document...


what explanation do you mean?

Edward.



  reply	other threads:[~2014-05-02 14:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-30  7:43 reiser4: discard support Ivan Shapovalov
2014-04-30 14:11 ` Edward Shishkin
     [not found] ` <1B07908E-B768-407D-ADFA-B3C539FABB49@gmail.com>
     [not found]   ` <53613807.7090605@gmail.com>
2014-05-01 23:51     ` Ivan Shapovalov
     [not found]       ` <CAJZSrNK4=o9vocDfSM=4W5ZgtqZ6RVpmU66sGCu6HFsdN47OHw@mail.gmail.com>
2014-05-02 11:06         ` Ivan Shapovalov
2014-05-02 11:48       ` Edward Shishkin
2014-05-02 12:44         ` Edward Shishkin
2014-05-02 13:47           ` Ivan Shapovalov
2014-05-02 13:36         ` Ivan Shapovalov
2014-05-02 14:07           ` Edward Shishkin [this message]
2014-05-02 14:32             ` Ivan Shapovalov
2014-05-02 18:10               ` Edward Shishkin
2014-05-03 18:48                 ` Ivan Shapovalov
2014-05-03 20:21                   ` Edward Shishkin
2014-05-03 20:32                     ` Ivan Shapovalov
2014-05-06  8:58                       ` Edward Shishkin
2014-05-07  7:35                         ` Ivan Shapovalov
2014-05-07 21:04                           ` Edward Shishkin
  -- strict thread matches above, loose matches on Subject: below --
2014-04-30 17:55 Edward Shishkin
2014-04-30 18:53 ` Edward Shishkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5363A699.90108@gmail.com \
    --to=edward.shishkin@gmail.com \
    --cc=intelfx100@gmail.com \
    --cc=reiserfs-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).