From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>
Cc: reiserfs-devel@vger.kernel.org
Subject: Re: reiser4: discard implementation, pass 2: allocation issues
Date: Mon, 16 Jun 2014 13:32:02 +0200 [thread overview]
Message-ID: <539ED5B2.9060406@gmail.com> (raw)
In-Reply-To: <45271088.MxWDMbZH2T@intelfx-laptop>
On 06/16/2014 01:00 PM, Ivan Shapovalov wrote:
> On Monday 16 June 2014 at 11:24:39, Edward Shishkin wrote:
>> On 06/16/2014 07:03 AM, Ivan Shapovalov wrote:
>>> On Monday 16 June 2014 at 02:14:49, Edward Shishkin wrote:
>>>> On 06/15/2014 11:58 PM, Ivan Shapovalov wrote:
>>>>> On Sunday 15 June 2014 at 23:49:59, Edward Shishkin wrote:
>>>>>> On 06/15/2014 08:07 PM, Ivan Shapovalov wrote:
>>>>>>> On Sunday 15 June 2014 at 19:36:05, Edward Shishkin wrote:
>>>>>>>> On 06/13/2014 10:28 PM, Ivan Shapovalov wrote:
>>>>>>>>> Here is my "analysis" of what happens in reiser4 during a transaction's
>>>>>>>>> lifetime wrt. block allocation and deallocation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> THE EFFECTS (SEMANTICS) OF RELATED FUNCTIONS
>>>>>>>>> reiser4_alloc_blocks_bitmap(): allocates in WORKING BITMAP
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> reiser4_dealloc_blocks_bitmap(!BA_DEFER): deallocates from WORKING BITMAP
>>>>>>>>> reiser4_dealloc_blocks_bitmap(BA_DEFER): stores to ->delete_set
>>>>>>>> This is correct for the middle-level allocator (without suffix "bitmap").
>>>>>>>> The low-level one frees blocks only in WORKING BITMAP.
>>>>>>> Yes, this was a wording mistake. I've noticed it shortly after sending...
>>>>>>>
>>>>>>>>> reiser4_pre_commit_hook_bitmap(): allocates all relocated nodes in COMMIT BITMAP
>>>>>>>> "Relocated" is bad term here: nodes with new data also get the flag
>>>>>>>> JNODE_RELOC. So, I would rather say, that it applies freshly allocated
>>>>>>>> nodes of the atom to COMMIT BITMAP.
>>>>>>>>
>>>>>>>>
>>>>>>>>> deallocates ->delete_set from COMMIT BITMAP
>>>>>>>> applies the atom's delete_set to COMMIT BITMAP
>>>>>>> I've meant exactly this.
>>>>>>>
>>>>>>>>> reiser4_post_commit_hook(): deallocates ->delete_set using !BA_DEFER
>>>>>>>>> (i. e. from WORKING BITMAP)
>>>>>>>> applies the atom's delete_set to WORKING BITMAP.
>>>>>>> ditto
>>>>>>>
>>>>>>>> I would also mention a function of the middle-level block allocator
>>>>>>>> reiser4_alloc_blocks(): allocates blocks in WORKING BITMAP.
>>>>>>> Yes, sure.
>>>>>>>
>>>>>>>
>>>>>>>> Note that the middle-level block allocator (block_alloc.c) actually
>>>>>>>> manipulates
>>>>>>>> with abstract space maps. Currently in reiser4 they are represented only by
>>>>>>>> bitmaps (plugin/space/bitmap.c). We can also implement another
>>>>>>>> representation -
>>>>>>>> extent tree (like in XFS). I don't see any needs for now, though.
>>>>>>> Extent trees seem to be interesting.. They claim that they are more efficient -
>>>>>>> is the difference that huge?
>>>>>>>
>>>>>>>>> TIMELINE OF ALLOCATIONS FOR "USUAL" NODES, AND TIMELINE OF TRANSACTION COMMIT
>>>>>>>>> - nodes are allocated using reiser4_alloc_blocks() and setting JNODE_RELOC,
>>>>>>>>> so WORKING BITMAP ensures that two nodes cannot get the same block;
>>>>>>>>> - nodes are deallocated using reiser4_dealloc_blocks(BA_DEFER),
>>>>>>>>> so their deallocation is not immediately reflected in WORKING BITMAP;
>>>>>>>>> (the relocate set is written here)
>>>>>>>>> - reiser4_pre_commit_hook_bitmap() uses 1) JNODE_RELOC flag and 2) ->delete_set
>>>>>>>>> to convey effective bitmap changes into COMMIT BITMAP;
>>>>>>>>> (the journal and overwrite set are written here)
>>>>>>>>> - reiser4_post_commit_hook() uses ->delete_set to convey deallocations
>>>>>>>>> from step 2 to WORKING BITMAP.
>>>>>>>>> (the discard happens here)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> TIMELINE OF ALLOCATIONS FOR WANDERED JOURNAL BLOCKS
>>>>>>>>> - at commit time, blocks are allocated using reiser4_alloc_blocks(), so they
>>>>>>>>> are allocated in WORKING BITMAP and do not interfere with any "usual" blocks;
>>>>>>>>> - after writing wandered blocks, they are deallocated using
>>>>>>>>> reiser4_dealloc_blocks(!BA_DEFER), i. e. from the WORKING BITMAP.
>>>>>>>> So, the system of working and commit bitmaps plus the delete set seems
>>>>>>>> to be redundant? I think this is because of performance reasons: block
>>>>>>>> allocation is critical thing...
>>>>>>> Seems like it is -- for performance and simplicity (== robustness).
>>>>>>>
>>>>>>>>> CONCLUSION
>>>>>>>>> At possible transaction replay time, journal blocks are not allocated in any
>>>>>>>>> of the bitmaps. However, because the journal is read and replayed before a
>>>>>>>>> transaction has a chance to commit, this fact does not matter.
>>>>>>>>> What matters is that wandered journal blocks never hit COMMIT BITMAP.
>>>>>>>>>
>>>>>>>>> So, if I've got all this correct (which I highly doubt), the disk space leak
>>>>>>>>> (as you pointed it out) does not exist.
>>>>>>>> It seems, you are right..
>>>>>>>>
>>>>>>>>> What exists is a rather different problem with my idea of "log every
>>>>>>>>> deallocated block". Current implementation logs every block regardless of
>>>>>>>>> BA_DEFER flag presence or absence, so non-wandered blocks are logged twice.
>>>>>>>>>
>>>>>>>>> We could just use ->delete_set, but we would lose wandered blocks then.
>>>>>>>>> Or we could only log !BA_DEFER requests, which would do the right thing
>>>>>>>>> (wandered blocks + deallocations from reiser4_post_commit_hook()), but
>>>>>>>>> the reasoning behind this decision would not be obvious for a casual
>>>>>>>>> code reader.
>>>>>>>> I think that a good comment will save the situation..
>>>>>>>>
>>>>>>>>> Or we could log only wandered blocks (in addition to ->delete_set)
>>>>>>>>> at discard time, but this is messy and requires us to merge the discard log
>>>>>>>>> with ->delete_set at discard time.
>>>>>>>> what is the difference with the previous "we could.."?
>>>>>>> The previous option is to log all !BA_DEFER requests in addition to
>>>>>>> ->delete_set, so those two block sets would be partially overlapping.
>>>>>> Hmm.. Are they really overlapped?
>>>>>>
>>>>>> reiser4_dealloc_blocks() looks like the following:
>>>>>> {
>>>>>> if (flags & BA_DEFER)
>>>>>> log in the delete_set;
>>>>>> else
>>>>>> deallocate in working bitmap
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>> so (!BA_DEFER) requests and the delete_set look disjoint,
>>>>>> or, I missed something?
>>>>> You're right, I've misread the code. I was under impression that
>>>>> reiser4_post_commit_hook() internally calls reiser4_dealloc_blocks(!BA_DEFER).
>>>>>
>>>>> So the second option (which I prefer) is to log all sa_dealloc_blocks() calls.
>>>>> That is, effectively, ->delete_set plus wandered journal block deallocations.
>>>> Yes, I think to do something like this:
>>>>
>>>> reiser4_dealloc_blocks()
>>>> {
>>>> if (flags & BA_DEFER) {
>>>> ...
>>>> } else {
>>>> ...
>>>> if (discard is turned on)
>>>> /* we'll want to discard these blocks, which are not to
>>>> be represented
>>>> in the COMMIT SPACE MAP, so store them in a
>>>> separate list */
>>>> do {
>>>> blocknr_set_add_extent(atom,
>>>> &atom->delete_set_for_wander, &bsep, start, len);
>>>> ...
>>>> } while (ret == -E_REPEAT);
>>>> sa_dealloc_blocks(reiser4_get_space_allocator(ctx->super), *start, *len);
>>>> ...
>>>> }
>>>>
>>>> Then join the sets (->delete_set, and ->delete_set_for_wander) at
>>>> discard time
>>>> (should be pretty cheap operation). insert a check at every atoms
>>>> fusion, that
>>>> ->delete_set_for_wander of both atoms are empty).
>>> Hm. IIRC, ultimately we wanted to use rb-trees for storing discard block sets,
>>> but this means that we'll need to convert two blocknr_sets into an rb-tree
>>> at discard time, which pretty defeats their purpose in our context... Or do you
>>> say that blocknr_set itself should be made rb-tree?
>> I didn't know about the delete sets. They are not optional and lists are
>> better for them (as they don't require sorting). So I suggest to confine
>> with lists for now, and may be to try the fast-merge trees in the future
>> (if there is a great desire).
> The discard algorithm still needs its input to be sorted, and blocknr_set is
> inherently unordered.
>
> So there are two options:
> - log into ->delete_set and ->delete_set_for_wander (as you've proposed),
> then merge them into a sorted list at discard time;
> - log into ->discard_set (just like it is now), just fix what to log.
>
> What do you think?
Are the blocknr sets comfortable?
Say, can we organize the discard process via the blocknr_set_iterator()?
If yes, then let's do everything via the blocknr sets (i.e. let's
implement the
first option). Add needed operations to blocknrset.c, the discard iteration
actor to discard.c, etc.
next prev parent reply other threads:[~2014-06-16 11:32 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-13 20:28 reiser4: discard implementation, pass 2: allocation issues Ivan Shapovalov
2014-06-15 17:36 ` Edward Shishkin
2014-06-15 18:07 ` Ivan Shapovalov
2014-06-15 21:49 ` Edward Shishkin
2014-06-15 21:58 ` Ivan Shapovalov
2014-06-16 0:14 ` Edward Shishkin
2014-06-16 5:03 ` Ivan Shapovalov
2014-06-16 9:24 ` Edward Shishkin
2014-06-16 11:00 ` Ivan Shapovalov
2014-06-16 11:32 ` Edward Shishkin [this message]
2014-06-16 11:47 ` Ivan Shapovalov
2014-06-17 0:37 ` Edward Shishkin
2014-06-17 10:14 ` Ivan Shapovalov
2014-06-17 10:29 ` Edward Shishkin
2014-06-17 18:31 ` Ivan Shapovalov
2014-06-17 20:47 ` Ivan Shapovalov
2014-06-18 1:41 ` Edward Shishkin
2014-06-18 9:55 ` Ivan Shapovalov
2014-06-18 11:49 ` Edward Shishkin
2014-06-18 12:26 ` Ivan Shapovalov
2014-06-18 22:46 ` Edward Shishkin
2014-06-18 0:30 ` Edward Shishkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=539ED5B2.9060406@gmail.com \
--to=edward.shishkin@gmail.com \
--cc=intelfx100@gmail.com \
--cc=reiserfs-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.