All of lore.kernel.org
 help / color / mirror / Atom feed
From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>, reiserfs-devel@vger.kernel.org
Subject: Re: reiser4: discard implementation, pass 2: allocation issues
Date: Sun, 15 Jun 2014 19:36:05 +0200	[thread overview]
Message-ID: <539DD985.4010304@gmail.com> (raw)
In-Reply-To: <3401431.jj87z7i0xD@intelfx-laptop>

On 06/13/2014 10:28 PM, Ivan Shapovalov wrote:
> Here is my "analysis" of what happens in reiser4 during a transaction's
> lifetime wrt. block allocation and deallocation.
>
>
> THE EFFECTS (SEMANTICS) OF RELATED FUNCTIONS
> reiser4_alloc_blocks_bitmap(): allocates in WORKING BITMAP

Yes.

> reiser4_dealloc_blocks_bitmap(!BA_DEFER): deallocates from WORKING BITMAP
> reiser4_dealloc_blocks_bitmap(BA_DEFER): stores to ->delete_set

This is correct for the middle-level allocator (without suffix "bitmap").
The low-level one frees blocks only in WORKING BITMAP.

>
> reiser4_pre_commit_hook_bitmap(): allocates all relocated nodes in COMMIT BITMAP

"Relocated" is bad term here: nodes with new data also get the flag
JNODE_RELOC. So, I would rather say, that it applies freshly allocated
nodes of the atom to COMMIT BITMAP.


>                                    deallocates ->delete_set from COMMIT BITMAP

applies the atom's delete_set to COMMIT BITMAP

>
> reiser4_post_commit_hook(): deallocates ->delete_set using !BA_DEFER
>                                                     (i. e. from WORKING BITMAP)

applies the atom's delete_set to WORKING BITMAP.

I would also mention a function of the middle-level block allocator
reiser4_alloc_blocks(): allocates blocks in WORKING  BITMAP.

Note that the middle-level block allocator (block_alloc.c) actually 
manipulates
with abstract space maps. Currently in reiser4 they are represented only by
bitmaps (plugin/space/bitmap.c). We can also implement another 
representation -
extent tree (like in XFS). I don't see any needs for now, though.


>
>
> TIMELINE OF ALLOCATIONS FOR "USUAL" NODES, AND TIMELINE OF TRANSACTION COMMIT
> - nodes are allocated using reiser4_alloc_blocks() and setting JNODE_RELOC,
>    so WORKING BITMAP ensures that two nodes cannot get the same block;
> - nodes are deallocated using reiser4_dealloc_blocks(BA_DEFER),
>    so their deallocation is not immediately reflected in WORKING BITMAP;
> (the relocate set is written here)
> - reiser4_pre_commit_hook_bitmap() uses 1) JNODE_RELOC flag and 2) ->delete_set
>    to convey effective bitmap changes into COMMIT BITMAP;
> (the journal and overwrite set are written here)
> - reiser4_post_commit_hook() uses ->delete_set to convey deallocations
>    from step 2 to WORKING BITMAP.
> (the discard happens here)
>
>
> TIMELINE OF ALLOCATIONS FOR WANDERED JOURNAL BLOCKS
> - at commit time, blocks are allocated using reiser4_alloc_blocks(), so they
>    are allocated in WORKING BITMAP and do not interfere with any "usual" blocks;
> - after writing wandered blocks, they are deallocated using
>    reiser4_dealloc_blocks(!BA_DEFER), i. e. from the WORKING BITMAP.


So, the system of working and commit bitmaps plus the delete set seems
to be redundant? I think this is because of performance reasons: block
allocation is critical thing...


>
>
> CONCLUSION
> At possible transaction replay time, journal blocks are not allocated in any
> of the bitmaps. However, because the journal is read and replayed before a
> transaction has a chance to commit, this fact does not matter.
> What matters is that wandered journal blocks never hit COMMIT BITMAP.
>
> So, if I've got all this correct (which I highly doubt), the disk space leak
> (as you pointed it out) does not exist.

It seems, you are right..

>
> What exists is a rather different problem with my idea of "log every
> deallocated block". Current implementation logs every block regardless of
> BA_DEFER flag presence or absence, so non-wandered blocks are logged twice.
>
> We could just use ->delete_set, but we would lose wandered blocks then.
> Or we could only log !BA_DEFER requests, which would do the right thing
> (wandered blocks + deallocations from reiser4_post_commit_hook()), but
> the reasoning behind this decision would not be obvious for a casual
> code reader.

I think that a good comment will save the situation..

> Or we could log only wandered blocks (in addition to ->delete_set)
> at discard time, but this is messy and requires us to merge the discard log
> with ->delete_set at discard time.

what is the difference with the previous "we could.."?

> Or we could log wandered blocks straight into ->delete_set and do something in
> reiser4_post_commit_hook() to separate these entries, but this is super messy.
>
> I'm preferring the second way... Edward, please proof-read all this.

BTW, what about your current implementation? Does it work for you?

Thanks,
Edward.

  reply	other threads:[~2014-06-15 17:36 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-13 20:28 reiser4: discard implementation, pass 2: allocation issues Ivan Shapovalov
2014-06-15 17:36 ` Edward Shishkin [this message]
2014-06-15 18:07   ` Ivan Shapovalov
2014-06-15 21:49     ` Edward Shishkin
2014-06-15 21:58       ` Ivan Shapovalov
2014-06-16  0:14         ` Edward Shishkin
2014-06-16  5:03           ` Ivan Shapovalov
2014-06-16  9:24             ` Edward Shishkin
2014-06-16 11:00               ` Ivan Shapovalov
2014-06-16 11:32                 ` Edward Shishkin
2014-06-16 11:47                   ` Ivan Shapovalov
2014-06-17  0:37                     ` Edward Shishkin
2014-06-17 10:14                       ` Ivan Shapovalov
2014-06-17 10:29                         ` Edward Shishkin
2014-06-17 18:31                           ` Ivan Shapovalov
2014-06-17 20:47                             ` Ivan Shapovalov
2014-06-18  1:41                               ` Edward Shishkin
2014-06-18  9:55                                 ` Ivan Shapovalov
2014-06-18 11:49                                   ` Edward Shishkin
2014-06-18 12:26                                     ` Ivan Shapovalov
2014-06-18 22:46                                       ` Edward Shishkin
2014-06-18  0:30                             ` Edward Shishkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=539DD985.4010304@gmail.com \
    --to=edward.shishkin@gmail.com \
    --cc=intelfx100@gmail.com \
    --cc=reiserfs-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.