* reiser4: discard implementation, pass 2: allocation issues
@ 2014-06-13 20:28 Ivan Shapovalov
From: Ivan Shapovalov @ 2014-06-13 20:28 UTC
  To: reiserfs-devel; +Cc: Edward Shishkin

Here is my "analysis" of what happens in reiser4 during a transaction's
lifetime with respect to block allocation and deallocation.


THE EFFECTS (SEMANTICS) OF RELATED FUNCTIONS
reiser4_alloc_blocks_bitmap(): allocates in WORKING BITMAP
reiser4_dealloc_blocks_bitmap(!BA_DEFER): deallocates from WORKING BITMAP
reiser4_dealloc_blocks_bitmap(BA_DEFER): stores to ->delete_set

reiser4_pre_commit_hook_bitmap(): allocates all relocated nodes in COMMIT BITMAP
                                  deallocates ->delete_set from COMMIT BITMAP

reiser4_post_commit_hook(): deallocates ->delete_set using !BA_DEFER
                                                   (i. e. from WORKING BITMAP)
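
The semantics listed above can be sketched as a toy model in C. This is only
an illustration under stated assumptions, not the reiser4 code: every toy_*
name, the fixed-size arrays, and the per-block "reloc" flag (standing in for
JNODE_RELOC on a node) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 8

/* Toy model only: these names are illustrative, not reiser4's real API. */
struct toy_fs {
    bool working[TOY_NBLOCKS];    /* WORKING BITMAP */
    bool commit[TOY_NBLOCKS];     /* COMMIT BITMAP */
    bool reloc[TOY_NBLOCKS];      /* stand-in for JNODE_RELOC on the node */
    int delete_set[TOY_NBLOCKS];  /* stand-in for ->delete_set */
    size_t delete_count;
};

/* Allocate the first block free in WORKING BITMAP; return -1 if full. */
static int toy_alloc(struct toy_fs *fs, bool relocated)
{
    for (int blk = 0; blk < TOY_NBLOCKS; blk++) {
        if (!fs->working[blk]) {
            fs->working[blk] = true;
            fs->reloc[blk] = relocated;
            return blk;
        }
    }
    return -1;
}

/* defer == true: remember the block in delete_set, touch no bitmap.
 * defer == false: clear the block in WORKING BITMAP right away. */
static void toy_dealloc(struct toy_fs *fs, int blk, bool defer)
{
    assert(blk >= 0 && blk < TOY_NBLOCKS);
    if (defer) {
        assert(fs->delete_count < TOY_NBLOCKS);
        fs->delete_set[fs->delete_count++] = blk;
    } else {
        fs->working[blk] = false;
    }
}

/* Convey relocated allocations and deferred frees into COMMIT BITMAP. */
static void toy_pre_commit(struct toy_fs *fs)
{
    for (int blk = 0; blk < TOY_NBLOCKS; blk++) {
        if (fs->reloc[blk]) {
            fs->commit[blk] = true;
            fs->reloc[blk] = false;
        }
    }
    for (size_t i = 0; i < fs->delete_count; i++)
        fs->commit[fs->delete_set[i]] = false;
}

/* Replay delete_set as immediate (non-deferred) frees in WORKING BITMAP. */
static void toy_post_commit(struct toy_fs *fs)
{
    for (size_t i = 0; i < fs->delete_count; i++)
        toy_dealloc(fs, fs->delete_set[i], false);
    fs->delete_count = 0;
}
```

In this model a deferred free leaves both bitmaps untouched until
toy_pre_commit() clears the block in the commit bitmap, and the working
bitmap only catches up in toy_post_commit().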


TIMELINE OF ALLOCATIONS FOR "USUAL" NODES, AND TIMELINE OF TRANSACTION COMMIT
- nodes are allocated using reiser4_alloc_blocks() and setting JNODE_RELOC,
  so WORKING BITMAP ensures that two nodes cannot get the same block;
- nodes are deallocated using reiser4_dealloc_blocks(BA_DEFER),
  so their deallocation is not immediately reflected in WORKING BITMAP;
(the relocate set is written here)
- reiser4_pre_commit_hook_bitmap() uses 1) JNODE_RELOC flag and 2) ->delete_set
  to convey effective bitmap changes into COMMIT BITMAP;
(the journal and overwrite set are written here)
- reiser4_post_commit_hook() uses ->delete_set to convey the deferred
  deallocations (second step above) to WORKING BITMAP.
(the discard happens here)


TIMELINE OF ALLOCATIONS FOR WANDERED JOURNAL BLOCKS
- at commit time, blocks are allocated using reiser4_alloc_blocks(), so they
  are allocated in WORKING BITMAP and do not interfere with any "usual" blocks;
- after the wandered blocks are written, they are deallocated using
  reiser4_dealloc_blocks(!BA_DEFER), i.e. from WORKING BITMAP.
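
The wandered-block lifecycle above can be illustrated with a small sketch.
It assumes a toy pair of bitmaps; toy_wander_alloc() and toy_wander_free()
are hypothetical names, not reiser4 functions. The point it shows is that
both steps touch only the working bitmap.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 8

/* Toy model only; names are illustrative, not reiser4's. */
struct toy_bitmaps {
    bool working[TOY_NBLOCKS]; /* WORKING BITMAP */
    bool commit[TOY_NBLOCKS];  /* COMMIT BITMAP */
};

/* Take the first block free in WORKING BITMAP for a wandered copy.
 * No relocate flag is recorded, so a pre-commit pass has nothing to
 * convey into COMMIT BITMAP for it. Returns -1 if no block is free. */
static int toy_wander_alloc(struct toy_bitmaps *bm)
{
    for (int blk = 0; blk < TOY_NBLOCKS; blk++) {
        if (!bm->working[blk]) {
            bm->working[blk] = true;
            return blk;
        }
    }
    return -1;
}

/* Immediate (non-deferred) free: clears WORKING BITMAP only. */
static void toy_wander_free(struct toy_bitmaps *bm, int blk)
{
    assert(blk >= 0 && blk < TOY_NBLOCKS);
    bm->working[blk] = false;
}
```

Because the allocation is visible in the working bitmap, a "usual" block
cannot be handed the same block number while the wandered copy is live;
the commit bitmap is never modified in either function.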


CONCLUSION
At the time of a possible transaction replay, journal blocks are not marked
as allocated in either bitmap. However, because the journal is read and
replayed before any new transaction has a chance to commit, this does not
matter. What matters is that wandered journal blocks never hit COMMIT BITMAP.

So, if I've got all this correct (which I highly doubt), the disk space leak
(as you pointed out) does not exist.

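What exists is a rather different problem with my idea of "log every
deallocated block". The current implementation logs every block regardless
of whether BA_DEFER is set, so non-wandered blocks are logged twice: once
when they are stored to ->delete_set, and once more when
reiser4_post_commit_hook() deallocates them using !BA_DEFER.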

We could just use ->delete_set, but we would lose wandered blocks then.
Or we could only log !BA_DEFER requests, which would do the right thing
(wandered blocks + deallocations from reiser4_post_commit_hook()), but
the reasoning behind this decision would not be obvious for a casual
code reader.
Or we could log only wandered blocks (in addition to ->delete_set)
at discard time, but this is messy and requires us to merge the discard log
with ->delete_set at discard time.
Or we could log wandered blocks straight into ->delete_set and do something in
reiser4_post_commit_hook() to separate these entries, but this is super messy.
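
To make the double logging and the second option concrete, here is a minimal
sketch. It assumes a toy transaction with a flat discard log; the toy_*
names, the policy enum, and the array-based log are all hypothetical and do
not reflect the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LOG_MAX 16

enum toy_policy {
    TOY_LOG_EVERY_CALL,     /* log on every dealloc call (described as current) */
    TOY_LOG_IMMEDIATE_ONLY  /* second option: log only !BA_DEFER requests */
};

/* Toy model only; names are illustrative, not reiser4's. */
struct toy_txn {
    enum toy_policy policy;
    int delete_set[TOY_LOG_MAX];   /* stand-in for ->delete_set */
    size_t delete_count;
    int discard_log[TOY_LOG_MAX];  /* blocks recorded for discard */
    size_t log_count;
};

static void toy_log(struct toy_txn *t, int blk)
{
    assert(t->log_count < TOY_LOG_MAX);
    t->discard_log[t->log_count++] = blk;
}

/* Deferred requests go to delete_set; whether they are also logged
 * depends on the policy. Immediate requests are always logged. */
static void toy_dealloc(struct toy_txn *t, int blk, bool defer)
{
    if (t->policy == TOY_LOG_EVERY_CALL || !defer)
        toy_log(t, blk);
    if (defer) {
        assert(t->delete_count < TOY_LOG_MAX);
        t->delete_set[t->delete_count++] = blk;
    }
}

/* Replays delete_set as immediate frees, as the post-commit hook does. */
static void toy_post_commit(struct toy_txn *t)
{
    for (size_t i = 0; i < t->delete_count; i++)
        toy_dealloc(t, t->delete_set[i], false);
    t->delete_count = 0;
}

/* Count how many times a block appears in the discard log. */
static size_t toy_times_logged(const struct toy_txn *t, int blk)
{
    size_t n = 0;
    for (size_t i = 0; i < t->log_count; i++)
        if (t->discard_log[i] == blk)
            n++;
    return n;
}
```

With TOY_LOG_EVERY_CALL, a block freed with the deferred flag ends up in the
log twice after toy_post_commit(); with TOY_LOG_IMMEDIATE_ONLY, both a
deferred "usual" block and an immediately freed wandered block are logged
exactly once.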

I prefer the second option (logging only !BA_DEFER requests). Edward, please
proofread all this.

-- 
Ivan Shapovalov / intelfx /

