* Re: [RFC] [PATCHv5 0/4] reiser4: discard support: initial implementation, refactored.
@ 2014-06-21 17:07 Edward Shishkin
0 siblings, 0 replies; 6+ messages in thread
From: Edward Shishkin @ 2014-06-21 17:07 UTC (permalink / raw)
To: ReiserFS Development mailing list
On 06/21/2014 01:52 PM, Dušan Čolić wrote:
>
> How much trouble would be to merge neighbouring discardable ranges of
data in one large request by relocating small (size or relative size
defined as mount argument) nondiscardable chunk of data between them?
That way we would make less fragmentation (on ssd you say?
This will increase number of issued IOs, whereas we put a lot of efforts to
reduce it.. Indeed, such merge means that we relocate a part of blocks of
a discard unit to another one.
> Well it kils us at deletion time as you have to make a lot of slow
requests instead of few large ones).
This is resolved by the feature fs-block-size = discard-unit-size.
I'll say straight: it is hard.
Edward.
> On Jun 21, 2014 1:21 PM, "Edward Shishkin"
<edward.shishkin@gmail.com> wrote:
>
> On 06/21/2014 12:35 AM, Ivan Shapovalov wrote:
>
> On Saturday 21 June 2014 at 00:39:54, Ivan Shapovalov wrote:
>
> v1: - initial implementation (patches 1, 2)
>
> v2: - cleanup, fixes discovered in debug mode
> - saner logging
> - assertions
> - enablement of discard through mount option
>
> v3: - fixed the extent merge loop in discard_atom()
>
> v4: - squashed fix-ups into the main patch (with exception of
reiser4_debug())
> - fixed bug in usage of division ops discovered while building on ARM
>
> v5: - squashed mount option into the main patch
> - refactor based on discussion (see commit msg)
> - splitted off blocknr_list code
> - replaced ->discard_set with ->delete_set and ->aux_delete_set
>
> Ivan Shapovalov (4):
> reiser4: make space_allocator's check_blocks() reusable.
> reiser4: add an implementation of "block lists", splitted off the
discard code.
> reiser4: add reiser4_debug(): a conditional equivalent of reiser4_log().
> reiser4: discard support: initial implementation using linked lists.
>
> fs/reiser4/Makefile | 2 +
> fs/reiser4/block_alloc.c | 49 ++---
> fs/reiser4/block_alloc.h | 14 +-
> fs/reiser4/blocknrlist.c | 315 ++++++++++++++++++++++++++++++
> fs/reiser4/debug.h | 4 +
> fs/reiser4/dformat.h | 2 +
> fs/reiser4/discard.c | 247 +++++++++++++++++++++++
> fs/reiser4/discard.h | 31 +++
> fs/reiser4/forward.h | 1 +
> fs/reiser4/init_super.c | 2 +
> fs/reiser4/plugin/space/bitmap.c | 84 +++++---
> fs/reiser4/plugin/space/bitmap.h | 2 +-
> fs/reiser4/plugin/space/space_allocator.h | 4 +-
> fs/reiser4/super.h | 4 +-
> fs/reiser4/txnmgr.c | 125 +++++++++++-
> fs/reiser4/txnmgr.h | 63 +++++-
> fs/reiser4/znode.c | 9 +-
> 17 files changed, 884 insertions(+), 74 deletions(-)
> create mode 100644 fs/reiser4/blocknrlist.c
> create mode 100644 fs/reiser4/discard.c
> create mode 100644 fs/reiser4/discard.h
>
> Also I would like if this code could be given a review. :)
>
>
> Great! Looks nice for me, thanks!
> There are 2 issues, though...
>
> 1) kmalloc/kfree a huge number of 32-byte chunks (blocknr_list
entries) is
> suboptimal. There is a special low-level memory allocator for such
purposes.
> Take a look how we initialize so-called "slab cache" for jnodes
(_jnode_slab),
> atoms (_atom_slab), etc, and allocate memory for them
(kmem_cache_alloc()).
>
> 2) A lot of blocknr_list entries are allocated at flush time, when
the high-level
> allocator (txmod.c) makes "relocation decisions" (especially when
txmod=wa).
> The problem is that the flush (with the following commit) usually is
the file system
> response to memory pressure notifications, when additional memory
allocation
> is not desirable.
>
> I think that with the fixed (1) we'll include the discard support (if
everything will
> be OK in the next 1-2 weeks).
>
> As to (2): that is a common problem of all Linux subsystems which
want memory
> to free memory. It is unresolvable, however, we can improve the
situation. It
> would be nice to implement a per-atom pool of memory (as a list of
kmalloc-ed
> buffers with "cursors") with an optional possibility to pre-allocate
1-2 such buffers
> at atom initialization time. But this is for the future...
>
> I don't see other urgent improvements. Yes, overall scalability of
rb-trees is better,
> as we found, however, merging rb-trees is more expensive, plus atom's
fusion
> is not a background process, so it can lead to performance drop.
There are
> rb-trees with fingers, however I haven't seen their implementation on
C language
> (it can be not so simple).
>
> Thanks!
> Edward.
>
> --
> To unsubscribe from this list: send the line "unsubscribe
reiserfs-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC] [PATCHv5 0/4] reiser4: discard support: initial implementation, refactored.
@ 2014-06-20 20:39 Ivan Shapovalov
2014-06-20 22:35 ` Ivan Shapovalov
0 siblings, 1 reply; 6+ messages in thread
From: Ivan Shapovalov @ 2014-06-20 20:39 UTC (permalink / raw)
To: reiserfs-devel; +Cc: edward.shishkin, Ivan Shapovalov
v1: - initial implementation (patches 1, 2)
v2: - cleanup, fixes discovered in debug mode
- saner logging
- assertions
- enablement of discard through mount option
v3: - fixed the extent merge loop in discard_atom()
v4: - squashed fix-ups into the main patch (with exception of reiser4_debug())
- fixed bug in usage of division ops discovered while building on ARM
v5: - squashed mount option into the main patch
- refactor based on discussion (see commit msg)
- splitted off blocknr_list code
- replaced ->discard_set with ->delete_set and ->aux_delete_set
Ivan Shapovalov (4):
reiser4: make space_allocator's check_blocks() reusable.
reiser4: add an implementation of "block lists", splitted off the discard code.
reiser4: add reiser4_debug(): a conditional equivalent of reiser4_log().
reiser4: discard support: initial implementation using linked lists.
fs/reiser4/Makefile | 2 +
fs/reiser4/block_alloc.c | 49 ++---
fs/reiser4/block_alloc.h | 14 +-
fs/reiser4/blocknrlist.c | 315 ++++++++++++++++++++++++++++++
fs/reiser4/debug.h | 4 +
fs/reiser4/dformat.h | 2 +
fs/reiser4/discard.c | 247 +++++++++++++++++++++++
fs/reiser4/discard.h | 31 +++
fs/reiser4/forward.h | 1 +
fs/reiser4/init_super.c | 2 +
fs/reiser4/plugin/space/bitmap.c | 84 +++++---
fs/reiser4/plugin/space/bitmap.h | 2 +-
fs/reiser4/plugin/space/space_allocator.h | 4 +-
fs/reiser4/super.h | 4 +-
fs/reiser4/txnmgr.c | 125 +++++++++++-
fs/reiser4/txnmgr.h | 63 +++++-
fs/reiser4/znode.c | 9 +-
17 files changed, 884 insertions(+), 74 deletions(-)
create mode 100644 fs/reiser4/blocknrlist.c
create mode 100644 fs/reiser4/discard.c
create mode 100644 fs/reiser4/discard.h
--
2.0.0
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [RFC] [PATCHv5 0/4] reiser4: discard support: initial implementation, refactored.
2014-06-20 20:39 Ivan Shapovalov
@ 2014-06-20 22:35 ` Ivan Shapovalov
2014-06-21 11:20 ` Edward Shishkin
0 siblings, 1 reply; 6+ messages in thread
From: Ivan Shapovalov @ 2014-06-20 22:35 UTC (permalink / raw)
To: reiserfs-devel; +Cc: edward.shishkin
[-- Attachment #1: Type: text/plain, Size: 2337 bytes --]
On Saturday 21 June 2014 at 00:39:54, Ivan Shapovalov wrote:
> v1: - initial implementation (patches 1, 2)
>
> v2: - cleanup, fixes discovered in debug mode
> - saner logging
> - assertions
> - enablement of discard through mount option
>
> v3: - fixed the extent merge loop in discard_atom()
>
> v4: - squashed fix-ups into the main patch (with exception of reiser4_debug())
> - fixed bug in usage of division ops discovered while building on ARM
>
> v5: - squashed mount option into the main patch
> - refactor based on discussion (see commit msg)
> - splitted off blocknr_list code
> - replaced ->discard_set with ->delete_set and ->aux_delete_set
>
> Ivan Shapovalov (4):
> reiser4: make space_allocator's check_blocks() reusable.
> reiser4: add an implementation of "block lists", splitted off the discard code.
> reiser4: add reiser4_debug(): a conditional equivalent of reiser4_log().
> reiser4: discard support: initial implementation using linked lists.
>
> fs/reiser4/Makefile | 2 +
> fs/reiser4/block_alloc.c | 49 ++---
> fs/reiser4/block_alloc.h | 14 +-
> fs/reiser4/blocknrlist.c | 315 ++++++++++++++++++++++++++++++
> fs/reiser4/debug.h | 4 +
> fs/reiser4/dformat.h | 2 +
> fs/reiser4/discard.c | 247 +++++++++++++++++++++++
> fs/reiser4/discard.h | 31 +++
> fs/reiser4/forward.h | 1 +
> fs/reiser4/init_super.c | 2 +
> fs/reiser4/plugin/space/bitmap.c | 84 +++++---
> fs/reiser4/plugin/space/bitmap.h | 2 +-
> fs/reiser4/plugin/space/space_allocator.h | 4 +-
> fs/reiser4/super.h | 4 +-
> fs/reiser4/txnmgr.c | 125 +++++++++++-
> fs/reiser4/txnmgr.h | 63 +++++-
> fs/reiser4/znode.c | 9 +-
> 17 files changed, 884 insertions(+), 74 deletions(-)
> create mode 100644 fs/reiser4/blocknrlist.c
> create mode 100644 fs/reiser4/discard.c
> create mode 100644 fs/reiser4/discard.h
Also I would like if this code could be given a review. :)
Thanks,
--
Ivan Shapovalov / intelfx /
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] [PATCHv5 0/4] reiser4: discard support: initial implementation, refactored.
2014-06-20 22:35 ` Ivan Shapovalov
@ 2014-06-21 11:20 ` Edward Shishkin
2014-06-21 20:15 ` Ivan Shapovalov
0 siblings, 1 reply; 6+ messages in thread
From: Edward Shishkin @ 2014-06-21 11:20 UTC (permalink / raw)
To: Ivan Shapovalov; +Cc: reiserfs-devel
On 06/21/2014 12:35 AM, Ivan Shapovalov wrote:
> On Saturday 21 June 2014 at 00:39:54, Ivan Shapovalov wrote:
>> v1: - initial implementation (patches 1, 2)
>>
>> v2: - cleanup, fixes discovered in debug mode
>> - saner logging
>> - assertions
>> - enablement of discard through mount option
>>
>> v3: - fixed the extent merge loop in discard_atom()
>>
>> v4: - squashed fix-ups into the main patch (with exception of reiser4_debug())
>> - fixed bug in usage of division ops discovered while building on ARM
>>
>> v5: - squashed mount option into the main patch
>> - refactor based on discussion (see commit msg)
>> - splitted off blocknr_list code
>> - replaced ->discard_set with ->delete_set and ->aux_delete_set
>>
>> Ivan Shapovalov (4):
>> reiser4: make space_allocator's check_blocks() reusable.
>> reiser4: add an implementation of "block lists", splitted off the discard code.
>> reiser4: add reiser4_debug(): a conditional equivalent of reiser4_log().
>> reiser4: discard support: initial implementation using linked lists.
>>
>> fs/reiser4/Makefile | 2 +
>> fs/reiser4/block_alloc.c | 49 ++---
>> fs/reiser4/block_alloc.h | 14 +-
>> fs/reiser4/blocknrlist.c | 315 ++++++++++++++++++++++++++++++
>> fs/reiser4/debug.h | 4 +
>> fs/reiser4/dformat.h | 2 +
>> fs/reiser4/discard.c | 247 +++++++++++++++++++++++
>> fs/reiser4/discard.h | 31 +++
>> fs/reiser4/forward.h | 1 +
>> fs/reiser4/init_super.c | 2 +
>> fs/reiser4/plugin/space/bitmap.c | 84 +++++---
>> fs/reiser4/plugin/space/bitmap.h | 2 +-
>> fs/reiser4/plugin/space/space_allocator.h | 4 +-
>> fs/reiser4/super.h | 4 +-
>> fs/reiser4/txnmgr.c | 125 +++++++++++-
>> fs/reiser4/txnmgr.h | 63 +++++-
>> fs/reiser4/znode.c | 9 +-
>> 17 files changed, 884 insertions(+), 74 deletions(-)
>> create mode 100644 fs/reiser4/blocknrlist.c
>> create mode 100644 fs/reiser4/discard.c
>> create mode 100644 fs/reiser4/discard.h
> Also I would like if this code could be given a review. :)
Great! Looks nice for me, thanks!
There are 2 issues, though...
1) kmalloc/kfree a huge number of 32-byte chunks (blocknr_list entries) is
suboptimal. There is a special low-level memory allocator for such purposes.
Take a look how we initialize so-called "slab cache" for jnodes
(_jnode_slab),
atoms (_atom_slab), etc, and allocate memory for them (kmem_cache_alloc()).
2) A lot of blocknr_list entries are allocated at flush time, when the
high-level
allocator (txmod.c) makes "relocation decisions" (especially when txmod=wa).
The problem is that the flush (with the following commit) usually is the
file system
response to memory pressure notifications, when additional memory allocation
is not desirable.
I think that with the fixed (1) we'll include the discard support (if
everything will
be OK in the next 1-2 weeks).
As to (2): that is a common problem of all Linux subsystems which want
memory
to free memory. It is unresolvable, however, we can improve the
situation. It
would be nice to implement a per-atom pool of memory (as a list of
kmalloc-ed
buffers with "cursors") with an optional possibility to pre-allocate 1-2
such buffers
at atom initialization time. But this is for the future...
I don't see other urgent improvements. Yes, overall scalability of
rb-trees is better,
as we found, however, merging rb-trees is more expensive, plus atom's fusion
is not a background process, so it can lead to performance drop. There are
rb-trees with fingers, however I haven't seen their implementation on C
language
(it can be not so simple).
Thanks!
Edward.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] [PATCHv5 0/4] reiser4: discard support: initial implementation, refactored.
2014-06-21 11:20 ` Edward Shishkin
@ 2014-06-21 20:15 ` Ivan Shapovalov
2014-06-22 0:17 ` Ivan Shapovalov
0 siblings, 1 reply; 6+ messages in thread
From: Ivan Shapovalov @ 2014-06-21 20:15 UTC (permalink / raw)
To: reiserfs-devel; +Cc: Edward Shishkin
[-- Attachment #1: Type: text/plain, Size: 2313 bytes --]
On Saturday 21 June 2014 at 13:20:18, Edward Shishkin wrote:
> On 06/21/2014 12:35 AM, Ivan Shapovalov wrote:
> > [...]
> > Also I would like if this code could be given a review. :)
>
> Great! Looks nice for me, thanks!
> There are 2 issues, though...
>
> 1) kmalloc/kfree a huge number of 32-byte chunks (blocknr_list entries) is
> suboptimal. There is a special low-level memory allocator for such purposes.
> Take a look how we initialize so-called "slab cache" for jnodes
> (_jnode_slab),
> atoms (_atom_slab), etc, and allocate memory for them (kmem_cache_alloc()).
>
> 2) A lot of blocknr_list entries are allocated at flush time, when the
> high-level
> allocator (txmod.c) makes "relocation decisions" (especially when txmod=wa).
> The problem is that the flush (with the following commit) usually is the
> file system
> response to memory pressure notifications, when additional memory allocation
> is not desirable.
>
> I think that with the fixed (1) we'll include the discard support (if
> everything will
> be OK in the next 1-2 weeks).
>
> As to (2): that is a common problem of all Linux subsystems which want
> memory
> to free memory. It is unresolvable, however, we can improve the
> situation. It
> would be nice to implement a per-atom pool of memory (as a list of
> kmalloc-ed
> buffers with "cursors") with an optional possibility to pre-allocate 1-2
> such buffers
> at atom initialization time. But this is for the future...
>
> I don't see other urgent improvements. Yes, overall scalability of
> rb-trees is better,
> as we found, however, merging rb-trees is more expensive, plus atom's fusion
> is not a background process, so it can lead to performance drop. There are
> rb-trees with fingers, however I haven't seen their implementation on C
> language
> (it can be not so simple).
>
> Thanks!
> Edward.
>
Thanks for the review!
(1) surely seems trivial. Do we need something similar for blocknr_set as well?
For (2) I don't quite understand you. How such a pool should be organized?
Do you mean `blocknr_list_entry **pool` of size N * sizeof(void*) filled with
kmem_cache_alloc()'d pointers, or just `blocknr_list_entry *pool` of size
N * sizeof(blocknr_list_entry)?
--
Ivan Shapovalov / intelfx /
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] [PATCHv5 0/4] reiser4: discard support: initial implementation, refactored.
2014-06-21 20:15 ` Ivan Shapovalov
@ 2014-06-22 0:17 ` Ivan Shapovalov
0 siblings, 0 replies; 6+ messages in thread
From: Ivan Shapovalov @ 2014-06-22 0:17 UTC (permalink / raw)
To: reiserfs-devel; +Cc: Edward Shishkin
[-- Attachment #1: Type: text/plain, Size: 2647 bytes --]
On Sunday 22 June 2014 at 00:15:49, Ivan Shapovalov wrote:
> On Saturday 21 June 2014 at 13:20:18, Edward Shishkin wrote:
> > On 06/21/2014 12:35 AM, Ivan Shapovalov wrote:
> > > [...]
> > > Also I would like if this code could be given a review. :)
> >
> > Great! Looks nice for me, thanks!
> > There are 2 issues, though...
> >
> > 1) kmalloc/kfree a huge number of 32-byte chunks (blocknr_list entries) is
> > suboptimal. There is a special low-level memory allocator for such purposes.
> > Take a look how we initialize so-called "slab cache" for jnodes
> > (_jnode_slab),
> > atoms (_atom_slab), etc, and allocate memory for them (kmem_cache_alloc()).
> >
> > 2) A lot of blocknr_list entries are allocated at flush time, when the
> > high-level
> > allocator (txmod.c) makes "relocation decisions" (especially when txmod=wa).
> > The problem is that the flush (with the following commit) usually is the
> > file system
> > response to memory pressure notifications, when additional memory allocation
> > is not desirable.
> >
> > I think that with the fixed (1) we'll include the discard support (if
> > everything will
> > be OK in the next 1-2 weeks).
> >
> > As to (2): that is a common problem of all Linux subsystems which want
> > memory
> > to free memory. It is unresolvable, however, we can improve the
> > situation. It
> > would be nice to implement a per-atom pool of memory (as a list of
> > kmalloc-ed
> > buffers with "cursors") with an optional possibility to pre-allocate 1-2
> > such buffers
> > at atom initialization time. But this is for the future...
> >
> > I don't see other urgent improvements. Yes, overall scalability of
> > rb-trees is better,
> > as we found, however, merging rb-trees is more expensive, plus atom's fusion
> > is not a background process, so it can lead to performance drop. There are
> > rb-trees with fingers, however I haven't seen their implementation on C
> > language
> > (it can be not so simple).
> >
> > Thanks!
> > Edward.
> >
>
> Thanks for the review!
>
> (1) surely seems trivial. Do we need something similar for blocknr_set as well?
...I suppose yes, because there is at least one blocknr_set_entry per each txn_atom,
while the latter is already allocated with slab allocator).
--
Ivan Shapovalov / intelfx /
>
> For (2) I don't quite understand you. How such a pool should be organized?
>
> Do you mean `blocknr_list_entry **pool` of size N * sizeof(void*) filled with
> kmem_cache_alloc()'d pointers, or just `blocknr_list_entry *pool` of size
> N * sizeof(blocknr_list_entry)?
>
>
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-06-22 0:17 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-21 17:07 [RFC] [PATCHv5 0/4] reiser4: discard support: initial implementation, refactored Edward Shishkin
-- strict thread matches above, loose matches on Subject: below --
2014-06-20 20:39 Ivan Shapovalov
2014-06-20 22:35 ` Ivan Shapovalov
2014-06-21 11:20 ` Edward Shishkin
2014-06-21 20:15 ` Ivan Shapovalov
2014-06-22 0:17 ` Ivan Shapovalov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.