From: Edward Shishkin <edward.shishkin@gmail.com>
To: "Dušan Čolić" <dusanc@gmail.com>
Cc: reiserfs-devel <reiserfs-devel@vger.kernel.org>
Subject: Re: [RFC] Smart fibration plugin ext_4321
Date: Fri, 06 Jan 2017 21:58:02 +0200
Message-ID: <586FF6CA.7090703@gmail.com>
In-Reply-To: <CADW=+3nwVneU3ftJOQ8eT5t9Mdm8_XuTwkkBS2u5B0xKT8h86A@mail.gmail.com>



On 01/06/2017 05:34 PM, Dušan Čolić wrote:
> On Fri, Jan 6, 2017 at 2:44 PM, Edward Shishkin
> <edward.shishkin@gmail.com> wrote:
>> On 12/26/2016 11:13 PM, Dušan Čolić wrote:
>>> On Mon, Dec 26, 2016 at 7:47 PM, Edward Shishkin
>>> <edward.shishkin@gmail.com> wrote:
>>>>
>>>>
>>>> On 12/25/2016 02:59 AM, Dušan Čolić wrote:
>>>>> Fibration is a great way to decrease fragmentation and increase
>>>>> throughput.
>>>>> Currently there are 4 fibration plugins, lex, dot.o, ext_1 and ext_3
>>>>> and they all have their upsides and downsides.
>>>>>
>>>>> The proposed fibration plugin combines them all: it groups files by
>>>>> extension, for extensions of 1, 2, 3 and 4 characters, and sorts files
>>>>> with the same extension into the same fiber group.
>>>>>
>>>>> With this fibration plugin, all xvid files (for example) would be in the
>>>>> same group within a folder on disk, sorted alphabetically
>>>>
>>>>
>>>> What application wants all xvid files to be in the same group?
>>>> Do you have any benchmark numbers which show advantages
>>>> of the new plugin?
>>>>
>>> Xvid files are just an example.
>>> ext_1234 fibration would be equal to sum of ext_1, ext_2, ext_3, ext_4
>>> and dot_o in one.
>>>
>>> In currently default plugin (dot_o) we sort all files by name from the
>>> start except .o files which we put at the end.
>>> So if we had a source directory with .c .h and .o files in it files by
>>> extension would be sorted like: chchchchchchchchoooooooooooooo
>>> I presumed that in some use cases it is better to have files be sorted
>>> ccccccccccchhhhhhhhhhhhhhoooooooooooo
>>>
>>> Hypothesis is to use the premise that files of same extension are in
>>> same order of size to reduce fragmentation.
>>
>>
>> What kind of fragmentation are you talking about?
>> Internal (which results in "dead" disk space), or
>> external (which results in a lot of "extents")?
>>
> External
>
>> Edward.
>>
>>
>>> If we group files with the same extension together in one directory,
>>> then when we write files of that extension after deleting some files of
>>> the same extension, their group would be in the same order as the
>>> deleted files, so they would be written to a similar place and occupy a
>>> 'hole' of similar size.


So "similar" means the same order, i.e. file sizes can differ by a factor of 2?
TBH, I don't see what can be deduced from this assumption ;)
The new file may either not fit into that hole, or occupy too little of it,
so that the next file won't fit into the rest of the hole.
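
The two cases can be illustrated with a minimal sketch. Everything in it is
an assumption made for illustration: the sizes are made-up units, the
function name fill_hole is hypothetical, and plain first-fit into a single
hole is not a model of the real Reiser4 allocator.

```python
def fill_hole(hole_size, file_sizes):
    """First-fit files, in arrival order, into one hole left by a deletion.

    Returns (placed, rejected, leftover), where `rejected` holds the files
    that did not fit into what remained of the hole.
    """
    placed, rejected = [], []
    left = hole_size
    for size in file_sizes:
        if size <= left:
            placed.append(size)
            left -= size
        else:
            # In a real file system this file would be split into extents
            # or written elsewhere; here it is only recorded.
            rejected.append(size)
    return placed, rejected, left


# Hole of 100 units; all new files are within a factor of 2 of that size.
print(fill_hole(100, [150]))     # ([], [150], 100): too big for the hole
print(fill_hole(100, [60, 60]))  # ([60], [60], 40): the rest is too small
```

In both runs the new files are "similar" to the deleted one in the factor-of-2
sense, yet the hole is either unused or left with a remainder that the next
file cannot use.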

Edward.


>>> Ofc I am not talking about files of few kB size where Reiser4 is great
>>> at packing but about files from few MB to few GB.
>>>
>>> Eg. directory with mp3 and xvid files. mp3s are on the order of MB and
>>> xvid on the order of GB. If we sort them just by name order of xvid
>>> and mp3 files in one directory would be random so when deleting the
>>> smaller ones we would make random holes (like from
>>> mxmxmxxmmmxxxxmxxmmmx to mx xmxx  mx  xmx mmmx).
>>> With grouping of writing where all mp3s would be written first and all
>>> xvid after them after some deletions we would have smaller holes
>>> grouped first  and larger last (like from mmmmmmmmmmmmxxxxxxxxxx to mm
>>> m   mmm mmxx xxx xxx) but the main thing that after writing we would
>>> write mp3s in mp3 holes and xvid in xvid holes ergo. reduce
>>> fragmentation  (like from mm m   mmm mmxx xxx xxx to
>>> mmMmMMMmmmXmmxxXxxx xxx) that we would create if we would try to write
>>> xvid over mp3 holes.
>>>
>>> One obvious use case where I hypothesize that this type of fibration
>>> is better long term would be directories with content similar to usual
>>> Downloads directory, a lot of different types (and sizes) of files
>>> that get written and deleted a lot.
>>>
>>> ext_1234 fibration is the same as dot_o for directories with only one
>>> file extension, or with one extension plus .o.
>>>
>>> Ofc this is just a hypothesis that I would like to prove with some
>>> fragmentation benchmarks but I wanted to hear your thoughts.
>>>
>>> And while I was looking through the code I found a part that I
>>> comprehended, elegant and easy to understand so I wanted to make
>>> something so I could learn more.
>>>
>>>
>>>> Thanks,
>>>> Edward.
>>>>
>>> Thank you for your time and effort
>>>
>>> Dushan
>>>
>>>
>>>>>     so that we will avoid putting
>>>>> small files between them and in that way reduce fragmentation. That
>>>>> group (xvid 4 character extensions) would be among last groups under
>>>>> one directory so that all small files would be written before it.
>>>>>
>>>>> Problem with the attached patch is that currently every fibre value is
>>>>> defined as u64  (eg. static __u64 fibre_ext_3) but if I understood
>>>>> correctly comments in kassign.c and fibration.c fibration part of the
>>>>> key is only 7 bits long.
>>>>> If that is true, how did fibre_ext_3 work?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dushan
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe reiserfs-devel"
>>> in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
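
The two orderings described in the quoted text (chch...ooo for dot_o versus
ccc...hhh...ooo for grouping by extension) can be reproduced with a small
sketch. It is an illustration under stated assumptions, not the kernel code:
the fibre is modelled as the leading part of a sort key, the key functions
dot_o_key and ext_group_key are hypothetical names, and the real plugins pack
the fibre into a few bits of the key rather than using a tuple.

```python
import os


def dot_o_key(name):
    """dot_o-like ordering: sort by name, but put .o files at the end."""
    fibre = 1 if name.endswith(".o") else 0
    return (fibre, name)


def ext_group_key(name):
    """Hypothetical grouping: extensions of 1 to 4 characters each form a
    group, ordered by extension length and then by the extension itself."""
    ext = os.path.splitext(name)[1][1:]
    fibre = (len(ext), ext) if 1 <= len(ext) <= 4 else (0, "")
    return (fibre, name)


def pattern(names, key):
    """First letter of each file's extension, in sorted order."""
    ordered = sorted(names, key=key)
    return "".join(os.path.splitext(n)[1][1:2] or "?" for n in ordered)


names = ["a.c", "a.h", "a.o", "b.c", "b.h", "b.o"]
print(pattern(names, dot_o_key))      # chchoo
print(pattern(names, ext_group_key))  # cchhoo
```

With only one extension, or one extension plus .o, both keys give the same
order, which matches the claim that the proposed fibration degenerates to
dot_o in that case.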

