From: Andreas Rohner <e0502196-oe7qfRrRQffzPE21tAIdciO7C/xPubJB@public.gmane.org>
To: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Contributing to NILFS
Date: Sun, 16 Dec 2012 18:45:32 +0100
Message-ID: <1355679932.761.57.camel@terok>
In-Reply-To: <1355296112.2042.35.camel@slavad-ubuntu>

Hi Vyacheslav,

> I think that this task hides many difficult questions. How does it
> define which files are fragmented and which are not? How does it
> measure the fragmentation degree? What fragmentation degree should
> trigger defragmentation activity? When does it need to detect
> fragmentation, and how should that knowledge be kept? How can it
> defragment without degrading performance?
>
> As I understand it, when we talk about defragmentation, we expect a
> performance improvement as a result. But defragmenter activity can
> itself cause performance degradation in the background. Not every
> workload or I/O pattern causes significant fragmentation.
> 
> Also, it is very important to choose when to defragment. I mean that
> it is possible to try to prevent fragmentation, or to correct
> fragmentation after the data has been flushed to the volume. A hybrid
> technique is also possible, I think. The I/O pattern or file type
> could be the basis for such a decision.

Yes, I agree. It is of course a good idea to reorder the data before
flushing, and probably also to reorder it with the cleaner, but I
thought that was already implemented and optimized. Is it?

Instead, I imagined a tool like xfs_fsr for XFS, so that the user can
decide when to defragment the file system, either by running it manually
or with a cron job. Maybe this is a bit naive, since I probably don't
know enough about NILFS. Couldn't we just calculate the number of
segments a file would use if it were stored optimally and compare that
to the actual number of segments the file is spread across? For example,
file A is 16MB. Let's assume segments are 8MB in size. So (ignoring the
metadata) file A should use 2 segments. Now we count the distinct
segments where the blocks of file A actually are, let's say 10, and
calculate 1 - (2/10) = 0.8. So it is 80% fragmented.
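The fragmentation estimate above can be sketched as follows. This is a
minimal illustration only, assuming the tool has already obtained, for
each data block of a file, the number of the segment that holds it (how
to get that mapping out of NILFS is not shown); the function name
fragmentation_degree is hypothetical. As in the example, metadata is
ignored.

```python
import math


def fragmentation_degree(file_size, segment_size, block_segments):
    """Estimate fragmentation as 1 - (ideal segments / actual segments).

    file_size and segment_size are in bytes. block_segments is an
    iterable with the segment number of each data block of the file
    (hypothetical input; obtaining it from NILFS is not shown).
    Metadata is ignored.
    """
    # Segments the file would need if it were stored contiguously.
    ideal = max(1, math.ceil(file_size / segment_size))
    # Distinct segments the file's blocks are actually spread across.
    actual = len(set(block_segments))
    if actual <= ideal:
        # Already optimal (or the file has no blocks at all).
        return 0.0
    return 1 - ideal / actual


if __name__ == "__main__":
    MiB = 1024 * 1024
    # 4096 blocks of 4 KiB (16 MiB in total) scattered over segments 0..9
    blocks = [i % 10 for i in range(4096)]
    print(fragmentation_degree(16 * MiB, 8 * MiB, blocks))  # prints 0.8
```

Counting distinct segments, rather than extents, means blocks that are
out of order within a single segment do not count as fragmentation;
whether that is the right trade-off is one of the open questions above.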

I wouldn't do that in the cleaner or in the background. Just a tool like
xfs_fsr that the user can run once a month in the middle of the night
with a cron job. The tool would go through every file, calculate its
fragmentation, collect other statistics, and decide whether it is worth
defragmenting.

If the user has an SSD, he/she can decide not to defragment at all.

> As I understand it, F2FS [1] has some defragmentation approaches. I
> think we need to discuss the techniques for detecting fragmented files
> and measuring the fragmentation degree more deeply. Maybe the hot data
> tracking patches [2,3] could be a basis for such a discussion.

I did a quick search for F2FS defragmentation, but I couldn't find
anything. Did you mean this section of the article? "...it provides
large-scale write gathering so that when lots of blocks need to be
written at the same time they are collected into large sequential
writes..." Maybe I missed something, but isn't this just an inherent
property of a log-structured file system rather than defragmentation?

Hot data tracking could be extremely useful for the cleaner. This paper
[1] suggests that the best cleaner performance can be achieved by
distinguishing between hot and cold data. Is something like that already
implemented? Maybe I could do that for my master's thesis instead of the
defragmentation task... ;)
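In [1], the hot/cold distinction is put into practice with a
cost-benefit policy: the cleaner picks the segments with the highest
ratio (1 - u) * age / (1 + u), where u is the fraction of the segment
that is still live and age is the age of its youngest data. A minimal
sketch, assuming per-segment utilization and age are already known; the
Segment type and the function names are hypothetical, and this is not
meant to describe how the NILFS cleaner currently selects segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    number: int
    utilization: float  # fraction of live data in the segment, 0.0 .. 1.0
    age: float          # age of the youngest block in the segment


def cost_benefit(seg):
    # Cost: read the whole segment (1) plus write back its live data (u).
    # Benefit: free space gained (1 - u), weighted by how long it is
    # likely to stay free (age), so cold segments score higher.
    u = seg.utilization
    return (1.0 - u) * seg.age / (1.0 + u)


def select_segments(segments, count):
    """Return the `count` segments with the highest benefit/cost ratio."""
    return sorted(segments, key=cost_benefit, reverse=True)[:count]


if __name__ == "__main__":
    segs = [
        Segment(0, utilization=0.50, age=10.0),    # hot, half live
        Segment(1, utilization=0.75, age=1000.0),  # cold, mostly live
        Segment(2, utilization=0.10, age=5.0),     # hot, nearly empty
    ]
    print([s.number for s in select_segments(segs, 2)])  # prints [1, 2]
```

Here the cold segment that is still 75% live is cleaned before the hot
one that is only 50% live, which matches the paper's argument that cold
segments are worth cleaning at a higher utilization than hot ones.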

Thanks for the links. 

best regards,
Andreas Rohner

[1] http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf


