linux-nilfs.vger.kernel.org archive mirror
From: Andreas Rohner <e0502196-oe7qfRrRQffzPE21tAIdciO7C/xPubJB@public.gmane.org>
To: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Contributing to NILFS
Date: Sun, 16 Dec 2012 18:45:32 +0100
Message-ID: <1355679932.761.57.camel@terok>
In-Reply-To: <1355296112.2042.35.camel@slavad-ubuntu>

Hi Vyacheslav,

> I think that this task hides many difficult questions. How does it
> define which files are fragmented and which are not? How does it
> measure the degree of fragmentation? What degree of fragmentation
> should trigger defragmentation? When does it need to detect
> fragmentation, and how should it keep this knowledge? How can it
> defragment without degrading performance?
>
> As I understand it, when we talk about defragmentation we expect a
> performance improvement as a result. But defragmenter activity can
> itself cause performance degradation in the background. Not every
> workload or I/O pattern causes significant fragmentation.
> 
> Also, it is very important to choose the point of defragmentation. I
> mean that it is possible to try to prevent fragmentation, or to
> correct fragmentation after the data has been flushed to the volume.
> A hybrid technique is also possible, I think. An I/O pattern or file
> type could be the basis for such a decision.

Yes, I agree. It is of course a good idea to reorder the data before
flushing and probably also to reorder it with the cleaner, but I
thought that was already implemented and optimized. Is it?

Instead, I imagined a tool like xfs_fsr for XFS, so that the user can
decide when to defragment the file system, either by running it
manually or with a cron job. Maybe this is a bit naive, since I
probably don't know enough about NILFS. Couldn't we just calculate the
number of segments a file would use if it were stored optimally, and
compare that to the actual number of segments the file is spread
across? For example, file A is 16MB. Let's assume segments are 8MB in
size. So (ignoring the metadata) file A should use 2 segments. Now we
count the distinct segments where the blocks of file A actually are,
let's say 10, and calculate 1 - (2/10) = 0.8. So it is 80% fragmented.
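
To make the proposed metric concrete, here is a rough Python sketch. It is only an illustration under assumptions, not NILFS code: it assumes the tool already knows the file size and the disk block numbers holding the file's data (how to get them out of NILFS is left open), and it assumes a block's segment number is simply block_number // blocks_per_segment. The function names are hypothetical. The worked example at the bottom reproduces the 16MB file / 8MB segments / 10 segments case, assuming 4 KiB blocks.

```python
import math


def ideal_segments(file_size: int, segment_size: int) -> int:
    """Fewest segments the file could occupy if stored contiguously (metadata ignored)."""
    return max(1, math.ceil(file_size / segment_size))


def actual_segments(block_numbers, blocks_per_segment: int) -> int:
    """Distinct segments holding at least one of the file's blocks.

    Assumption: segment number = block number // blocks_per_segment.
    """
    return len({blk // blocks_per_segment for blk in block_numbers})


def fragmentation(file_size, segment_size, block_numbers, blocks_per_segment) -> float:
    """1 - ideal/actual: 0.0 means optimally placed, values near 1.0 mean badly scattered."""
    actual = actual_segments(block_numbers, blocks_per_segment)
    if actual == 0:  # empty file: nothing to defragment
        return 0.0
    ideal = ideal_segments(file_size, segment_size)
    # Clamp at 0 for sparse files, whose holes occupy no blocks.
    return max(0.0, 1.0 - ideal / actual)


# Worked example from the text: a 16 MiB file, 8 MiB segments of 4 KiB blocks (assumed).
BLOCK_SIZE = 4096
SEGMENT_SIZE = 8 * 1024 * 1024
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 2048
FILE_SIZE = 16 * 1024 * 1024
n_blocks = FILE_SIZE // BLOCK_SIZE  # 4096

# Scatter the file's blocks round-robin over 10 different segments.
scattered = [(i % 10) * BLOCKS_PER_SEGMENT + i // 10 for i in range(n_blocks)]

print(round(fragmentation(FILE_SIZE, SEGMENT_SIZE, scattered, BLOCKS_PER_SEGMENT), 2))  # 0.8
```

Because the ideal count uses the ceiling, only segment-aligned files can score 0: a perfectly contiguous 16MB file that starts in the middle of a segment spans 3 segments and scores about 0.33.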

I wouldn't do that in the cleaner or in the background, just in a tool
like xfs_fsr that the user can run once a month in the middle of the
night with a cron job. The tool would go through every file, calculate
its fragmentation, collect other statistics and decide whether it is
worth defragmenting.
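
The per-file decision such a tool would make might look roughly like this. Again, this is a hypothetical sketch: FileStats, select_for_defrag, the 0.5 threshold and the rule of skipping files smaller than one segment are all illustrative assumptions, and segments_used stands in for whatever per-file segment count the tool would actually collect.

```python
import math
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    size: int           # file size in bytes
    segments_used: int  # distinct segments holding the file's blocks (hypothetical input)


def fragmentation(size: int, segments_used: int, segment_size: int) -> float:
    """The 1 - ideal/actual score proposed in the text (ideal = ceil(size / segment_size))."""
    if segments_used == 0:
        return 0.0
    ideal = max(1, math.ceil(size / segment_size))
    return max(0.0, 1.0 - ideal / segments_used)


def select_for_defrag(files, segment_size, threshold=0.5, min_size=None):
    """Return (path, score) pairs worth rewriting, most fragmented first.

    threshold and min_size are hypothetical tuning knobs; by default,
    files smaller than one segment are skipped as not worth the I/O.
    """
    if min_size is None:
        min_size = segment_size
    picked = []
    for f in files:
        if f.size < min_size:
            continue
        score = fragmentation(f.size, f.segments_used, segment_size)
        if score >= threshold:
            picked.append((f.path, score))
    picked.sort(key=lambda item: item[1], reverse=True)
    return picked


MiB = 1024 * 1024
files = [
    FileStats("/data/a.img", 16 * MiB, 10),  # score 0.8
    FileStats("/data/b.img", 16 * MiB, 2),   # score 0.0, already optimal
    FileStats("/data/c.txt", 4096, 2),       # smaller than one segment, skipped by default
    FileStats("/data/d.log", 24 * MiB, 5),   # score about 0.4, below the threshold
]
print([path for path, _ in select_for_defrag(files, 8 * MiB)])  # ['/data/a.img']
```

Sorting worst-first lets a nightly run stop early (for example after a time budget) while still having handled the most fragmented files.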

If the user has an SSD, he/she can decide not to defragment at all.

> As I understand it, F2FS [1] has some defragmentation approaches. I
> think that we need to discuss the technique of detecting fragmented
> files and the degree of fragmentation more deeply. But maybe the hot
> data tracking patch [2,3] will be a basis for such a discussion.

I did a quick search for F2FS defragmentation, but I couldn't find
anything. Did you mean this section of the article? "...it provides
large-scale write gathering so that when lots of blocks need to be
written at the same time they are collected into large sequential
writes..." Maybe I missed something, but isn't this just an inherent
property of a log-structured file system rather than defragmentation?

Hot data tracking could be extremely useful for the cleaner. This paper
[1] suggests that the best cleaner performance can be achieved by
distinguishing between hot and cold data. Is something like that
already implemented? Maybe I could do that for my master's thesis
instead of the defragmentation task... ;)
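
For reference, the LFS paper cited as [1] makes this hot/cold distinction through its cost-benefit cleaning policy: the cleaner picks the segments with the highest ratio benefit/cost = (1 - u) * age / (1 + u), where u is the fraction of live data in the segment and age is the age of the most recently written data in it. The sketch below is a minimal Python illustration of that selection rule only; the Segment type, its fields and the sample numbers are hypothetical, and it says nothing about how the NILFS cleaner currently chooses segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segno: int
    utilization: float  # u: fraction of the segment still holding live data (0.0 to 1.0)
    age: float          # time since the youngest block in the segment was written


def cost_benefit(seg: Segment) -> float:
    """LFS cost-benefit ratio: (1 - u) * age / (1 + u).

    Cleaning costs 1 (read the whole segment) + u (write back the live data)
    and frees (1 - u) of a segment, weighted by age because cold data is
    unlikely to be overwritten soon.
    """
    u = seg.utilization
    return (1.0 - u) * seg.age / (1.0 + u)


def pick_victims(segments, count):
    """Return the `count` segments with the highest cost-benefit ratio."""
    return sorted(segments, key=cost_benefit, reverse=True)[:count]


segments = [
    Segment(0, utilization=0.90, age=1000.0),  # cold and nearly full: ratio ~52.6
    Segment(1, utilization=0.50, age=10.0),    # hot and half empty: ratio ~3.3
    Segment(2, utilization=0.75, age=100.0),   # warm: ratio ~14.3
]
print([s.segno for s in pick_victims(segments, 2)])  # [0, 2]
```

The cold segment is cleaned first even though it is 90% full, whereas a greedy cleaner that always takes the least-utilized segment would pick the hot segment 1, whose remaining data is likely to be overwritten soon anyway.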

Thanks for the links. 

best regards,
Andreas Rohner

[1] http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf


Thread overview: 11+ messages
2012-12-10 20:05 Contributing to NILFS Andreas Rohner
2012-12-11  6:46 ` Vyacheslav Dubeyko
2012-12-11 13:54   ` Andreas Rohner
2012-12-12  7:08     ` Vyacheslav Dubeyko
2012-12-12 15:30       ` Sven-Göran Bergh
2012-12-12 19:57           ` Vyacheslav Dubeyko
2012-12-13 10:59               ` Sven-Göran Bergh
2012-12-16 17:45       ` Andreas Rohner [this message]
2012-12-17  6:30         ` Vyacheslav Dubeyko
2012-12-17 10:23           ` Andreas Rohner
2012-12-19  7:13             ` Vyacheslav Dubeyko
