From: Quinn Harris <lists@qutek.net>
To: reiserfs-list@namesys.com
Subject: Re: Relocating files for faster boot/start-up on reiser(fs/4)
Date: Wed, 13 Sep 2006 21:10:43 -0600
Message-ID: <200609132110.45703.lists@qutek.net>
In-Reply-To: <ee9s4k$f58$1@sea.gmane.org>
Peter,
I think you misunderstood what I was doing and why. Let me try to
clarify.
My test is far from perfect. It's merely an exercise to verify the basic idea.
> Just by copying you are allowing reiser to optimize the dir.
Exactly, but I am copying in a way that implicitly suggests what order those
files will be accessed in.
I was attempting to reorder the data on disk to minimize disk
seeks with knowledge of the order that data will be accessed. This was done
by taking advantage of the way reiser assigns keys to files based on their
name and its affinity to match key order with block order.
> You're trying to duplicate what a tree-based design does automatically.
This works because of the tree-based design of reiser.
Reiser must assign each file (item, actually) some key, so why not take
advantage of knowledge of the order those items will be accessed in? The
current key assignment algorithm is a best guess at that given the limited
information it has (file/directory name). Remember, key assignment roughly
translates to on-disk position.
The relocate script can leave the file system in the exact same state from a
semantic standpoint (what files and directories are there) but relocate the
data on disk. Copying those files to a single directory with numeric names was
a kludge to implicitly tell the file system to place those files in a
specific order and near each other on disk. The rename step switches the
old unoptimized file position with the new, more optimized position.
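To make the copy-then-rename kludge concrete, here is a minimal Python sketch
of the idea. It is not the actual relocate script; the function name
relocate() and the staging_dir argument are made up for illustration. It
assumes the staging directory is on the same file system as the originals, so
the final step is a plain rename, and it does nothing about hard links or
file ownership. Whether the renamed copies really keep their new placement
depends on the file system.

```python
import os
import shutil


def relocate(ordered_paths, staging_dir):
    """Copy files in access order under numeric names, then swap them in.

    ordered_paths: file paths, listed in the order they are expected to be read.
    staging_dir:   hypothetical scratch directory on the SAME file system.
    Returns the number of files processed.
    """
    os.makedirs(staging_dir, exist_ok=True)
    # Zero-pad the names so that name order equals access order.
    width = max(1, len(str(len(ordered_paths))))
    staged = []
    # Pass 1: write fresh copies one after another, in access order.
    for index, path in enumerate(ordered_paths):
        tmp = os.path.join(staging_dir, str(index).zfill(width))
        shutil.copy2(path, tmp)  # copies data, mode bits and timestamps
        staged.append((tmp, path))
    # Pass 2: rename each copy over its original, so the directory tree
    # looks the same as before but points at the newly written data.
    for tmp, path in staged:
        os.replace(tmp, path)
    return len(staged)
```

Doing all the copies before any rename keeps the new data written in one
sequential pass, which is the whole point of the exercise.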
> Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
The boot optimization was over 3885 files. Ideally those files would be
ordered head to tail in a sequence that perfectly matches the order they will
be read. As a result multiple items in a node will all need to be read at
nearly the same time. That didn't happen in my test, but it was much closer
to that after I ran the relocate script than before. Hence the performance
improvement. With this script, reiser4 and a repacker I have reason to
believe the ordering will be nearly perfect. Of course, that is excluding
random access patterns inside the same file and the directory data needed to
get at the files.
This basic technique can be made into a boot script much like the readahead
script already in Ubuntu, just improved. Boot once with a profile option; it
measures read patterns (it already does this), then reorders data on disk with
this trick, or maybe something better. Then the next time you boot, it's
1.5-2x faster. Better yet, include this profile information in the distro
packages. When a package is installed, this info is used to help assign item
keys, resulting in a better disk layout, faster boot times, and no weird
file copy rename mumbo jumbo.
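As a rough sketch of the profiling half, the function below turns a recorded
read log into the order in which files were first touched. The log format
(one path per line, in read order) and the name first_access_order() are
assumptions for illustration; this is not what Ubuntu's readahead actually
writes out.

```python
def first_access_order(access_log_lines):
    """Return each path once, in the order it first appears in the log.

    access_log_lines: iterable of strings, one file path per line
    (a hypothetical log format assumed for this sketch).
    """
    seen = set()
    order = []
    for line in access_log_lines:
        path = line.strip()
        # Skip blank lines and files that were already read earlier in boot.
        if not path or path in seen:
            continue
        seen.add(path)
        order.append(path)
    return order
```

The resulting list is the sequence you would hand to whatever does the
reordering, be it the copy/rename trick or a key assignment hint.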
I bring this up here because I expect that with reiser4, a repacker, and this
trick, reiser4 could deliver at least 50% better reproducible real-world boot
and app load performance than any other file system. At least until other
file systems implement something similar, like what MS did with XP. Can
something similar be done (or has it been) on ext(2/3/4), XFS, JFS or other
Linux file systems?
Windows XP boots much faster than Windows 2000 in part because it does what I
am talking about. File access is recorded at boot, then the disk is
defragmented with this knowledge. Check out
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx
under "Prefetch".
Also look at http://kerneltrap.org/node/2157
MS's implementation required implementing a defrag utility with a specific
feature that could position disk data based on access logs. Reiser4 can do
the same thing as part of its basic functionality with the addition of a much
much simpler tool to help assign keys based on that access log. Then a
repacker (when it devaporizes) can further optimize for that access pattern
without any code specific to that purpose. Seems like good orthogonal design
to me.
Hope that clarifies. Like my previous post, whatever it did, it did it in way
too many words.
On Wednesday 13 September 2006 15:10, Peter wrote:
> On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
> > Thoughts?
>
> Yes. Why on earth would you do this? By copying the files and renaming and
> hardlinking them is nothing a sysadmin would ever do. Just by copying you
> are allowing reiser to optimize the dir. You're trying to duplicate what a
> tree-based design does automatically. Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
>
> If reiser needs speedup it certainly won't be done by renaming files!
>
> JM$0.02
--
Quinn Harris