* Relocating files for faster boot/start-up on reiser(fs/4)
@ 2006-09-13 20:51 Quinn Harris
2006-09-13 21:10 ` Peter
0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-13 20:51 UTC (permalink / raw)
To: reiserfs-list
I have been playing around with relocating file data to improve boot time and
app start-up time (like OpenOffice) on reiser(fs/4). This is done by
monitoring the files accessed during boot/start-up then copying these files
into a single directory with sequential names 0001 0002 ... matching the
access order. Finally the new files are hard linked (rename should work too)
to the same location as the original files.
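A minimal sketch of the copy-and-link step described above, not the actual Ruby script. The function name `relocate`, the `staging_dir` argument and the `.relocating` suffix are hypothetical, and it assumes the staging directory sits on the same file system as the originals (hard links cannot cross file systems).

```python
import os
import shutil

def relocate(paths_in_access_order, staging_dir):
    """Copy each file into staging_dir under a sequential name (0001, 0002,
    ...), then swap the fresh copy into the original location."""
    os.makedirs(staging_dir, exist_ok=True)
    for index, original in enumerate(paths_in_access_order, start=1):
        staged = os.path.join(staging_dir, "%04d" % index)
        # copy2 writes new data blocks and keeps mode and timestamps
        # (ownership is NOT preserved; a real tool would have to handle that).
        shutil.copy2(original, staged)
        temp_link = original + ".relocating"   # hypothetical temporary name
        os.link(staged, temp_link)             # hard link next to the original
        os.replace(temp_link, original)        # atomically replace the old file
```

After this runs, the original path and the numbered file in the staging directory share one inode, and the old data blocks are freed once nothing holds the old file open. Files that are in use while this runs (shared libraries, for instance) are probably best left alone.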
As I understand it both reiserfs and reiser4 assign keys to items based on the
file name and the parent directory. The file system then attempts to match
block order with key order. This allows the above trick to work for placing
files in a specific order next to each other on disk.
I am using readahead-watch on Ubuntu. This little tool uses inotify to
monitor all file accesses while it runs. The accessed files are written to a
text file by disk order. I have modified this tool to also write them by
access time. I then use a script (ruby) to do the above copy and link using
the output from readahead-watch.
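As a rough illustration of the "by access time" ordering, here is a sketch that reduces a log to unique paths in first-access order. It assumes a log with one path per line, already in the order the accesses happened; that format and the name `first_access_order` are assumptions, not taken from readahead-watch itself.

```python
def first_access_order(log_lines):
    """Return each path once, in the order it was first accessed."""
    seen = set()
    ordered = []
    for line in log_lines:
        path = line.strip()
        if path and path not in seen:   # skip blank lines and repeat accesses
            seen.add(path)
            ordered.append(path)
    return ordered
```

The resulting list is the kind of input the copy and link step would take.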
I have done some tests on my Athlon 2200 laptop running reiserfs. The hard
drive, a 40GB Hitachi Travelstar 80GB, has a max real transfer rate of 25MB/s
and an access time of 12ms.
The reiserfs partition size is 36G with 8.9G used.
I used readahead-watch to create a readahead log during boot on Ubuntu Edgy
much like the default configuration with the "profile" boot option except set
to record by access time and I manually killed it after the system fully
booted. With this log used for readahead, the system booted in 2:15 from
grub load to usable desktop (auto login) as measured manually with a
stopwatch. After running the relocate script the boot time with the same
readahead log was 1:38. I then reran the readahead-watch during boot set to
sort by disk order, resulting in a boot time of 1:15. I booted twice for
each test to make sure the results were within a few seconds.
I also used bootchart, but this didn't measure Gnome start-up and requires a
bit of ambition to analyze thoroughly. But it was evident that running the
relocate script did increase peak disk throughput from 6MB/s to 13MB/s and
increased the average throughput rate. But most of boot time is still spent
waiting on the disk. My relocate script relocated 310MB of files. If those
were perfectly contiguous on disk, this drive should be able to load that in
under 20s. Though I expect only a fraction of that is actually accessed
during boot.
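The "under 20s" bound follows from the drive figures quoted earlier. A quick check, assuming the relocated size is 310 megabytes and the 25MB/s maximum rate is sustained for a perfectly contiguous read:

```python
relocated_mb = 310        # size of the relocated files, from the post
max_transfer_mb_s = 25    # quoted maximum real transfer rate of the drive

# Best case: one sequential read with no seeks at all.
best_case_seconds = relocated_mb / max_transfer_mb_s
print("best case: %.1f s" % best_case_seconds)
```

That gives about 12.4 seconds, which leaves some headroom under the 20s figure for seeks and metadata reads.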
Using 'filefrag' it is evident that the relocate script's attempt to lay the
files out contiguously was a bit half-assed, but from the boot times it was
clearly an improvement.
I also used readahead-watch to monitor the accessed files of openoffice writer
on startup. The initial cold start time was 17s (about 0.5s variation from
load to load). A warm start (started right after it is closed) was 3.6s. The
results from readahead-watch were filtered through a script to remove all
files that were open when openoffice wasn't running (using fuser). Running
the relocate script on some of the X and gnome libraries broke my system
nicely until a reboot. After running the relocate script the cold start time
became 14s. When readahead-list is run on the same files relocated before
starting openoffice the load time was 6.5s. sudo sh -c "echo 1
> /proc/sys/vm/drop_caches" was used to ensure the disk was read between
runs.
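A sketch of how such a cold-start measurement could be scripted. The helper names are hypothetical. `drop_page_cache` does the same thing as the shell command above and needs root. `time_command` only measures until the process exits, so for an interactive program like OpenOffice it is a rough stand-in for timing to a usable window.

```python
import os
import subprocess
import sys
import time

def drop_page_cache():
    """Flush dirty pages, then ask the kernel to drop the page cache (root only)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("1\n")

def time_command(argv):
    """Return wall-clock seconds from launch until the command exits."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start

if __name__ == "__main__" and len(sys.argv) > 1:
    drop_page_cache()                       # make the next run a cold start
    print("cold: %.1f s" % time_command(sys.argv[1:]))
    print("warm: %.1f s" % time_command(sys.argv[1:]))
```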
Of course, these results are highly dependent on how fragmented the files
were before and how effectively the relocation worked. I expect others
could reproduce speedups but how much will vary. I did these tests on my
laptop with a slow hard drive so the results would be more evident.
I also did some tests with fresh reiserfs, reiser4, and ext3 on a 100MB
loopback to see how well the file system would take the hint to order data
sequentially. Creating 10 5MB files with sequential names on reiser4
resulted in one fragment (measured by 'filefrag') for the whole bunch,
probably a disk allocation bitmap; nearly perfect. reiserfs generally would
end up with 3-4 fragments for the same test. And ext3 didn't appear to make
any real attempt to order the files sequentially on disk.
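For anyone wanting to repeat the loopback test, a sketch of the file creation step. The function name and defaults are illustrative; the directory is assumed to be a mount point of the freshly created file system under test, and fragmentation would then be checked separately with 'filefrag'.

```python
import os

def create_sequential_files(directory, count=10, size_bytes=5 * 1024 * 1024):
    """Write `count` files named 0001, 0002, ... of `size_bytes` each."""
    os.makedirs(directory, exist_ok=True)
    chunk = b"\0" * (1024 * 1024)
    paths = []
    for index in range(1, count + 1):
        path = os.path.join(directory, "%04d" % index)
        with open(path, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, len(chunk))
                f.write(chunk[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())    # push the data out so blocks get allocated
        paths.append(path)
    return paths
```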
I have a 29GB reiser4 partition with 21GB used I have been running for a few
years now (sometime before release). When I ran the same 10 5MB file test on
it, the total resulted in 1000+ fragments (didn't bother to count, but it was
a lot). But the files were allocated head to tail. It's a bit scary to
think the file system can't find a few MB unallocated region on disk.
Clearly a repacker would be really nice.
Relocating file data to match pre-measured access patterns can clearly make a
big performance difference. Reiser(fs/4) provides an easy mechanism to hint
at disk order which can be used to measurably improve boot/startup times.
But, I expect more can be done to achieve better results. This includes
better measurements of read patterns and better allocation of the data.
I hope to rerun these tests with Reiser4 (maybe 4.1) on the same hardware. I
expect with a fresh (not fragmented) Reiser4 partition, the improvements will
be more pronounced. But a repacker should allow more reproducible results
and nearly perfect data placement for boot and app start-up.
I hope with reiser4, relocation, and the new upstart (Ubuntu's sysvinit
replacement) with good scripts, I will get this system to boot to usable in
30 seconds. And slowoffice (aka openoffice) to load in 6s cold. Am I
overoptimistic?
What about a mechanism to explicitly set or hint at item keys? Maybe someday,
linux packages could include preferred file order information that a file
system like reiser4 could use to order the files on disk resulting in fast
load times without the need for the user to profile the app.
I think there is a lot to be said for measuring access patterns and using that
to set keys in addition to deducing it from semantics using a fibration
plug-in.
Thoughts?
--
Quinn Harris
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-13 20:51 Relocating files for faster boot/start-up on reiser(fs/4) Quinn Harris
@ 2006-09-13 21:10 ` Peter
2006-09-14 3:10 ` Quinn Harris
2006-09-14 14:01 ` cmaurand
0 siblings, 2 replies; 14+ messages in thread
From: Peter @ 2006-09-13 21:10 UTC (permalink / raw)
To: reiserfs-list-nJ1KrdHEGnBBDgjK7y7TUQ
On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
>
> Thoughts?
>
Yes. Why on earth would you do this? Copying the files and renaming and
hardlinking them is nothing a sysadmin would ever do. Just by copying you
are allowing reiser to optimize the dir. You're trying to duplicate what a
tree-based design does automatically. Moreover, remember that reiser packs
files into clusters so that you may read more than just your one file from
time to time which could end up adding time to your test.
If reiser needs speedup it certainly won't be done by renaming files!
JM$0.02
--
Peter
+++++
Do not reply to this email, it is a spam trap and not monitored.
I can be reached via this list, or via
jabber: pete4abw at jabber.org
ICQ: 73676357
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-13 21:10 ` Peter
@ 2006-09-14 3:10 ` Quinn Harris
2006-09-14 19:55 ` David Masover
2006-09-14 14:01 ` cmaurand
1 sibling, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-14 3:10 UTC (permalink / raw)
To: reiserfs-list
Peter,
I think you misunderstood what and why I was doing this. Let me try to
clarify.
My test is far from perfect. It's merely an exercise to verify the basic idea.
> Just by copying you are allowing reiser to optimize the dir.
Exactly, but I am copying in a way that implicitly suggests what order those
files will be accessed in.
I was attempting to reorder the data on disk to minimize disk
seeks with knowledge of the order that data will be accessed. This was done
by taking advantage of the way reiser assigns keys to files based on their
name and its affinity to match key order with block order.
> You're trying to duplicate what a tree-based design does automatically.
This works because of the tree-based design of reiser.
Reiser must assign each file (item actually) some key, why not take
advantage of knowledge of the order those items will be accessed in? The
current key assignment algorithm is a best guess at that given the limited
information it has (file/directory name). Remember key assignment roughly
translates to on disk position.
The relocate script can leave the file system in the exact same state from a
semantic standpoint (what files and directories are there) but relocate the
data on disk. Copying those files to a single directory with numeric names was
a kludge to implicitly tell the file system to place those files in a
specific order and near each other on disk. The rename step is to switch the
old unoptimized file position with the new more optimized position.
> Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
The boot optimization was over 3885 files. Ideally those files would be
ordered head to tail in a sequence that perfectly matches the order they will
be read. As a result multiple items in a node will all need to be read at
nearly the same time. That didn't happen in my test, but it was much closer
to that after I ran the relocate script than before. Hence the performance
improvement. With this script, reiser4 and a repacker I have reason to
believe the ordering will be nearly perfect. Of course, that is excluding
random access patterns inside the same file and the directory data needed to
get at the files.
This basic technique can be made into a boot script much like the readahead
script already in Ubuntu, just improved. Boot once with a profile option, it
measures read patterns (already does this), then reorders data on disk with
this trick, or maybe something better. Then the next time you boot it's
1.5-2x faster. Better yet, include this profile information in the distro
packages. When a package is installed this info is used to help assign item
keys resulting in a better disk layout and faster boot times and no weird
file copy rename mumbo jumbo.
I bring this up here because I expect with reiser4, a repacker, and this
trick, reiser4 could deliver at least 50% better reproducible real world boot
and app load performance than any other file system. At least until other
file systems implement something similar, like what MS did with XP. Can
something similar be done (or has been) on ext(2/3/4), XFS, JFS or other
linux file systems?
Windows XP boots much faster than Windows 2000 in part because it does what I
am talking about. File access is recorded at boot, then the disk is defragged
with this knowledge. Check out
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx
under "Prefetch".
Also look at http://kerneltrap.org/node/2157
MS's implementation required implementing a defrag utility with a specific
feature that could position disk data based on access logs. Reiser4 can do
the same thing as part of its basic functionality with the addition of a much
much simpler tool to help assign keys based on that access log. Then a
repacker (when it devaporizes) can further optimize for that access pattern
without any code specific to that purpose. Seems like good orthogonal design
to me.
Hope that clarifies. Like my previous post, whatever it did, it did it in way
too many words.
On Wednesday 13 September 2006 15:10, Peter wrote:
> On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
> > Thoughts?
>
> Yes. Why on earth would you do this? Copying the files and renaming and
> hardlinking them is nothing a sysadmin would ever do. Just by copying you
> are allowing reiser to optimize the dir. You're trying to duplicate what a
> tree-based design does automatically. Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
>
> If reiser needs speedup it certainly won't be done by renaming files!
>
> JM$0.02
--
Quinn Harris
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-13 21:10 ` Peter
2006-09-14 3:10 ` Quinn Harris
@ 2006-09-14 14:01 ` cmaurand
1 sibling, 0 replies; 14+ messages in thread
From: cmaurand @ 2006-09-14 14:01 UTC (permalink / raw)
To: Peter; +Cc: reiserfs-list
SCO has done this solution, that's why it's such a dog.
Peter wrote:
> On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
>> Thoughts?
>>
>
> Yes. Why on earth would you do this? Copying the files and renaming and
> hardlinking them is nothing a sysadmin would ever do. Just by copying you
> are allowing reiser to optimize the dir. You're trying to duplicate what a
> tree-based design does automatically. Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
>
> If reiser needs speedup it certainly won't be done by renaming files!
>
> JM$0.02
>
--
Curtis Maurand
Senior Network & Systems Engineer
BlueTarp Financial, Inc.
443 Congress St.
6th Floor
Portland, ME 04101
207.797.5900 x233 (office)
207.797.3833 (fax)
mailto:cmaurand@bluetarp.com
http://www.bluetarp.com
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 3:10 ` Quinn Harris
@ 2006-09-14 19:55 ` David Masover
2006-09-14 22:09 ` Quinn Harris
0 siblings, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-14 19:55 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list
Quinn Harris wrote:
> The boot optimization was over 3885 files. Ideally those files would be
> ordered head to tail in a sequence that perfectly matches the order they will
> be read.
> I bring this up here because I expect with reiser4, a repacker, and this
Now that you mention it, do you have a control of some sort to prove
this isn't just fragmentation? That is, copy the files you're messing
with in some random order (that should make booting slower), and
benchmark that?
I'm guessing a repacker would speed things up much more than this,
although this is interesting and helpful for systems which need to be
rebooted often. (I'd prefer a working suspend2...)
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 19:55 ` David Masover
@ 2006-09-14 22:09 ` Quinn Harris
2006-09-14 22:23 ` David Masover
0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-14 22:09 UTC (permalink / raw)
To: reiserfs-list
On Thursday 14 September 2006 13:55, David Masover wrote:
> Quinn Harris wrote:
> > The boot optimization was over 3885 files. Ideally those files would be
> > ordered head to tail in a sequence that perfectly matches the order they
> > will be read.
> >
> > I bring this up here because I expect with reiser4, a repacker, and this
>
> Now that you mention it, do you have a control of some sort to prove
> this isn't just fragmentation? That is, copy the files you're messing
> with in some random order (that should make booting slower), and
> benchmark that?
That is a good point. Recording the disk layout before and after to compare
relative fragmentation would be a good idea. As well as randomizing the
sequence as a sanity check.
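The randomized control could be as simple as shuffling the access-ordered list before feeding it to the relocation step. A sketch, with a fixed seed so the control layout can be reproduced (the name `control_order` is hypothetical):

```python
import random

def control_order(paths, seed=0):
    """Return the same paths in a reproducible random order."""
    shuffled = list(paths)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

If relocation in this order boots about as fast as relocation in access order, the gain is probably coming from defragmentation alone rather than from the ordering.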
Also note that during boot I was using readahead on all 3885 files. So the
kernel has a good opportunity to rearrange the reads. And the read sequence
doesn't necessarily match the order it's needed (though I tried to get that).
From what I have done, it would be unreasonable to say anything other than,
this has a good chance of working. I really think this should work in
theory. That's why I tried it in the first place and required relatively
little data to convince myself I am not smoking crack.
A simple test with reiser4 suggests that it will pack the files right up next
to each other in the specified sequence. Take a look at the section "If It
Is In RAM, Dirty, and Contiguous, Then Squeeze It ALL Together Just Before
Writing" on www.namesys.com.
I will redo this on a fresh reiser4 filesystem to reduce initial fragmentation
and run some tests to eliminate reasonable confounds.
>
> I'm guessing a repacker would speed things up much more than this,
> although this is interesting and helpful for systems which need to be
> rebooted often. (I'd prefer a working suspend2...)
This idea complements a repacker. The core idea is to cause the fs to order
files based on recorded access patterns that will likely happen again
(instead of just the location in the directory tree). A repacker would
ensure files are placed in the exact order back to back as we expect them to
be accessed. File system fragmentation will substantially inhibit this from
happening. That said, this is causing a basic form of repacking.
Yeah a good suspend will probably result in faster boot no matter how many
tricks you use to improve normal boot, but this could also improve cold load
times for apps like Open Office and Firefox. In addition suspend drops the
disk cache. Profiling access patterns on resuming from suspend then
relocating based on that might improve restore times.
Any time file access is very predictable, and doesn't closely follow the
default key assignment order, this could improve performance.
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 22:09 ` Quinn Harris
@ 2006-09-14 22:23 ` David Masover
2006-09-15 5:15 ` Toby Thain
0 siblings, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-14 22:23 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list
Quinn Harris wrote:
> On Thursday 14 September 2006 13:55, David Masover wrote:
>> Quinn Harris wrote:
>>> The boot optimization was over 3885 files. Ideally those files would be
>>> ordered head to tail in a sequence that perfectly matches the order they
>>> will be read.
>>>
>>> I bring this up here because I expect with reiser4, a repacker, and this
>> Now that you mention it, do you have a control of some sort to prove
>> this isn't just fragmentation? That is, copy the files you're messing
>> with in some random order (that should make booting slower), and
>> benchmark that?
>
> That is a good point. Recording the disk layout before and after to compare
> relative fragmentation would be a good idea. As well as randomizing the
> sequence as a sanity check.
>
> Also note that during boot I was using readahead on all 3885 files. So the
> kernel has a good opportunity to rearrange the reads. And the read sequence
> doesn't necessarily match the order it's needed (though I tried to get that).
Speaking of which, did you parallelize the boot process at all? I'd
estimate my system easily spent more than 50% of its boot time not
touching the disk at all before I did that. Gentoo can do this, I'm not
sure what else, as it kind of needs your init system to understand
dependencies.
As far as faster load times for Firefox and OpenOffice, you may be on to
something here, but then, these apps probably match up pretty well with
the on-disk format, too.
This would probably be most useful for a case where you have to read a
lot of small files, with relatively low CPU usage, in a fairly
unpredictable order. I suspect it would be nice for Gentoo's Portage
tree, though I can't think what else.
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 22:23 ` David Masover
@ 2006-09-15 5:15 ` Toby Thain
2006-09-15 21:20 ` Quinn Harris
0 siblings, 1 reply; 14+ messages in thread
From: Toby Thain @ 2006-09-15 5:15 UTC (permalink / raw)
To: David Masover; +Cc: ReiserFS List
On 14-Sep-06, at 6:23 PM, David Masover wrote:
> Quinn Harris wrote:
>> On Thursday 14 September 2006 13:55, David Masover wrote:
>>> ...
>> That is a good point. Recording the disk layout before and after
>> to compare relative fragmentation would be a good idea. As well
>> as randomizing the sequence as a sanity check.
>> Also note that during boot I was using readahead on all 3885
>> files. So the kernel has a good opportunity to rearrange the
>> reads. And the read sequence doesn't necessarily match the order
>> it's needed (though I tried to get that).
>
> Speaking of which, did you parallelize the boot process at all?
Just off the top of my head, wouldn't that make the access sequence
asynchronous & thereby less predictable? (Although I'm sure it's a
net win.)
> I'd estimate my system easily spent more than 50% of its boot time
> not touching the disk at all before I did that. Gentoo can do
> this, I'm not sure what else, as it kind of needs your init system
> to understand dependencies.
...
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 5:15 ` Toby Thain
@ 2006-09-15 21:20 ` Quinn Harris
2006-09-15 22:27 ` David Masover
2006-09-18 9:36 ` PFC
0 siblings, 2 replies; 14+ messages in thread
From: Quinn Harris @ 2006-09-15 21:20 UTC (permalink / raw)
To: reiserfs-list
On Thursday 14 September 2006 23:15, Toby Thain wrote:
> On 14-Sep-06, at 6:23 PM, David Masover wrote:
> > Quinn Harris wrote:
> >> On Thursday 14 September 2006 13:55, David Masover wrote:
> >>> ...
> >>
> >> That is a good point. Recording the disk layout before and after
> >> to compare relative fragmentation would be a good idea. As well
> >> as randomizing the sequence as a sanity check.
> >> Also note that during boot I was using readahead on all 3885
> >> files. So the kernel has a good opportunity to rearrange the
> >> reads. And the read sequence doesn't necessarily match the order
> >> it's needed (though I tried to get that).
> >
> > Speaking of which, did you parallelize the boot process at all?
>
> Just off the top of my head, wouldn't that make the access sequence
> asynchronous & thereby less predictable? (Although I'm sure it's a
> net win.)
It could, but the kernel will try to reorder the outstanding block requests to
reduce seek. If that is an overall win I don't know. In addition early in
the boot, readahead-list or similar will tell the kernel to start reading
most of the files needed for the complete boot so they are already in memory
when needed. Ubuntu does the readahead now and all my tests were with
readahead.
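For readers unfamiliar with what such a readahead pass does, here is a sketch that hints the kernel to pull a list of files into the page cache. It uses `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, which is one way to do this on Linux; it is not claimed to be how readahead-list itself is implemented.

```python
import os

def readahead(paths):
    """Ask the kernel to start loading each file into the page cache.
    Returns how many files were successfully hinted."""
    hinted = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue    # file vanished or is unreadable; skip it
        try:
            # offset 0 with length 0 means "from the start to end of file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted += 1
        finally:
            os.close(fd)
    return hinted
```

The call returns immediately and the reads proceed in the background, which is what lets the kernel reorder the outstanding requests.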
>
> > I'd estimate my system easily spent more than 50% of its boot time
> > not touching the disk at all before I did that. Gentoo can do
> > this, I'm not sure what else, as it kind of needs your init system
> > to understand dependencies.
>
> ...
My first test turned out to be on a heavily fragmented file system. I
reinstalled Ubuntu Dapper with a fresh reiserfs file system and it booted in
1:07 (grub to desktop background appearing). After extending the time
readahead-watch monitors files and running the reallocate script it now boots
in 0:50.
I wrote a little python script that uses the FIBMAP ioctl to check the blocks
the files are using. From this I know the relocate script on this fresh file
system is doing exactly what it was intended to do. I am also able to
estimate how much it will improve performance by comparing the fragmentation
before and after its run. I have learned that the delays on disk io for
Ubuntu boot are dominated by rotational latency and not head seeks. The
current readahead implementation orders the files by on disk location,
substantially mitigating head seek time. But the latency can easily
double the time needed to load the same data.
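A sketch of the FIBMAP approach, not the script mentioned above. The ioctl numbers are the ones defined in `<linux/fs.h>`; FIBMAP requires root. The helper names are hypothetical, and treating a returned block number of 0 as "unmapped" is an assumption of this sketch.

```python
import fcntl
import os
import struct

FIBMAP = 1      # _IO(0x00, 1): map a logical block to a physical block
FIGETBSZ = 2    # _IO(0x00, 2): get the file system block size

def physical_blocks(path):
    """Return the physical block number of every logical block (needs root)."""
    with open(path, "rb") as f:
        fd = f.fileno()
        raw = fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0))
        block_size = struct.unpack("i", raw)[0]
        n_blocks = (os.fstat(fd).st_size + block_size - 1) // block_size
        return [
            struct.unpack("i", fcntl.ioctl(fd, FIBMAP, struct.pack("i", i)))[0]
            for i in range(n_blocks)
        ]

def count_fragments(blocks):
    """Count runs of physically consecutive blocks; 0 is treated as unmapped."""
    fragments = 0
    previous = None
    for block in blocks:
        if block == 0:
            previous = None
            continue
        if previous is None or block != previous + 1:
            fragments += 1          # a new run starts here
        previous = block
    return fragments
```

Concatenating the block lists of all files in access order and counting fragments over the whole sequence gives a single number to compare before and after relocation.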
Subjectively (and objectively by about 6s) relocation and extending
readahead-watch substantially improved Gnome boot and initial responsiveness.
But, I need to measure how much of this was caused by just extending how much
is read ahead vs. the reallocation.
The current Ubuntu boot waits for hardware probing, DHCP and other things
giving the disk readahead a chance to work. I think this reallocation might
help a parallel boot more as the data will be needed sooner. So I changed my
mind, I think parallel boot will highlight the reallocate advantage. Now I
just need to test the hypothesis.
Not sure if I would be better off trying initng or waiting for upstart (Ubuntu's
new init) to get scripts that actually parallel boot. The code for upstart
is very clean and it has the backing of a major distro, so I have high hopes.
Much like before, I was able to improve a 16.5s oowriter cold start to 14s
with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using
2.0.3 before). It is evident to me that the readahead-watch is missing
something on Open Office startup. It seems very possible to get OO to cold
start in under 8s with the use of reallocation and readahead right when it
starts.
My current scripts are at
http://www.quinnh.org/reallocate.py (27 line reallocate script, expects
dir /tmp/refrag to exist and takes the readahead-watch log as a parameter)
http://www.quinnh.org/measure.py (uses FIBMAP to estimate the time needed to
load the files in the passed readahead-watch log, uses average seek and
latency for estimate)
http://www.quinnh.org/readahead-watch-time-order.patch (Patch against Ubuntu
readahead-watch to add an order by access time option.)
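To give an idea of the kind of estimate involved (this is a guess at the general shape, not the contents of measure.py), a simple model charges one average access per fragment plus the sequential transfer time. The defaults reuse the 12ms access time and 25MB/s figure quoted earlier in the thread; the function name is hypothetical.

```python
def estimate_load_seconds(total_bytes, fragments,
                          access_s=0.012, transfer_bytes_per_s=25e6):
    """Rough load time: one seek-plus-latency penalty per fragment,
    plus the time to stream the data at the sustained transfer rate."""
    return fragments * access_s + total_bytes / transfer_bytes_per_s

# 310MB in a single run versus the same data split into 1000 fragments
print("%.1f s" % estimate_load_seconds(310e6, 1))
print("%.1f s" % estimate_load_seconds(310e6, 1000))
```

Under this model a thousand fragments roughly double the load time for 310MB, which is in line with the observation that latency can easily double the time needed.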
I will try to write a nice unified script that will profile, reallocate and do
readahead for an application to speed it up. e.g. "# reallocate.py
oowriter". Run it once to profile and reallocate. Drop caches, run it again
and oowriter loads faster.
I think Python will be the best language for this because it's become
relatively universal and it's easy to understand for the uninitiated. This
really isn't black magic so transparency is good. I personally prefer Ruby
though.
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 21:20 ` Quinn Harris
@ 2006-09-15 22:27 ` David Masover
2006-09-16 0:01 ` Quinn Harris
2006-09-18 9:36 ` PFC
1 sibling, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-15 22:27 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list
Quinn Harris wrote:
> On Thursday 14 September 2006 23:15, Toby Thain wrote:
>> On 14-Sep-06, at 6:23 PM, David Masover wrote:
>>> Quinn Harris wrote:
>>>> On Thursday 14 September 2006 13:55, David Masover wrote:
>>>>> ...
>>>> That is a good point. Recording the disk layout before and after
>>>> to compare relative fragmentation would be a good idea. As well
>>>> as randomizing the sequence as a sanity check.
>>>> Also note that during boot I was using readahead on all 3885
>>>> files. So the kernel has a good opportunity to rearrange the
>>>> reads. And the read sequence doesn't necessarily match the order
>>>> it's needed (though I tried to get that).
>>> Speaking of which, did you parallelize the boot process at all?
>> Just off the top of my head, wouldn't that make the access sequence
>> asynchronous & thereby less predictable? (Although I'm sure it's a
>> net win.)
> It could, but the kernel will try to reorder the outstanding block requests to
> reduce seek. If that is an overall win I don't know. In addition early in
> the boot, readahead-list or similar will tell the kernel to start reading
> most of the files needed for the complete boot so they are already in memory
> when needed. Ubuntu does the readahead now and all my tests were with
> readahead.
That's interesting. I think either parallelizing or a very aggressive
readahead will perform similarly, except in cases where you have a
script blocking on something other than disk or CPU, like, say, network.
>>> I'd estimate my system easily spent more than 50% of its boot time
>>> not touching the disk at all before I did that. Gentoo can do
>>> this, I'm not sure what else, as it kind of needs your init system
>>> to understand dependencies.
>> ...
>
> The current Ubuntu boot waits for hardware probing, DHCP and other things
> giving the disk readahead a chance to work. I think this reallocation might
> help a parallel boot more as the data will be needed sooner. So I changed my
> mind, I think parallel boot will highlight the reallocate advantage. Now I
> just need to test the hypothesis.
Hmm. That's possible. But again, even with the parallel boot, there
was still a bit of time spent not touching the disk, so I wouldn't
expect much more of a speedup than what you already have. Which also
means, by the way, that I wouldn't use it much -- my system takes more
like 20 seconds from Grub to a login prompt, and from then on, the only
things that take more than 5 seconds to load are games. Since I know
Quake 4 uses zipfiles (probably compressed) for its storage, and I
watched the HD LED while it loads, I don't think I can speed that up at
all short of buying a faster CPU.
Well, that and the Portage tree, but you say I shouldn't expect much
from that. Maybe the portage cache?
> Not sure if I would be better off trying initng or waiting for upstart (Ubuntu's
> new init) to get scripts that actually parallel boot. The code for upstart
> is very clean and it has the backing of a major distro, so I have high hopes.
Hmm. That sounds kind of cool, but I wonder how it compares to Gentoo's
init scripts? I guess I'll have to wait till it hits the one Ubuntu box
I have...
> Much like before, I was able to improve a 16.5s oowriter cold start to 14s
> with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using
> 2.0.3 before).
Wait -- cold start is 14s, but it's also 4.8s? Did you mean warm/hot
start for that last number?
> I think Python will be the best language for this because its become
> relatively universal and its easy to understand for the uninitiated. This
> really isn't black magic so transparency is good. I personally prefer Ruby
> though.
Wait... Python is more universal than Ruby of Ruby on Rails?
Python is faster, anyway... I'm waiting for someone to do a decent
implementation of Ruby on something like .NET before I start using it
for anything I want to perform well.
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 22:27 ` David Masover
@ 2006-09-16 0:01 ` Quinn Harris
2006-09-16 8:59 ` David Masover
0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-16 0:01 UTC (permalink / raw)
To: reiserfs-list
On Friday 15 September 2006 16:27, David Masover wrote:
> > Not sure if I would be better off trying initng or waiting for upstart
> > (Ubuntu's new init) to get scripts that actually parallel boot. The code
> > for upstart is very clean and it has the backing of a major distro, so I
> > have high hopes.
>
> Hmm. That sounds kind of cool, but I wonder how it compares to Gentoo's
> init scripts? I guess I'll have to wait till it hits the one Ubuntu box
> I have...
Gentoo's default init doesn't parallelize well. Not when compared to initng,
which is relatively easy to get to work on Gentoo. The Ubuntu people decided
initng wasn't powerful enough (let alone the existing sysvinit). They
thought it needed a better way to define the bootup sequence during boot, and
they wanted to integrate running any task, like ACPI events, hotplug, and
cron, into one consistent tool.
http://www.netsplit.com/blog/work/canonical/upstart.html
>
> > Much like before, I was able to improve a 16.5s oowriter cold start to
> > 14s with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was
> > using 2.0.3 before).
>
> Wait -- cold start is 14s, but it's also 4.8s? Did you mean warm/hot
> start for that last number?
Oops, it's 4.8s warm. It was initially 16.5s cold, then 14s cold after
reallocating.
>
> > I think Python will be the best language for this because it's become
> > relatively universal and it's easy to understand for the uninitiated.
> > This really isn't black magic so transparency is good. I personally
> > prefer Ruby though.
>
> Wait... Python is more universal than Ruby of Ruby on Rails?
Both Gentoo and Ubuntu install Python by default but not Ruby. And more
people, at least in the US, are familiar with Python. Finally, I might use
Python inotify code (to replace readahead-watch); the Ruby version is a
bit alpha and I don't think it is available in Gentoo or Ubuntu packages.
I came to Ruby through RoR. I think the language has an unmatched pragmatic
elegance. This isn't appreciated until one addresses a few problem domains
with it. I don't know of anything Python does reasonably well that Ruby
can't do reasonably well (apart from the performance problem). On the other
hand, I doubt Python could make for something as slick as Rake
http://www.martinfowler.com/articles/rake.html. And Ruby provides a wealth
of conveniences and shortcuts without being the lexical mess that is Perl. I
could be missing something though. But for this particular problem Python
isn't bad.
>
> Python is faster, anyway... I'm waiting for someone to do a decent
> implementation of Ruby on something like .NET before I start using it
> for anything I want to perform well.
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-16 0:01 ` Quinn Harris
@ 2006-09-16 8:59 ` David Masover
0 siblings, 0 replies; 14+ messages in thread
From: David Masover @ 2006-09-16 8:59 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list
Quinn Harris wrote:
> On Friday 15 September 2006 16:27, David Masover wrote:
>
>>> Not sure if I would be better off trying initng or waiting for upstart
>>> (Ubuntu's new init) to get scripts that actually parallel boot. The code
>>> for upstart is very clean and it has the backing of a major distro, so I
>>> have high hopes.
>> Hmm. That sounds kind of cool, but I wonder how it compares to Gentoo's
>> init scripts? I guess I'll have to wait till it hits the one Ubuntu box
>> I have...
> Gentoo default init doesn't parallelize well. Not when compared to initng which
> is relatively easy to get to work on Gentoo.
I'm not sure what initng is, but the way I parallelize Gentoo is by setting
a flag in /etc/conf.d/rc:
RC_PARALLEL_STARTUP="yes"
I still don't see a difference between initng and Gentoo's init. I
guess I'd have to install them both. One thing I like about Gentoo's
init is that they are still just shell scripts, and it would take a
minimal amount of code to convert them to/from the old init style.
> The Ubuntu people decided
> initng wasn't powerful enough (let alone the existing sysvinit). They
> thought it needed a better way to define the bootup sequence during boot. In
> addition to integrate running any task like ACPI events, hotplug, CRON into
> one consistent tool.
> http://www.netsplit.com/blog/work/canonical/upstart.html
Aha, thanks.
Getting offtopic here, but I don't see the comparison I'm looking for.
I see why it's different than launchd -- mostly, that launchd provides
no way of knowing whether we want to wait for a script to run or an app
to start. But I don't know of a way to know when an app has finished
starting, unless it daemonizes itself -- which makes it easy to write a
script that ends when the app has started.
I really don't see the usefulness of making that distinction as far as
dependencies go.
Finally read up on the "event-based system", and I suspect this kind of
thing could probably be an extension to a dependency-based system. I
guess we'll see if initng pulls that off.
>> Wait... Python is more universal than Ruby of Ruby on Rails?
> Both Gentoo and Ubuntu install Python by default but not Ruby. And more
> people at least in the US are familiar with Python. Finally I might use
> Python inotify code (to replace readahead-watch) and the Ruby version is a
> bit alpha and I don't think availible in Gentoo or Ubuntu packages.
Speaking of which, the Perl inotify is broken for me a bit lately, I
need to figure out what's going on. Unfortunately, I don't know if it's
perl or the kernel that's broken...
> I don't know of anything Python does reasonably well that Ruby
> can't do reasonably well (- the performance problem).
This might solve the performance problem:
>> I'm waiting for someone to do a decent
>> implementation of Ruby on something like .NET before I start using it
>> for anything I want to perform well.
If they can do it right, well, it seems like MS wants to replace C++
with C#, thus .NET should perform decently. Mono means it's
cross-platform, or at least, it runs JIT'ed on the platforms I care about.
And hey, once you're on Gentoo or Ubuntu, it doesn't matter much,
really. Install a Ruby app and Ruby becomes a dependency.
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 21:20 ` Quinn Harris
2006-09-15 22:27 ` David Masover
@ 2006-09-18 9:36 ` PFC
2006-09-18 22:32 ` Quinn Harris
1 sibling, 1 reply; 14+ messages in thread
From: PFC @ 2006-09-18 9:36 UTC (permalink / raw)
To: reiserfs-list
Windows already does this.
It has a service which monitors filesystem usage, and writes data to disk
; the defragmenter uses this data to lay the files on disk so that boot is
very fast.
However, I think it optimizes only the time to the login screen ; so windows
boot is extremely fast ; of course, once you have logged in, you have to
wait forever until all the crap system tray apps launch themselves and eat
all your RAM...
My own repacker is very simple, and it handles any filesystem !
- boot from Kanotix CD
- tar cv /mnt/my_disk | lzop -c | ssh -c blowfish other_machine "cat >backup.tar.lzo"
- umount, mkfs, mount
- ssh -c blowfish other_machine "cat backup.tar.lzo" | lzop -cd | tar xv
You can also use a USB disk, or other disks in the machine. The effects
are pretty visible.
I do the first part often to make a full disk backup to a USB hard drive.
Anyway, IMHO the best way to have a super responsive system would be :
Have a daemon which monitors which files, or parts of files, are read, and
in what context :
- boot to fully loaded KDE/Gnome
- launching an application
You already have this apparently...
The repacker would then use this information to lay files on disk (just
like windows does).
Then, the daemon would trigger readahead, when booting or detecting the
launch of an application, and read everything in (which would be a nice
sequential read)...
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-18 9:36 ` PFC
@ 2006-09-18 22:32 ` Quinn Harris
0 siblings, 0 replies; 14+ messages in thread
From: Quinn Harris @ 2006-09-18 22:32 UTC (permalink / raw)
To: reiserfs-list
On Monday 18 September 2006 03:36, PFC wrote:
> Windows already does this.
I am familiar with this; it was in part the inspiration for this idea.
>
> It has a service which monitors filesystem usage, and writes data to disk
> ; the defragmenter uses this data to lay the files on disk so that boot is
> very fast.
Yep, my tests suggest that for Ubuntu boot typical improvements will be on the
order of 5-15s (substantially dependent on how fragmented the file system is
and on the complexity of boot). The system needs about 70MB of data to boot.
Typical fragmentation on a fresh filesystem will cause a 2-3x slowdown over
perfect packing. 70MB at 20MB/s is 3.5s, so our gain would be roughly 3.5-7s.
The improvement wasn't as good as I was originally expecting but still very
measurable.
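As a rough check of that estimate, here is a minimal Python sketch of the arithmetic, using only the figures quoted above (70MB of boot data, 20MB/s, a 2-3x fragmentation slowdown). The function name is made up for illustration, and the model ignores everything except raw read time. By this arithmetic the ideal sequential read comes to 3.5s.

```python
def boot_read_gain(data_mb, throughput_mb_s, slowdown):
    """Seconds saved if a read that takes `slowdown` times the ideal
    sequential time is brought down to the ideal time."""
    ideal_s = data_mb / throughput_mb_s
    return ideal_s * (slowdown - 1)


if __name__ == "__main__":
    # Figures from the message: 70MB at 20MB/s, 2x and 3x slowdown.
    for slowdown in (2, 3):
        print(slowdown, boot_read_gain(70, 20, slowdown))
```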
I am not sure this is enough of a win to justify integrating this idea in a
major distro. But without a
>
> However think it optimizes only the time to the login screen ; so windows
> boot is extremely fast ; of course, once you have logged in, you have to
> wait forever until all the crap system tray apps launch themselves and eat
> all your RAM...
In practice what I am doing improves system responsiveness. This is because I
am measuring all file accesses until the system reaches an idle state, then
reallocating and preloading all that data. Gnome still has a bit of data to
page in after the menus are available (like menu icons).
>
> My own repacker is very simple, and it handles any filesystem !
>
> - boot from Kanotix CD
> - tar cv /mnt/my_disk | lzop -c | ssh -c blowfish other_machine "cat >backup.tar.lzo"
>
> - umount, mkfs, mount
> - ssh -c blowfish other_machine "cat backup.tar.lzo" | lzop -cd | tar xv
>
> You can also use an USB disk, or other disks in the machine. The effects
> are pretty visible.
> I do the first part often to make a full disk backup to a USB harddrive.
>
> Anyway, IMHO the best way to have a super responsive system would be :
>
Are you essentially copying all the files off a file system, recreating a
fresh fs and writing them back? This allows the file system to reconstruct
the files in a good defragmented order. The problem is that it takes a long
time and takes the system down (or at least the fs). The Reiser4 repacker
would have the same effect but without any downtime. Windows defrag does the
same.
This doesn't place files used at boot right next to each other on disk like
what I am doing, but both our approaches reduce fragmentation.
My script can do its thing in about 10s (for just one thing like better
bootup) and without disrupting the running system. (Actually I have seen it
break some things, but I expect this can be mitigated by avoiding repacking
open files.)
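One way the "avoid repacking open files" idea might be approached on Linux is to skip any candidate that shows up as an open descriptor under /proc. This is only a sketch under assumptions: the function names are hypothetical, it sees only processes the caller has permission to inspect, and it does not catch files that are memory-mapped without an open descriptor (shared libraries, for example), which would need /proc/<pid>/maps as well.

```python
import os


def open_file_paths():
    """Return the set of paths that inspectable processes hold open,
    gathered from the /proc/<pid>/fd symlinks (Linux only)."""
    paths = set()
    try:
        pids = [name for name in os.listdir("/proc") if name.isdigit()]
    except OSError:
        return paths  # no /proc here, nothing we can tell
    for pid in pids:
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or belongs to another user
        for fd in fds:
            try:
                paths.add(os.readlink(os.path.join(fd_dir, fd)))
            except OSError:
                continue  # descriptor closed while we were looking
    return paths


def safe_to_relocate(candidates):
    """Filter out candidates that are currently open somewhere."""
    busy = open_file_paths()
    return [p for p in candidates if os.path.realpath(p) not in busy]
```

The check is inherently racy (a file can be opened right after the scan), so it reduces the risk rather than removing it.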
> Have a daemon which monitors which files, or parts of files, are read, and
> in what context :
>
> - boot to fully loaded KDE/Gnome
> - launching an application
>
> You already have this apparently...
>
> The repacker would then use this information to lay files on disk (just
> like windows does).
The Python script I am using to measure fragmentation shows that what I am
doing results in packing that is within 1% of ideal in terms of disk read
time. This only works well on a fresh filesystem with large contiguous
unallocated space, and not so well on a heavily fragmented fs.
This suggests that for reiser (maybe other fs), a special kernel or defrag
tool for this type of reallocation is unnecessary.
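The measuring script itself is not shown in this thread, so the following is not that script, only a sketch of one simple model such a measurement could use: charge one seek for every break in block contiguity, plus transfer time for the data. The defaults (12ms access time, 25MB/s) are the drive figures quoted at the start of the thread; the 4096-byte block size and the extent list format are assumptions.

```python
def estimated_read_time(extents, block_size=4096,
                        seek_s=0.012, throughput_bps=25e6):
    """Estimate seconds needed to read `extents` in the given order.

    `extents` is a list of (start_block, length_in_blocks) tuples.
    Each extent that does not begin where the previous one ended
    costs one seek; all data is then charged at the transfer rate.
    """
    seeks = 0
    total_blocks = 0
    next_block = None
    for start, length in extents:
        if start != next_block:
            seeks += 1
        next_block = start + length
        total_blocks += length
    return seeks * seek_s + total_blocks * block_size / throughput_bps


def packing_overhead(extents, **kwargs):
    """Ratio of the estimated time to the time for the same data
    laid out as a single contiguous extent (1.0 means ideal)."""
    total = sum(length for _, length in extents)
    ideal = estimated_read_time([(0, total)], **kwargs)
    return estimated_read_time(extents, **kwargs) / ideal
```

Under this model "within 1% of ideal" would correspond to `packing_overhead(...)` returning at most 1.01 for the boot file list.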
>
> Then, the daemon would trigger readahead, when booting or detecting the
> launch of an application, and read everything in (which would be a nice
> sequential read)...
I hope to find time soon to write a Python script that allows the user to do
something like "reallocate.py oowriter". It will profile Open Office writer
on startup, using inotify to generate a readahead log, then reallocate with
the copy trick. If the command is run again, it will readahead the files for
oowriter and launch the app. I think this could halve cold start times,
partly from the readahead alone and partly from the reallocation.
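For reference, the copy trick from the first message in this thread (copy the files in access order into one directory under sequential names, then put the copies back in place of the originals) could be sketched in Python roughly as below. This is an illustration, not the Ruby script used for the tests: `relocate` and `staging_dir` are made-up names, it uses rename where the original used hard links, it assumes the staging directory is on the same filesystem as the files, and it does not preserve ownership or handle symlinks.

```python
import os
import shutil


def relocate(paths, staging_dir):
    """Rewrite each file in `paths` (given in access order) so that the
    new copies are created one after another in `staging_dir`."""
    os.makedirs(staging_dir, exist_ok=True)
    for index, path in enumerate(paths, start=1):
        staged = os.path.join(staging_dir, "%04d" % index)
        # Copy data plus mode and timestamps into the sequential name.
        shutil.copy2(path, staged)
        # Rename the copy over the original (atomic on one filesystem).
        os.replace(staged, path)
```

Usage would be something like `relocate(list_from_readahead_log, "/.relocate")`, where both the list and the directory are placeholders.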
It is evident that the inotify file monitoring (using Ubuntu readahead-watch)
is not catching everything. If I readahead for oowriter then launch it, it
takes 6-7s. But it takes 3.6s to start immediately after closing it. In
principle, I would think the readahead should be able to load all files
oowriter needs into cache, getting the 3.6s time after readahead.
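To make the readahead step concrete, a minimal Python sketch of preloading a logged file list is shown here. It is an assumption about how one could do it, not how the Ubuntu readahead tool works internally: it asks the kernel to pull each file into the page cache with `os.posix_fadvise` and `POSIX_FADV_WILLNEED` (available in Python 3.3+ on Linux). It can only preload what the log lists, so any file the monitoring missed would still be read from disk at launch.

```python
import os


def readahead(paths):
    """Hint the kernel to load each file in `paths` into the page cache.

    Returns the number of files successfully hinted; files that cannot
    be opened are skipped.
    """
    hinted = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # missing or unreadable, skip it
        try:
            # offset 0 with length 0 means "to the end of the file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted += 1
        finally:
            os.close(fd)
    return hinted
```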
Also this doesn't monitor access patterns within a file. How much of the data
in a file is really needed? I am not sure I would be able to pack only parts
of files, but I might with sparse files.
Thread overview: 14+ messages
2006-09-13 20:51 Relocating files for faster boot/start-up on reiser(fs/4) Quinn Harris
2006-09-13 21:10 ` Peter
2006-09-14 3:10 ` Quinn Harris
2006-09-14 19:55 ` David Masover
2006-09-14 22:09 ` Quinn Harris
2006-09-14 22:23 ` David Masover
2006-09-15 5:15 ` Toby Thain
2006-09-15 21:20 ` Quinn Harris
2006-09-15 22:27 ` David Masover
2006-09-16 0:01 ` Quinn Harris
2006-09-16 8:59 ` David Masover
2006-09-18 9:36 ` PFC
2006-09-18 22:32 ` Quinn Harris
2006-09-14 14:01 ` cmaurand