* Relocating files for faster boot/start-up on reiser(fs/4)
@ 2006-09-13 20:51 Quinn Harris
2006-09-13 21:10 ` Peter
0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-13 20:51 UTC (permalink / raw)
To: reiserfs-list
I have been playing around with relocating file data to improve boot time and
app start-up time (like OpenOffice) on reiser(fs/4). This is done by
monitoring the files accessed during boot/start-up then copying these files
into a single directory with sequential names 0001 0002 ... matching the
access order. Finally the new files are hard linked (rename should work too)
to the same location as the original files.
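The copy-then-rename procedure described above can be sketched roughly as follows. This is a minimal illustration, not the Ruby script used for these tests: the function name `relocate` and the `staging_dir` parameter are hypothetical, it uses the rename variant rather than hard links, and it assumes the staging directory lives on the same file system as the originals.

```python
import os
import shutil


def relocate(paths, staging_dir):
    """Rewrite files so they are created in the order given by `paths`.

    Pass 1 copies every file into `staging_dir` as 0001, 0002, ...
    Pass 2 renames each copy over the path it was copied from.
    """
    os.makedirs(staging_dir, exist_ok=True)

    # Keep the first occurrence of each path; skip anything that is not a
    # plain file (symlinks, directories, files that have since vanished).
    ordered = []
    seen = set()
    for path in paths:
        if path in seen or os.path.islink(path) or not os.path.isfile(path):
            continue
        seen.add(path)
        ordered.append(path)

    staged = []
    for index, original in enumerate(ordered, start=1):
        copy = os.path.join(staging_dir, f"{index:04d}")
        shutil.copy2(original, copy)  # data, mode and timestamps; not ownership
        staged.append((copy, original))

    for copy, original in staged:
        os.replace(copy, original)  # rename; both paths must be on one file system

    return ordered
```

Copying everything first and renaming afterwards keeps the new files next to each other in creation order. Ownership is not preserved by `shutil.copy2`, so a real tool would have to restore it, and replacing libraries that are in use can upset running programs.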
As I understand it both reiserfs and reiser4 assign keys to items based on the
file name and the parent directory. The file system then attempts to match
block order with key order. This allows the above trick to work for placing
files in a specific order next to each other on disk.
I am using readahead-watch on Ubuntu. This little tool uses inotify to
monitor all file accesses while it runs. The accessed files are written to a
text file by disk order. I have modified this tool to also write them by
access time. I then use a script (ruby) to do the above copy and link using
the output from readahead-watch.
I have done some tests on my Athlon 2200 laptop running reiserfs. The hard
drive is a 40GB Hitachi Travelstar 80GB with a max real transfer rate of
25MB/s and an access time of 12ms.
The reiserfs partition size is 36G with 8.9G used.
I used readahead-watch to create a readahead log during boot on Ubuntu Edgy
much like the default configuration with the "profile" boot option except set
to record by access time and I manually killed it after the system fully
booted. With this log used for readahead the system booted in 2:15 from
grub load to usable desktop (auto login) as measured manually by a stop
watch. After running the relocate script the boot time with the same
readahead log was 1:38. I then reran the readahead-watch during boot set to
sort by disk order, resulting in a boot time of 1:15. I booted twice for
each test to make sure the results were within a few seconds.
I also used bootchart, but this didn't measure Gnome start-up and requires a
bit of ambition to analyze thoroughly. But it was evident that running the
relocate script did increase peak disk throughput from 6MB/s to 13MB/s and
increased the average throughput rate. But most of boot time is still spent
waiting on the disk. My relocate script relocated 310MB of files. If those
were perfectly contiguous on disk, this drive should be able to load that in
under 20s. Though I expect only a fraction of that is actually accessed
during boot.
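As a quick check of the figures above (a sketch using only the numbers quoted in this post; the variable and function names are made up for illustration):

```python
relocated_mb = 310        # size of the relocated files, from the post
max_transfer_mb_s = 25    # quoted maximum transfer rate of the drive

# Best case: every relocated byte is read sequentially at full speed.
best_case_s = relocated_mb / max_transfer_mb_s
print(f"best case sequential read: {best_case_s:.1f} s")  # 12.4 s


def seconds(minutes, secs):
    """Convert a stopwatch reading m:ss into seconds."""
    return minutes * 60 + secs


baseline = seconds(2, 15)        # boot time before relocation
after_relocate = seconds(1, 38)  # boot time after relocation
print(f"speedup after relocation: {baseline / after_relocate:.2f}x")
```

So the "under 20s" estimate leaves some margin over the ideal 12.4s, which ignores seeks and rotational latency entirely.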
Using 'filefrag' it is evident that the relocate script's attempt to relocate
the files contiguously was a bit half assed, but from the boot times it was
clearly an improvement.
I also used readahead-watch to monitor the accessed files of openoffice writer
on startup. The initial cold start time was 17s (about 0.5s variation from
load to load). A warm start (start right after it's closed) was 3.6s. The
results from readahead-watch were filtered through a script to remove all
files that were open when openoffice wasn't running (using fuser). Running
the relocate script on some of the X and gnome libraries broke my system
nicely until a reboot. After running the relocate script the cold start time
became 14s. When readahead-list was run on the same relocated files before
starting openoffice the load time was 6.5s. Between runs,
sudo sh -c "echo 1 > /proc/sys/vm/drop_caches"
was used to ensure the data was read from disk.
Of course, these results are highly dependent on how fragmented the files
were before and how effectively the relocation worked. I expect others
could reproduce speedups but how much will vary. I did these tests on my
laptop with a slow hard drive so the results would be more evident.
I also did some tests with fresh reiserfs, reiser4, and ext3 on a 100MB
loopback to see how well the file system would take the hint to order data
sequentially. Creating 10 5MB files with sequential names on reiser4
resulted in one fragment (measured by 'filefrag') for the whole bunch
probably a disk allocation bitmap, nearly perfect. reiserfs generally would
end up with 3-4 fragments for the same test. And ext3 didn't appear to make
any real attempt to order the files sequentially on disk.
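The loopback test above could be scripted along these lines. This is a hedged sketch rather than the procedure actually used: the helper names are hypothetical, the target directory is assumed to be on an already mounted test file system, and the parser only assumes that `filefrag` prints a line containing "N extent(s) found".

```python
import os
import re
import subprocess


def write_sequential_files(directory, count=10, size_mb=5):
    """Create `count` files named 0001, 0002, ... of `size_mb` MB each."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for index in range(1, count + 1):
        path = os.path.join(directory, f"{index:04d}")
        with open(path, "wb") as handle:
            for _ in range(size_mb):
                handle.write(os.urandom(1024 * 1024))
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before measuring
        paths.append(path)
    return paths


def parse_extent_count(filefrag_output):
    """Extract N from a filefrag line such as 'name: N extents found'."""
    match = re.search(r"(\d+) extents? found", filefrag_output)
    if match is None:
        raise ValueError(f"unrecognised filefrag output: {filefrag_output!r}")
    return int(match.group(1))


def extent_counts(paths):
    """Run filefrag on each path (may need root) and return {path: extents}."""
    counts = {}
    for path in paths:
        result = subprocess.run(["filefrag", path], capture_output=True,
                                text=True, check=True)
        counts[path] = parse_extent_count(result.stdout)
    return counts
```

Per-file extent counts only show fragmentation within each file. Whether the files sit next to each other in name order needs a look at the actual block numbers.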
I have a 29GB reiser4 partition with 21GB used I have been running for a few
years now (sometime before release). When I ran the same 10 5MB file test on
it, the total resulted in 1000+ fragments (didn't bother to count, but it was
a lot). But the files were allocated head to tail. It's a bit scary to
think the file system can't find a few MB unallocated region on disk.
Clearly a repacker would be really nice.
Relocating file data to match pre-measured access patterns can clearly make a
big performance difference. Reiser(fs/4) provides an easy mechanism to hint
at disk order which can be used to measurably improve boot/startup times.
But, I expect more can be done to achieve better results. This includes
better measurements of read patterns and better allocation of the data.
I hope to rerun these tests with Reiser4 (maybe 4.1) on the same hardware. I
expect with a fresh (not fragmented) Reiser4 partition, the improvements will
be more pronounced. But a repacker should allow more reproducible results
and nearly perfect data placement for boot and app start-up.
I hope with reiser4, relocation, and the new upstart (Ubuntu's sysvinit
replacement) with good scripts, I will get this system to boot to usable in
30 seconds. And slowoffice (aka openoffice) to load in 6s cold. Am I
overoptimistic?
What about a mechanism to explicitly set or hint at item keys? Maybe someday,
linux packages could include preferred file order information that a file
system like reiser4 could use to order the files on disk resulting in fast
load times without the need for the user to profile the app.
I think there is a lot to be said for measuring access patterns and using that
to set keys in addition to deducing it from semantics using a fibration
plug-in.
Thoughts?
--
Quinn Harris
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-13 20:51 Relocating files for faster boot/start-up on reiser(fs/4) Quinn Harris
@ 2006-09-13 21:10 ` Peter
2006-09-14 3:10 ` Quinn Harris
2006-09-14 14:01 ` cmaurand
0 siblings, 2 replies; 14+ messages in thread
From: Peter @ 2006-09-13 21:10 UTC (permalink / raw)
To: reiserfs-list-nJ1KrdHEGnBBDgjK7y7TUQ

On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
> Thoughts?

Yes. Why on earth would you do this? Copying the files and renaming and
hardlinking them is nothing a sysadmin would ever do. Just by copying you
are allowing reiser to optimize the dir. You're trying to duplicate what a
tree-based design does automatically. Moreover, remember that reiser packs
files into clusters so that you may read more than just your one file from
time to time which could end up adding time to your test.

If reiser needs speedup it certainly won't be done by renaming files!

JM$0.02
--
Peter
+++++
Do not reply to this email, it is a spam trap and not monitored.
I can be reached via this list, or via jabber: pete4abw at jabber.org
ICQ: 73676357
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-13 21:10 ` Peter
@ 2006-09-14 3:10 ` Quinn Harris
2006-09-14 19:55 ` David Masover
2006-09-14 14:01 ` cmaurand
1 sibling, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-14 3:10 UTC (permalink / raw)
To: reiserfs-list

Peter, I think you misunderstood what I was doing and why. Let me try to
clarify. My test is far from perfect. It's merely an exercise to verify the
basic idea.

> Just by copying you are allowing reiser to optimize the dir.

Exactly, but I am copying in a way that implicitly suggests what order those
files will be accessed in. I was attempting to reorder the data on disk to
minimize disk seeks with knowledge of the order that data will be accessed.
This was done by taking advantage of the way reiser assigns keys to files
based on their name and its affinity to match key order with block order.

> You're trying to duplicate what a tree-based design does automatically.

This works because of the tree-based design of reiser. Reiser must assign
each file (item actually) some key, so why not take advantage of knowledge of
the order those items will be accessed in? The current key assignment
algorithm is a best guess at that given the limited information it has
(file/directory name). Remember key assignment roughly translates to on disk
position.

The relocate script can leave the file system in the exact same state from a
semantic standpoint (what files and directories are there) but relocate the
data on disk. Copying those files to a single directory with numeric names
was a kludge to implicitly tell the file system to place those files in a
specific order and near each other on disk. The rename step is to switch the
old unoptimized file position with the new more optimized position.

> Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.

The boot optimization was over 3885 files. Ideally those files would be
ordered head to tail in a sequence that perfectly matches the order they will
be read. As a result multiple items in a node will all need to be read at
nearly the same time. That didn't happen in my test, but it was much closer
to that after I ran the relocate script than before. Hence the performance
improvement. With this script, reiser4 and a repacker I have reason to
believe the ordering will be nearly perfect. Of course, that is excluding
random access patterns inside the same file and the directory data needed to
get at the files.

This basic technique can be made into a boot script much like the readahead
script already in Ubuntu, just improved. Boot once with a profile option, it
measures read patterns (already does this), then reorders data on disk with
this trick, or maybe something better. Then the next time you boot it's
1.5-2x faster.

Better yet, include this profile information in the distro packages. When a
package is installed this info is used to help assign item keys, resulting in
a better disk layout and faster boot times and no weird file copy rename
mumbo jumbo.

I bring this up here because I expect with reiser4, a repacker, and this
trick, reiser4 could deliver at least 50% better reproducible real world boot
and app load performance than any other file system. At least until other
file systems implement something similar, like what MS did with XP.

Can something similar be done (or has been) on ext(2/3/4), XFS, JFS or other
linux file systems?

Windows XP boots much faster than Windows 2000 in part because it does what I
am talking about. File access is recorded at boot, then the disk is
defragmented with this knowledge. Check out
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx
under "Prefetch". Also look at http://kerneltrap.org/node/2157

MS's implementation required implementing a defrag utility with a specific
feature that could position disk data based on access logs. Reiser4 can do
the same thing as part of its basic functionality with the addition of a much
much simpler tool to help assign keys based on that access log. Then a
repacker (when it devaporizes) can further optimize for that access pattern
without any code specific to that purpose. Seems like good orthogonal design
to me.

Hope that clarifies. Like my previous post, whatever it did, it did it in way
too many words.

On Wednesday 13 September 2006 15:10, Peter wrote:
> On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
> > Thoughts?
>
> Yes. Why on earth would you do this? Copying the files and renaming and
> hardlinking them is nothing a sysadmin would ever do. Just by copying you
> are allowing reiser to optimize the dir. You're trying to duplicate what a
> tree-based design does automatically. Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
>
> If reiser needs speedup it certainly won't be done by renaming files!
>
> JM$0.02

--
Quinn Harris
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 3:10 ` Quinn Harris
@ 2006-09-14 19:55 ` David Masover
2006-09-14 22:09 ` Quinn Harris
0 siblings, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-14 19:55 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:
> The boot optimization was over 3885 files. Ideally those files would be
> ordered head to tail in a sequence that perfectly matches the order they will
> be read.

> I bring this up here because I expect with reiser4, a repacker, and this

Now that you mention it, do you have a control of some sort to prove
this isn't just fragmentation? That is, copy the files you're messing
with in some random order (that should make booting slower), and
benchmark that?

I'm guessing a repacker would speed things up much more than this,
although this is interesting and helpful for systems which need to be
rebooted often. (I'd prefer a working suspend2...)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 19:55 ` David Masover
@ 2006-09-14 22:09 ` Quinn Harris
2006-09-14 22:23 ` David Masover
0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-14 22:09 UTC (permalink / raw)
To: reiserfs-list

On Thursday 14 September 2006 13:55, David Masover wrote:
> Quinn Harris wrote:
> > The boot optimization was over 3885 files. Ideally those files would be
> > ordered head to tail in a sequence that perfectly matches the order they
> > will be read.
> >
> > I bring this up here because I expect with reiser4, a repacker, and this
>
> Now that you mention it, do you have a control of some sort to prove
> this isn't just fragmentation? That is, copy the files you're messing
> with in some random order (that should make booting slower), and
> benchmark that?

That is a good point. Recording the disk layout before and after to compare
relative fragmentation would be a good idea. As well as randomizing the
sequence as a sanity check.

Also note that during boot I was using readahead on all 3885 files. So the
kernel has a good opportunity to rearrange the reads. And the read sequence
doesn't necessarily match the order it's needed (though I tried to get that).

From what I have done, it would be unreasonable to say anything other than
that this has a good chance of working. I really think this should work in
theory. That's why I tried it in the first place and required relatively
little data to convince myself I am not smoking crack.

A simple test with reiser4 suggests that it will pack the files right up next
to each other in the specified sequence. Take a look at the section "If It Is
In RAM, Dirty, and Contiguous, Then Squeeze It ALL Together Just Before
Writing" on www.namesys.com.

I will redo this on a fresh reiser4 filesystem to reduce initial
fragmentation and run some tests to eliminate reasonable confounds.

> I'm guessing a repacker would speed things up much more than this,
> although this is interesting and helpful for systems which need to be
> rebooted often. (I'd prefer a working suspend2...)

This idea complements a repacker. The core idea is to cause the fs to order
files based on recorded access patterns that will likely happen again
(instead of just the location in the directory tree). A repacker would ensure
files are placed in the exact order back to back as we expect them to be
accessed. File system fragmentation will substantially inhibit this from
happening. That said, this is causing a basic form of repacking.

Yeah, a good suspend will probably result in faster boot no matter how many
tricks you use to improve normal boot, but this could also improve cold load
times for apps like Open Office and Firefox. In addition suspend drops the
disk cache. Profiling access patterns on resuming from suspend then
relocating based on that might improve restore times. Any time file access is
very predictable, and doesn't closely follow the default key assignment
order, this could improve performance.
^ permalink raw reply [flat|nested] 14+ messages in thread
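The randomized control discussed above could be set up with something as small as the sketch below. The function name `control_order` is hypothetical; the idea is to feed the shuffled list to the same relocate step and compare boot times against the access-ordered run.

```python
import random


def control_order(paths, seed=0):
    """Return the same paths in a repeatable random order.

    Relocating with this order should remove fragmentation just as well as
    the access-ordered run, so any remaining difference in boot time can be
    attributed to the ordering itself.
    """
    shuffled = list(paths)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps runs comparable
    return shuffled
```

A fixed seed makes the control repeatable across reboots; trying a few different seeds would give a feel for the variance.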
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 22:09 ` Quinn Harris
@ 2006-09-14 22:23 ` David Masover
2006-09-15 5:15 ` Toby Thain
0 siblings, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-14 22:23 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:
> On Thursday 14 September 2006 13:55, David Masover wrote:
>> Quinn Harris wrote:
>>> The boot optimization was over 3885 files. Ideally those files would be
>>> ordered head to tail in a sequence that perfectly matches the order they
>>> will be read.
>>>
>>> I bring this up here because I expect with reiser4, a repacker, and this
>> Now that you mention it, do you have a control of some sort to prove
>> this isn't just fragmentation? That is, copy the files you're messing
>> with in some random order (that should make booting slower), and
>> benchmark that?
>
> That is a good point. Recording the disk layout before and after to compare
> relative fragmentation would be a good idea. As well as randomizing the
> sequence as a sanity check.
>
> Also note that during boot I was using readahead on all 3885 files. So the
> kernel has a good opportunity to rearrange the reads. And the read sequence
> doesn't necessarily match the order it's needed (though I tried to get that).

Speaking of which, did you parallelize the boot process at all? I'd estimate
my system easily spent more than 50% of its boot time not touching the disk
at all before I did that. Gentoo can do this, I'm not sure what else, as it
kind of needs your init system to understand dependencies.

As far as faster load times for Firefox and OpenOffice, you may be on to
something here, but then, these apps probably match up pretty well with the
on-disk format, too. This would probably be most useful for a case where you
have to read a lot of small files, with relatively low CPU usage, in a fairly
unpredictable order. I suspect it would be nice for Gentoo's Portage tree,
though I can't think what else.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-14 22:23 ` David Masover
@ 2006-09-15 5:15 ` Toby Thain
2006-09-15 21:20 ` Quinn Harris
0 siblings, 1 reply; 14+ messages in thread
From: Toby Thain @ 2006-09-15 5:15 UTC (permalink / raw)
To: David Masover; +Cc: ReiserFS List

On 14-Sep-06, at 6:23 PM, David Masover wrote:
> Quinn Harris wrote:
>> On Thursday 14 September 2006 13:55, David Masover wrote:
>>> ...
>> That is a good point. Recording the disk layout before and after
>> to compare relative fragmentation would be a good idea. As well
>> as randomizing the sequence as a sanity check.
>> Also note that during boot I was using readahead on all 3885
>> files. So the kernel has a good opportunity to rearrange the
>> reads. And the read sequence doesn't necessarily match the order
>> it's needed (though I tried to get that).
>
> Speaking of which, did you parallelize the boot process at all?

Just off the top of my head, wouldn't that make the access sequence
asynchronous & thereby less predictable? (Although I'm sure it's a
net win.)

> I'd estimate my system easily spent more than 50% of its boot time
> not touching the disk at all before I did that. Gentoo can do
> this, I'm not sure what else, as it kind of needs your init system
> to understand dependencies.

...
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 5:15 ` Toby Thain
@ 2006-09-15 21:20 ` Quinn Harris
2006-09-15 22:27 ` David Masover
2006-09-18 9:36 ` PFC
0 siblings, 2 replies; 14+ messages in thread
From: Quinn Harris @ 2006-09-15 21:20 UTC (permalink / raw)
To: reiserfs-list

On Thursday 14 September 2006 23:15, Toby Thain wrote:
> On 14-Sep-06, at 6:23 PM, David Masover wrote:
> > Quinn Harris wrote:
> >> On Thursday 14 September 2006 13:55, David Masover wrote:
> >>> ...
> >>
> >> That is a good point. Recording the disk layout before and after
> >> to compare relative fragmentation would be a good idea. As well
> >> as randomizing the sequence as a sanity check.
> >> Also note that during boot I was using readahead on all 3885
> >> files. So the kernel has a good opportunity to rearrange the
> >> reads. And the read sequence doesn't necessarily match the order
> >> it's needed (though I tried to get that).
> >
> > Speaking of which, did you parallelize the boot process at all?
>
> Just off the top of my head, wouldn't that make the access sequence
> asynchronous & thereby less predictable? (Although I'm sure it's a
> net win.)

It could, but the kernel will try to reorder the outstanding block requests
to reduce seek. If that is an overall win I don't know. In addition early in
the boot, readahead-list or similar will tell the kernel to start reading
most of the files needed for the complete boot so they are already in memory
when needed. Ubuntu does the readahead now and all my tests were with
readahead.

> > I'd estimate my system easily spent more than 50% of its boot time
> > not touching the disk at all before I did that. Gentoo can do
> > this, I'm not sure what else, as it kind of needs your init system
> > to understand dependencies.
>
> ...

My first test turned out to be on a heavily fragmented file system. I
reinstalled Ubuntu Dapper with a fresh reiserfs file system and it booted in
1:07 (grub to desktop background appearing). After extending the time
readahead-watch monitors files and running the reallocate script it now boots
in 0:50.

I wrote a little python script that uses the FIBMAP ioctl to check the blocks
the files are using. From this I know the relocate script on this fresh file
system is doing exactly what it was intended to do. I am also able to
estimate how much it will improve performance by comparing the fragmentation
before and after it's run.

I have learned that the delays on disk io for Ubuntu boot are dominated by
rotational latency and not head seeks. The current readahead implementation
orders the files by on disk location, substantially mitigating head seek
time. But the latency can easily double the time needed to load the same
data.

Subjectively (and objectively by about 6s) relocation and extending
readahead-watch substantially improved Gnome boot and initial responsiveness.
But, I need to measure how much of this was caused by just extending how much
is read ahead vs. the reallocation.

The current Ubuntu boot waits for hardware probing, DHCP and other things,
giving the disk readahead a chance to work. I think this reallocation might
help a parallel boot more as the data will be needed sooner. So I changed my
mind, I think parallel boot will highlight the reallocate advantage. Now I
just need to test the hypothesis.

Not sure if I would be better off trying initng or waiting for upstart
(Ubuntu's new init) to get scripts that actually parallel boot. The code for
upstart is very clean and it has the backing of a major distro, so I have
high hopes.

Much like before, I was able to improve a 16.5s oowriter cold start to 14s
with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using
2.0.3 before). It is evident to me that the readahead-watch is missing
something on Open Office startup. It seems very possible to get OO to cold
start in under 8s with the use of reallocation and readahead right when it
starts.

My current scripts are at
http://www.quinnh.org/reallocate.py (27 line reallocate script, expects dir
/tmp/refrag to exist and takes the readahead-watch log as a parameter)
http://www.quinnh.org/measure.py (uses FIBMAP to estimate the time needed to
load the files in the passed readahead-watch log, uses average seek and
latency for estimate)
http://www.quinnh.org/readahead-watch-time-order.patch (Patch against Ubuntu
readahead-watch to add an order by access time option.)

I will try to write a nice unified script that will profile, reallocate and
do readahead for an application to speed it up, e.g. "# reallocate.py
oowriter". Run it once to profile and reallocate. drop_caches, run it again
and oowriter loads faster.

I think Python will be the best language for this because it's become
relatively universal and it's easy to understand for the uninitiated. This
really isn't black magic so transparency is good. I personally prefer Ruby
though.
^ permalink raw reply [flat|nested] 14+ messages in thread
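For readers without access to those scripts, the FIBMAP approach can be sketched as below. This is not the contents of measure.py: the function names are hypothetical, the cost model (one fixed penalty per discontinuity plus sequential transfer time) is an assumption, and the default 25MB/s and 12ms figures are simply the drive numbers quoted earlier in this thread. FIBMAP normally requires root.

```python
import fcntl
import os
import struct

FIBMAP = 1    # ioctl numbers from <linux/fs.h>
FIGETBSZ = 2


def file_blocks(path):
    """Return the physical block number of every logical block of `path`.

    A value of 0 means the block is not mapped (a hole, or data the file
    system keeps elsewhere, such as a packed tail).
    """
    with open(path, "rb") as handle:
        fd = handle.fileno()
        raw = fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0))
        block_size = struct.unpack("i", raw)[0]
        count = (os.fstat(fd).st_size + block_size - 1) // block_size
        blocks = []
        for logical in range(count):
            raw = fcntl.ioctl(fd, FIBMAP, struct.pack("i", logical))
            blocks.append(struct.unpack("i", raw)[0])
        return blocks


def count_jumps(blocks):
    """Count places where the next mapped block is not previous + 1."""
    jumps = 0
    previous = None
    for block in blocks:
        if block == 0:
            continue  # unmapped, nothing to read from this position
        if previous is not None and block != previous + 1:
            jumps += 1
        previous = block
    return jumps


def estimate_seconds(blocks, block_size=4096, transfer_mb_s=25.0,
                     penalty_ms=12.0):
    """Rough load time: sequential transfer plus a penalty per jump."""
    mapped = sum(1 for block in blocks if block != 0)
    transfer = mapped * block_size / (transfer_mb_s * 1e6)
    return transfer + count_jumps(blocks) * penalty_ms / 1000.0
```

Concatenating the block lists of all files in log order before calling `estimate_seconds` also counts the jumps between files, which is what the relocation is meant to reduce.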
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 21:20 ` Quinn Harris
@ 2006-09-15 22:27 ` David Masover
2006-09-16 0:01 ` Quinn Harris
2006-09-18 9:36 ` PFC
1 sibling, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-15 22:27 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:
> On Thursday 14 September 2006 23:15, Toby Thain wrote:
>> On 14-Sep-06, at 6:23 PM, David Masover wrote:
>>> Quinn Harris wrote:
>>>> On Thursday 14 September 2006 13:55, David Masover wrote:
>>>>> ...
>>>> That is a good point. Recording the disk layout before and after
>>>> to compare relative fragmentation would be a good idea. As well
>>>> as randomizing the sequence as a sanity check.
>>>> Also note that during boot I was using readahead on all 3885
>>>> files. So the kernel has a good opportunity to rearrange the
>>>> reads. And the read sequence doesn't necessarily match the order
>>>> it's needed (though I tried to get that).
>>> Speaking of which, did you parallelize the boot process at all?
>> Just off the top of my head, wouldn't that make the access sequence
>> asynchronous & thereby less predictable? (Although I'm sure it's a
>> net win.)
> It could, but the kernel will try to reorder the outstanding block requests to
> reduce seek. If that is an overall win I don't know. In addition early in
> the boot, readahead-list or similar will tell the kernel to start reading
> most of the files needed for the complete boot so they are already in memory
> when needed. Ubuntu does the readahead now and all my tests were with
> readahead.

That's interesting. I think either parallelizing or a very aggressive
readahead will perform similarly, except in cases where you have a script
blocking on something other than disk or CPU, like, say, network.

>>> I'd estimate my system easily spent more than 50% of its boot time
>>> not touching the disk at all before I did that. Gentoo can do
>>> this, I'm not sure what else, as it kind of needs your init system
>>> to understand dependencies.
>> ...
>
> The current Ubuntu boot waits for hardware probing, DHCP and other things
> giving the disk readahead a chance to work. I think this reallocation might
> help a parallel boot more as the data will be needed sooner. So I changed my
> mind, I think parallel boot will highlight the reallocate advantage. Now I
> just need to test the hypothesis.

Hmm. That's possible. But again, even with the parallel boot, there was
still a bit of time spent not touching the disk, so I wouldn't expect much
more of a speedup than what you already have.

Which also means, by the way, that I wouldn't use it much -- my system takes
more like 20 seconds from Grub to a login prompt, and from then on, the only
things that take more than 5 seconds to load are games. Since I know Quake 4
uses zipfiles (probably compressed) for its storage, and I watched the HD LED
while it loads, I don't think I can speed that up at all short of buying a
faster CPU.

Well, that and the Portage tree, but you say I shouldn't expect much from
that. Maybe the portage cache?

> Not sure if I would be better off trying initng or waiting for upstart (Ubuntu's
> new init) to get scripts that actually parallel boot. The code for upstart
> is very clean and it has the backing of a major distro, so I have high hopes.

Hmm. That sounds kind of cool, but I wonder how it compares to Gentoo's
init scripts? I guess I'll have to wait till it hits the one Ubuntu box I
have...

> Much like before, I was able to improve a 16.5s oowriter cold start to 14s
> with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using
> 2.0.3 before).

Wait -- cold start is 14s, but it's also 4.8s? Did you mean warm/hot start
for that last number?

> I think Python will be the best language for this because it's become
> relatively universal and it's easy to understand for the uninitiated. This
> really isn't black magic so transparency is good. I personally prefer Ruby
> though.

Wait... Python is more universal than Ruby of Ruby on Rails?

Python is faster, anyway... I'm waiting for someone to do a decent
implementation of Ruby on something like .NET before I start using it for
anything I want to perform well.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 22:27 ` David Masover
@ 2006-09-16 0:01 ` Quinn Harris
2006-09-16 8:59 ` David Masover
0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-16 0:01 UTC (permalink / raw)
To: reiserfs-list

On Friday 15 September 2006 16:27, David Masover wrote:
> > Not sure if I would be better off trying initng or waiting for upstart
> > (Ubuntu's new init) to get scripts that actually parallel boot. The code
> > for upstart is very clean and it has the backing of a major distro, so I
> > have high hopes.
>
> Hmm. That sounds kind of cool, but I wonder how it compares to Gentoo's
> init scripts? I guess I'll have to wait till it hits the one Ubuntu box
> I have...

Gentoo's default init doesn't parallelize well. Not when compared to initng,
which is relatively easy to get to work on Gentoo. The Ubuntu people decided
initng wasn't powerful enough (let alone the existing sysvinit). They thought
it needed a better way to define the bootup sequence during boot. In
addition, they wanted to integrate running any task, such as ACPI events,
hotplug and cron, into one consistent tool.
http://www.netsplit.com/blog/work/canonical/upstart.html

> > Much like before, I was able to improve a 16.5s oowriter cold start to
> > 14s with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was
> > using 2.0.3 before).
>
> Wait -- cold start is 14s, but it's also 4.8s? Did you mean warm/hot
> start for that last number?

Oops, it's 4.8s warm. It was initially 16.5s cold, then 14s cold after
reallocating.

> > I think Python will be the best language for this because it's become
> > relatively universal and it's easy to understand for the uninitiated.
> > This really isn't black magic so transparency is good. I personally
> > prefer Ruby though.
>
> Wait... Python is more universal than Ruby of Ruby on Rails?

Both Gentoo and Ubuntu install Python by default but not Ruby. And more
people, at least in the US, are familiar with Python. Finally, I might use
Python inotify code (to replace readahead-watch); the Ruby version is a bit
alpha and I don't think it is available in Gentoo or Ubuntu packages.

I came to Ruby through RoR. I think the language has an unmatched pragmatic
elegance. This isn't appreciated until one addresses a few problem domains
with it. I don't know of anything Python does reasonably well that Ruby
can't do reasonably well (minus the performance problem). On the other hand,
I doubt Python could make for something as slick as Rake
http://www.martinfowler.com/articles/rake.html. And Ruby provides a wealth of
conveniences and shortcuts without being the lexical mess that is Perl. I
could be missing something though.

But for this particular problem Python isn't bad.

> Python is faster, anyway... I'm waiting for someone to do a decent
> implementation of Ruby on something like .NET before I start using it
> for anything I want to perform well.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-16 0:01 ` Quinn Harris
@ 2006-09-16 8:59 ` David Masover
0 siblings, 0 replies; 14+ messages in thread
From: David Masover @ 2006-09-16 8:59 UTC (permalink / raw)
To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:
> On Friday 15 September 2006 16:27, David Masover wrote:
>
>>> Not sure if I would be better off trying initng or waiting for upstart
>>> (Ubuntu's new init) to get scripts that actually parallel boot. The code
>>> for upstart is very clean and it has the backing of a major distro, so I
>>> have high hopes.
>> Hmm. That sounds kind of cool, but I wonder how it compares to Gentoo's
>> init scripts? I guess I'll have to wait till it hits the one Ubuntu box
>> I have...
> Gentoo default init doesn't parallelize well. Not when compared to initng,
> which is relatively easy to get to work on Gentoo.

I'm not sure what initng is, but the way I parallelize Gentoo is by setting
a flag in /etc/conf.d/rc:

RC_PARALLEL_STARTUP="yes"

I still don't see a difference between initng and Gentoo's init. I guess
I'd have to install them both.

One thing I like about Gentoo's init is that the scripts are still just
shell scripts, and it would take a minimal amount of code to convert them
to/from the old init style.

> The Ubuntu people decided
> initng wasn't powerful enough (let alone the existing sysvinit). They
> thought it needed a better way to define the bootup sequence during boot, in
> addition to integrating the running of any task like ACPI events, hotplug,
> and cron into one consistent tool.
> http://www.netsplit.com/blog/work/canonical/upstart.html

Aha, thanks.

Getting offtopic here, but I don't see the comparison I'm looking for. I
see why it's different than launchd -- mostly, that launchd provides no way
of knowing whether we want to wait for a script to run or an app to start.
But I don't know of a way to know when an app has finished starting, unless
it daemonizes itself -- which makes it easy to write a script that ends when
the app has started. I really don't see the usefulness of making that
distinction as far as dependencies go.

I finally read up on the "event-based system", and I suspect this kind of
thing could probably be an extension to a dependency-based system. I guess
we'll see if initng pulls that off.

>> Wait... Python is more universal than Ruby or Ruby on Rails?
> Both Gentoo and Ubuntu install Python by default but not Ruby. And more
> people, at least in the US, are familiar with Python. Finally, I might use
> Python inotify code (to replace readahead-watch), and the Ruby version is a
> bit alpha and I don't think it is available in Gentoo or Ubuntu packages.

Speaking of which, the Perl inotify has been a bit broken for me lately; I
need to figure out what's going on. Unfortunately, I don't know if it's
Perl or the kernel that's broken...

> I don't know of anything Python does reasonably well that Ruby
> can't do reasonably well (minus the performance problem).

This might solve the performance problem:

>> I'm waiting for someone to do a decent
>> implementation of Ruby on something like .NET before I start using it
>> for anything I want to perform well.

If they can do it right, well, it seems like MS wants to replace C++ with
C#, thus .NET should perform decently. Mono means it's cross-platform, or
at least, it runs JIT'ed on the platforms I care about.

And hey, once you're on Gentoo or Ubuntu, it doesn't matter much, really.
Install a Ruby app and Ruby becomes a dependency.
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-15 21:20 ` Quinn Harris
2006-09-15 22:27 ` David Masover
@ 2006-09-18 9:36 ` PFC
2006-09-18 22:32 ` Quinn Harris
1 sibling, 1 reply; 14+ messages in thread
From: PFC @ 2006-09-18 9:36 UTC (permalink / raw)
To: reiserfs-list

Windows already does this.

It has a service which monitors filesystem usage and writes data to disk;
the defragmenter uses this data to lay the files on disk so that boot is
very fast.

However, I think it optimizes only the time to the login screen, so Windows
boot is extremely fast; of course, once you have logged in, you have to
wait forever until all the crap system tray apps launch themselves and eat
all your RAM...

My own repacker is very simple, and it handles any filesystem!

- boot from Kanotix CD
- tar cv /mnt/my_disk | lzop -c | ssh -c blowfish other_machine "cat >backup.tar.lzo"
- umount, mkfs, mount
- ssh -c blowfish other_machine "cat backup.tar.lzo" | lzop -cd | tar xv

You can also use a USB disk, or other disks in the machine. The effects
are pretty visible.
I do the first part often to make a full disk backup to a USB hard drive.

Anyway, IMHO the best way to have a super responsive system would be:

Have a daemon which monitors which files, or parts of files, are read, and
in what context:

- boot to fully loaded KDE/Gnome
- launching an application

You already have this, apparently...

The repacker would then use this information to lay files on disk (just
like Windows does).

Then, the daemon would trigger readahead when booting or detecting the
launch of an application, and read everything in (which would be a nice
sequential read)...
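
A rough illustration of the readahead step described above, as a minimal
Python sketch. This is not code from anyone in the thread; the function
name `preload` and its parameters are hypothetical, and it assumes the
daemon has already produced a list of paths in access order. It reads each
file once so the data ends up in the page cache, and hints the kernel with
posix_fadvise where that call is available.

```python
import os


def preload(paths, chunk_size=1 << 20):
    """Read each listed file once, in order, to warm the page cache.

    `paths` is a hypothetical access-ordered list taken from a profiling
    log. Missing or unreadable files are skipped. Returns the total number
    of bytes read.
    """
    total = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                # Optional hint; only present on Unix with Python 3.3+.
                if hasattr(os, "posix_fadvise"):
                    try:
                        os.posix_fadvise(f.fileno(), 0, 0,
                                         os.POSIX_FADV_WILLNEED)
                    except OSError:
                        pass  # the hint is best-effort only
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
        except OSError:
            continue  # file vanished or is not readable; ignore it
    return total
```

If the files were laid out on disk in the same order as the list, this
loop would amount to the "nice sequential read" mentioned above; on a
fragmented layout it still preloads, just with more seeking.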
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-18 9:36 ` PFC
@ 2006-09-18 22:32 ` Quinn Harris
0 siblings, 0 replies; 14+ messages in thread
From: Quinn Harris @ 2006-09-18 22:32 UTC (permalink / raw)
To: reiserfs-list

On Monday 18 September 2006 03:36, PFC wrote:
> Windows already does this.

I am familiar with this; it was in part the inspiration for this idea.

>
> It has a service which monitors filesystem usage and writes data to disk;
> the defragmenter uses this data to lay the files on disk so that boot is
> very fast.

Yep, my tests suggest for Ubuntu boot that typical improvements will be on
the order of 5-15s (substantially dependent on how fragmented the file
system is and the complexity of boot).

The system needs about 70MB of data to boot. Typical fragmentation on a
fresh filesystem will cause a 2-3x slowdown over perfect packing. 70MB at
20MB/s is 3.5s, so our gain would be roughly 3.5-7s.

The improvement wasn't as good as I was originally expecting but still very
measurable. I am not sure this is enough of a win to justify integrating
this idea in a major distro. But without a

>
> However, I think it optimizes only the time to the login screen, so Windows
> boot is extremely fast; of course, once you have logged in, you have to
> wait forever until all the crap system tray apps launch themselves and eat
> all your RAM...

In practice what I am doing improves system responsiveness. This is because
I am measuring all file access up to an idle state, then reallocating and
preloading all that data. Gnome still has a bit of data to page in after
the menus are available (like menu icons).

>
> My own repacker is very simple, and it handles any filesystem!
>
> - boot from Kanotix CD
> - tar cv /mnt/my_disk | lzop -c | ssh -c blowfish other_machine "cat >backup.tar.lzo"
> - umount, mkfs, mount
> - ssh -c blowfish other_machine "cat backup.tar.lzo" | lzop -cd | tar xv
>
> You can also use a USB disk, or other disks in the machine. The effects
> are pretty visible.
> I do the first part often to make a full disk backup to a USB hard drive.
>
> Anyway, IMHO the best way to have a super responsive system would be:
>

Are you essentially copying all the files off a file system, recreating a
fresh fs and writing them back? This allows the file system to reconstruct
the files in a good defragmented order. The problem is that it takes a long
time and takes the system down (or at least the fs). The Reiser4 repacker
would have the same effect but without any downtime. Windows defrag does
the same. This doesn't place files used at boot right next to each other on
disk like what I am doing, but both our approaches reduce fragmentation.

My script can do its thing in about 10s (for just one thing like better
bootup) and without disrupting the running system. (Actually I have seen it
break some things, but I expect this can be mitigated by avoiding repacking
open files.)

> Have a daemon which monitors which files, or parts of files, are read, and
> in what context:
>
> - boot to fully loaded KDE/Gnome
> - launching an application
>
> You already have this, apparently...
>
> The repacker would then use this information to lay files on disk (just
> like Windows does).

The Python script I am using to measure fragmentation shows that what I am
doing results in packing that is within 1% of ideal in terms of disk read
time. This only works well on a fresh filesystem with large contiguous
unallocated space, and not so well on a heavily fragmented fs. This
suggests that for reiser (maybe other filesystems too), a special kernel or
defrag tool for this type of reallocation is unnecessary.

>
> Then, the daemon would trigger readahead when booting or detecting the
> launch of an application, and read everything in (which would be a nice
> sequential read)...

I hope to find time to write a Python script that allows the user to do
something like "reallocate.py oowriter". It will profile Open Office writer
on startup, using inotify to generate a readahead log, then reallocate with
the copy trick. If the command is run again, it will readahead the files
for oowriter and launch the app. I think this could cut cold start times in
half, part from just the readahead and part from the reallocation.

It is evident that the inotify file monitoring (using Ubuntu
readahead-watch) is not catching everything. If I readahead for oowriter
and then launch it, it takes 6-7s. But it takes 3.6s to start immediately
after closing it. In principle, I would think the readahead should be able
to load all files oowriter needs into cache, getting the 3.6s time after
readahead.

Also, this doesn't monitor access patterns within a file. How much of the
data in a file is really needed? I am not sure I would be able to pack only
parts of files, but I might with sparse files.
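
The copy trick referred to in this thread (copy the profiled files into one
directory under sequential names in access order, then hard link each copy
back over the original path) could be sketched in Python roughly as below.
This is only an illustration under stated assumptions, not the actual
relocate script: `relocate`, `pack_dir`, and the ".relocate-tmp" suffix are
made-up names, `pack_dir` is assumed to be on the same filesystem as the
files (hard links cannot cross filesystems), and nothing here checks
whether a file is currently open.

```python
import os
import shutil


def relocate(paths, pack_dir):
    """Copy files into pack_dir as 0001, 0002, ... and link them back.

    `paths` is a hypothetical access-ordered list of files. Returns the
    list of paths that were actually relocated.
    """
    os.makedirs(pack_dir, exist_ok=True)
    moved = []
    for index, path in enumerate(paths, start=1):
        # Only handle regular files; skip symlinks and missing entries.
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        packed = os.path.join(pack_dir, "%04d" % index)
        # copy2 keeps mode and timestamps, but not ownership.
        shutil.copy2(path, packed)
        # Hard link the new copy next to the original, then rename it
        # over the original so the path never disappears.
        tmp = path + ".relocate-tmp"
        os.link(packed, tmp)
        os.replace(tmp, path)
        moved.append(path)
    return moved
```

Whether the new copies really end up adjacent on disk depends on the
filesystem allocating blocks in key order, as discussed earlier in the
thread; the sketch only reproduces the naming and linking steps.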
* Re: Relocating files for faster boot/start-up on reiser(fs/4)
2006-09-13 21:10 ` Peter
2006-09-14 3:10 ` Quinn Harris
@ 2006-09-14 14:01 ` cmaurand
1 sibling, 0 replies; 14+ messages in thread
From: cmaurand @ 2006-09-14 14:01 UTC (permalink / raw)
To: Peter; +Cc: reiserfs-list

SCO has used this solution; that's why it's such a dog.

Peter wrote:
> On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
>> Thoughts?
>>
>
> Yes. Why on earth would you do this? Copying the files and renaming and
> hardlinking them is nothing a sysadmin would ever do. Just by copying you
> are allowing reiser to optimize the dir. You're trying to duplicate what a
> tree-based design does automatically. Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
>
> If reiser needs speedup it certainly won't be done by renaming files!
>
> JM$0.02
>

--
Curtis Maurand
Senior Network & Systems Engineer
BlueTarp Financial, Inc.
443 Congress St.
6th Floor
Portland, ME 04101
207.797.5900 x233 (office)
207.797.3833 (fax)
mailto:cmaurand@bluetarp.com
http://www.bluetarp.com
end of thread, other threads:[~2006-09-18 22:32 UTC | newest]

Thread overview: 14+ messages
2006-09-13 20:51 Relocating files for faster boot/start-up on reiser(fs/4) Quinn Harris
2006-09-13 21:10 ` Peter
2006-09-14 3:10 ` Quinn Harris
2006-09-14 19:55 ` David Masover
2006-09-14 22:09 ` Quinn Harris
2006-09-14 22:23 ` David Masover
2006-09-15 5:15 ` Toby Thain
2006-09-15 21:20 ` Quinn Harris
2006-09-15 22:27 ` David Masover
2006-09-16 0:01 ` Quinn Harris
2006-09-16 8:59 ` David Masover
2006-09-18 9:36 ` PFC
2006-09-18 22:32 ` Quinn Harris
2006-09-14 14:01 ` cmaurand