* Relocating files for faster boot/start-up on reiser(fs/4)
@ 2006-09-13 20:51 Quinn Harris
  2006-09-13 21:10 ` Peter
  0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-13 20:51 UTC (permalink / raw)
  To: reiserfs-list

I have been playing around with relocating file data to improve boot time and 
app start-up time (like OpenOffice) on reiser(fs/4).  This is done by 
monitoring the files accessed during boot/start-up then copying these files 
into a single directory with sequential names 0001 0002 ... matching the 
access order.  Finally the new files are hard linked (rename should work too) 
to the same location as the original files.

As I understand it both reiserfs and reiser4 assign keys to items based on the 
file name and the parent directory.  The file system then attempts to match 
block order with key order.  This allows the above trick to work for placing
files in a specific order next to each other on disk.
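
A minimal sketch of the copy-and-rename trick described above, not the actual
relocate script.  The function name and the staging directory argument are
hypothetical, and it uses rename rather than hard links.  It assumes the
staging directory is on the same file system as the originals and that
nothing else is writing to the files while it runs.

```python
import os
import shutil


def relocate(paths, staging_dir):
    """Rewrite each file in `paths` via `staging_dir`, in the given order.

    Each file is copied to a sequentially named file (0001, 0002, ...) and
    the copy is then renamed over the original path.  Returns the number of
    files relocated.
    """
    paths = list(paths)
    os.makedirs(staging_dir, exist_ok=True)
    # Zero-pad so that name order equals access order even past 9999 files.
    width = max(4, len(str(len(paths))))
    staged = []
    # Pass 1: create the copies in access order, so that name order (and, on
    # reiserfs/reiser4, hopefully key and block order) follows that order.
    for index, path in enumerate(paths, start=1):
        target = os.path.join(staging_dir, "%0*d" % (width, index))
        # copy2 keeps mode and timestamps; ownership is NOT preserved.
        shutil.copy2(path, target)
        staged.append((target, path))
    # Pass 2: swap each copy into place.  os.rename replaces the original
    # atomically, but only within a single file system.
    for target, path in staged:
        os.rename(target, path)
    return len(staged)
```

Programs that already have the old file open keep using the old data until
they reopen it, so running this on libraries in use is risky.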

I am using readahead-watch on Ubuntu.  This little tool uses inotify to 
monitor all file accesses while it runs.  The accessed files are written to a 
text file by disk order.  I have modified this tool to also write them by 
access time.  I then use a script (ruby) to do the above copy and link using 
the output from readahead-watch.
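
To illustrate what ordering by access time means here, the following sketch
turns a list of access events into a file list ordered by first access.  The
(timestamp, path) event format is an assumption made for the example; it is
not the real readahead-watch log format.

```python
def first_access_order(events):
    """Return paths ordered by first access time, without duplicates.

    `events` is an iterable of (timestamp, path) pairs in any order.
    """
    seen = set()
    ordered = []
    # Sorting the pairs orders them by timestamp first.
    for _timestamp, path in sorted(events):
        if path not in seen:
            seen.add(path)
            ordered.append(path)
    return ordered
```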

I have done some tests on my Athlon 2200 laptop running reiserfs.  The hard 
drive is a 40GB Hitachi Travelstar 80GB, which has a max real transfer rate 
of 25MB/s and an access time of 12ms.

The reiserfs partition size is 36G with 8.9G used.

I used readahead-watch to create a readahead log during boot on Ubuntu Edgy 
much like the default configuration with the "profile" boot option except set 
to record by access time and I manually killed it after the system fully 
booted.  With this log used for readahead, the system booted in 2:15 from 
grub load to usable desktop (auto login), as measured manually with a 
stopwatch.  After running the relocate script the boot time with the same 
readahead log was 1:38.  I then reran the readahead-watch during boot set to 
sort by disk order, resulting in a boot time of 1:15.  I booted twice for 
each test to make sure the results were within a few seconds.

I also used bootchart, but this didn't measure Gnome start-up and requires a 
bit of ambition to analyze thoroughly.  But it was evident that running the 
relocate script did increase peak disk throughput from 6MB/s to 13MB/s and 
increased the average throughput rate.  But most of boot time is still spent 
waiting on the disk.  My relocate script relocated 310MB of files.  If those 
were perfectly contiguous on disk, this drive should be able to load that in 
under 20s.  Though I expect only a fraction of that is actually accessed 
during boot.
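
The "under 20s" figure follows from the drive numbers given earlier (310MB at
25MB/s).  As a quick check, assuming perfectly sequential reads with no seeks
at all:

```python
def min_transfer_seconds(size_mb, throughput_mb_per_s):
    """Lower bound on read time for perfectly contiguous data."""
    return size_mb / throughput_mb_per_s


print(min_transfer_seconds(310, 25))  # 12.4
```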

Using 'filefrag' it is evident that the relocate script's attempt to place 
the files contiguously was a bit half-assed, but from the boot times it was 
clearly an improvement.

I also used readahead-watch to monitor the accessed files of openoffice writer 
on startup.  The initial cold start time was 17s (about 0.5s variation from 
load to load).  A warm start (start right after it's closed) was 3.6s.  The 
results from readahead-watch were filtered through a script to remove all 
files that were open when openoffice wasn't running (using fuser).  Running 
the relocate script on some of the X and gnome libraries broke my system 
nicely until a reboot.  After running the relocate script the cold start time 
became 14s.  When readahead-list is run on the same files relocated before 
starting openoffice the load time was 6.5s.  sudo sh -c "echo 1 
> /proc/sys/vm/drop_caches" was used to ensure the disk was read between 
runs.

Of course, these results are highly dependent on how fragmented the files 
were before and how effectively the relocation worked.  I expect others 
could reproduce speedups but how much will vary.  I did these tests on my 
laptop with a slow hard drive so the results would be more evident.

I also did some tests with fresh reiserfs, reiser4, and ext3 on a 100MB 
loopback to see how well the file system would take the hint to order data 
sequentially.  Creating 10 5MB files with sequential names on reiser4 
resulted in one fragment (measured by 'filefrag') for the whole bunch, 
probably from a disk allocation bitmap; nearly perfect.  reiserfs generally would 
end up with 3-4 fragments for the same test.  And ext3 didn't appear to make 
any real attempt to order the files sequentially on disk.

I have a 29GB reiser4 partition with 21GB used I have been running for a few 
years now (sometime before release).  When I ran the same 10 5MB file test on 
it, the total resulted in 1000+ fragments (didn't bother to count, but it was 
a lot).  But the files were allocated head to tail.  It's a bit scary to 
think the file system can't find a few MB unallocated region on disk.  
Clearly a repacker would be really nice.

Relocating file data to match pre-measured access patterns can clearly make a 
big performance difference.  Reiser(fs/4) provides an easy mechanism to hint 
at disk order which can be used to measurably improve boot/startup times.  
But, I expect more can be done to achieve better results.  This includes 
better measurements of read patterns and better allocation of the data.

I hope to rerun these tests with Reiser4 (maybe 4.1) on the same hardware.  I 
expect with a fresh (not fragmented) Reiser4 partition, the improvements will 
be more pronounced.  But a repacker should allow more reproducible results 
and nearly perfect data placement for boot and app start-up.

I hope with reiser4, relocation, and the new upstart (Ubuntu's sysvinit 
replacement) with good scripts, I will get this system to boot to usable in 
30 seconds.  And slowoffice (aka openoffice)  to load in 6s cold.  Am I 
overoptimistic?

What about a mechanism to explicitly set or hint at item keys?  Maybe someday, 
linux packages could include preferred file order information that a file 
system like reiser4 could use to order the files on disk resulting in fast 
load times without the need for the user to profile the app.

I think there is a lot to be said for measuring access patterns and using that 
to set keys in addition to deducing it from semantics using a fibration 
plug-in.

Thoughts?

--
Quinn Harris

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-13 20:51 Relocating files for faster boot/start-up on reiser(fs/4) Quinn Harris
@ 2006-09-13 21:10 ` Peter
  2006-09-14  3:10   ` Quinn Harris
  2006-09-14 14:01   ` cmaurand
  0 siblings, 2 replies; 14+ messages in thread
From: Peter @ 2006-09-13 21:10 UTC (permalink / raw)
  To: reiserfs-list-nJ1KrdHEGnBBDgjK7y7TUQ

On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
> 
> Thoughts?
>

Yes. Why on earth would you do this? Copying the files and renaming and
hardlinking them is nothing a sysadmin would ever do. Just by copying you
are allowing reiser to optimize the dir. You're trying to duplicate what a
tree-based design does automatically. Moreover, remember that reiser packs
files into clusters so that you may read more than just your one file from
time to time which could end up adding time to your test.

If reiser needs speedup it certainly won't be done by renaming files!

JM$0.02

-- 
Peter
+++++
Do not reply to this email, it is a spam trap and not monitored.
I can be reached via this list, or via 
jabber: pete4abw at jabber.org
ICQ: 73676357



* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-13 21:10 ` Peter
@ 2006-09-14  3:10   ` Quinn Harris
  2006-09-14 19:55     ` David Masover
  2006-09-14 14:01   ` cmaurand
  1 sibling, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-14  3:10 UTC (permalink / raw)
  To: reiserfs-list

Peter,

I think you misunderstood what and why I was doing this.  Let me try to 
clarify.

My test is far from perfect.  It's merely an exercise to verify the basic idea.

> Just by copying you are allowing reiser to optimize the dir.
Exactly, but I am copying in a way that implicitly suggests what order those 
files will be accessed in.

I was attempting to reorder the data on disk to minimize disk 
seeks with knowledge of the order that data will be accessed.  This was done 
by taking advantage of the way reiser assigns keys to files based on their 
name and its affinity to match key order with block order.  

> You're trying to duplicate what a tree-based design does automatically.
This works because of the tree-based design of reiser.

Reiser must assign each file (item actually) some key; why not take 
advantage of knowledge of the order those items will be accessed in?  The 
current key assignment algorithm is a best guess at that given the limited 
information it has (file/directory name).  Remember key assignment roughly 
translates to on disk position.

The relocate script can leave the file system in the exact same state from a 
semantic standpoint (what files and directories are there) but relocate the 
data on disk.  Copying those files to a single directory with numeric names was 
a kludge to implicitly tell the file system to place those files in a 
specific order and near each other on disk.  The rename step is to switch the 
old unoptimized file position with the new more optimized position.

> Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
The boot optimization was over 3885 files.  Ideally those files would be 
ordered head to tail in a sequence that perfectly matches the order they will 
be read.  As a result multiple items in a node will all need to be read at 
nearly the same time.  That didn't happen in my test, but it was much closer 
to that after I ran the relocate script than before.  Hence the performance 
improvement.  With this script, reiser4 and a repacker I have reason to 
believe the ordering will be nearly perfect.  Of course, that is excluding 
random access patterns inside the same file and the directory data needed to 
get at the files.

This basic technique can be made into a boot script much like the readahead 
script already in Ubuntu, just improved.  Boot once with a profile option, it 
measures read patterns (already does this), then reorders data on disk with 
this trick, or maybe something better.  Then the next time you boot it's 
1.5-2x faster.  Better yet, include this profile information in the distro 
packages.  When a package is installed, this info is used to help assign item 
keys, resulting in a better disk layout and faster boot times with no weird 
file copy rename mumbo jumbo.

I bring this up here because I expect with reiser4, a repacker, and this 
trick, reiser4 could deliver at least 50% better reproducible real world boot 
and app load performance than any other file system.  At least until other 
file systems implement something similar, like what MS did with XP.  Can 
something similar be done (or has been) on ext(2/3/4), XFS, JFS or other 
linux file systems?

Windows XP boots much faster than Windows 2000 in part because it does what I 
am talking about.  File access is recorded at boot, then the disk is 
defragmented with this knowledge.  Check out
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx
under "Prefetch".

Also look at http://kerneltrap.org/node/2157

MS's implementation required implementing a defrag utility with a specific 
feature that could position disk data based on access logs.  Reiser4 can do 
the same thing as part of its basic functionality with the addition of a much 
much simpler tool to help assign keys based on that access log.  Then a 
repacker (when it devaporizes) can further optimize for that access pattern 
without any code specific to that purpose.  Seems like good orthogonal design 
to me.

Hope that clarifies.  Like my previous post, whatever it did, it did it in way 
too many words.




-- 
Quinn Harris



* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-13 21:10 ` Peter
  2006-09-14  3:10   ` Quinn Harris
@ 2006-09-14 14:01   ` cmaurand
  1 sibling, 0 replies; 14+ messages in thread
From: cmaurand @ 2006-09-14 14:01 UTC (permalink / raw)
  To: Peter; +Cc: reiserfs-list


SCO has done this solution; that's why it's such a dog.

Peter wrote:
> On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
>> Thoughts?
>>
> 
> Yes. Why on earth would you do this? By copying the files and renaming and
> hardlinking them is nothing a sysadmin would ever do. Just by copying you
> are allowing reiser to optimize the dir. You're trying to duplicate what a
> tree-based design does automatically. Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
> 
> If reiser needs speedup it certainly won't be done by renaming files!
> 
> JM$0.02
> 


-- 
Curtis Maurand
Senior Network & Systems Engineer
BlueTarp Financial, Inc.
443 Congress St.
6th Floor
Portland, ME 04101
207.797.5900 x233 (office)
207.797.3833	  (fax)
mailto:cmaurand@bluetarp.com
http://www.bluetarp.com


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-14  3:10   ` Quinn Harris
@ 2006-09-14 19:55     ` David Masover
  2006-09-14 22:09       ` Quinn Harris
  0 siblings, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-14 19:55 UTC (permalink / raw)
  To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:

> The boot optimization was over 3885 files.  Ideally those files would be 
> ordered head to tail in a sequence that perfectly matches the order they will 
> be read.

> I bring this up here because I expect with reiser4, a repacker, and this 

Now that you mention it, do you have a control of some sort to prove 
this isn't just fragmentation?  That is, copy the files you're messing 
with in some random order (that should make booting slower), and 
benchmark that?

I'm guessing a repacker would speed things up much more than this, 
although this is interesting and helpful for systems which need to be 
rebooted often.  (I'd prefer a working suspend2...)


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-14 19:55     ` David Masover
@ 2006-09-14 22:09       ` Quinn Harris
  2006-09-14 22:23         ` David Masover
  0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-14 22:09 UTC (permalink / raw)
  To: reiserfs-list

On Thursday 14 September 2006 13:55, David Masover wrote:
> Quinn Harris wrote:
> > The boot optimization was over 3885 files.  Ideally those files would be
> > ordered head to tail in a sequence that perfectly matches the order they
> > will be read.
> >
> > I bring this up here because I expect with reiser4, a repacker, and this
>
> Now that you mention it, do you have a control of some sort to prove
> this isn't just fragmentation?  That is, copy the files you're messing
> with in some random order (that should make booting slower), and
> benchmark that?

That is a good point.  Recording the disk layout before and after to compare 
relative fragmentation would be a good idea.  As well as randomizing the 
sequence as a sanity check.
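
One way to set up that control, sketched under the assumption that the
relocation step takes a plain list of paths: shuffle the list with a fixed
seed so the "random order" run can be repeated.  The function name is
hypothetical.

```python
import random


def control_order(paths, seed=0):
    """Return a shuffled copy of `paths` for use as a control run."""
    shuffled = list(paths)
    # A private Random instance keeps the shuffle reproducible per seed.
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Relocating in this order and timing a boot should show whether the speedup
comes from the access ordering or merely from rewriting the files.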

Also note that during boot I was using readahead on all 3885 files.  So the 
kernel has a good opportunity to rearrange the reads.  And the read sequence 
doesn't necessarily match the order it's needed (though I tried to get that).

From what I have done, it would be unreasonable to say anything other than, 
this has a good chance of working.  I really think this should work in 
theory.  That's why I tried it in the first place and required relatively 
little data to convince myself I am not smoking crack.

A simple test with reiser4 suggests that it will pack the files right up next 
to each other in the specified sequence.  Take a look at the section "If It 
Is In RAM, Dirty, and Contiguous, Then Squeeze It ALL Together Just Before 
Writing" on www.namesys.com.

I will redo this on a fresh reiser4 filesystem to reduce initial fragmentation 
and run some tests to eliminate reasonable confounds.
 

>
> I'm guessing a repacker would speed things up much more than this,
> although this is interesting and helpful for systems which need to be
> rebooted often.  (I'd prefer a working suspend2...)
This idea complements a repacker.  The core idea is to cause the fs to order 
files based on recorded access patterns that will likely happen again 
(instead of just the location in the directory tree).  A repacker would 
ensure files are placed in the exact order back to back as we expect them to 
be accessed.  File system fragmentation will substantially inhibit this from 
happening.  That said, this is causing a basic form of repacking. 

Yeah a good suspend will probably result in faster boot no matter how many 
tricks you use to improve normal boot, but this could also improve cold load 
times for apps like Open Office and Firefox.  In addition suspend drops the 
disk cache.  Profiling access patterns on resuming from suspend then 
relocating based on that might improve restore times.

Any time file access is very predictable, and doesn't closely follow the 
default key assignment order, this could improve performance.


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-14 22:09       ` Quinn Harris
@ 2006-09-14 22:23         ` David Masover
  2006-09-15  5:15           ` Toby Thain
  0 siblings, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-14 22:23 UTC (permalink / raw)
  To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:
> On Thursday 14 September 2006 13:55, David Masover wrote:
>> Quinn Harris wrote:
>>> The boot optimization was over 3885 files.  Ideally those files would be
>>> ordered head to tail in a sequence that perfectly matches the order they
>>> will be read.
>>>
>>> I bring this up here because I expect with reiser4, a repacker, and this
>> Now that you mention it, do you have a control of some sort to prove
>> this isn't just fragmentation?  That is, copy the files you're messing
>> with in some random order (that should make booting slower), and
>> benchmark that?
> 
> That is a good point.  Recording the disk layout before and after to compare 
> relative fragmentation would be a good idea.  As well as randomizing the 
> sequence as a sanity check.
> 
> Also note that during boot I was using readahead on all 3885 files.  So the 
> kernel has a good opportunity to rearrange the reads.  And the read sequence 
> doesn't necessary match the order its needed (though I tried to get that).

Speaking of which, did you parallelize the boot process at all?  I'd 
estimate my system easily spent more than 50% of its boot time not 
touching the disk at all before I did that.  Gentoo can do this, I'm not 
sure what else, as it kind of needs your init system to understand 
dependencies.

As far as faster load times for Firefox and OpenOffice, you may be on to 
something here, but then, these apps probably match up pretty well with 
the on-disk format, too.

This would probably be most useful for a case where you have to read a 
lot of small files, with relatively low CPU usage, in a fairly 
unpredictable order.  I suspect it would be nice for Gentoo's Portage 
tree, though I can't think what else.


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-14 22:23         ` David Masover
@ 2006-09-15  5:15           ` Toby Thain
  2006-09-15 21:20             ` Quinn Harris
  0 siblings, 1 reply; 14+ messages in thread
From: Toby Thain @ 2006-09-15  5:15 UTC (permalink / raw)
  To: David Masover; +Cc: ReiserFS List


On 14-Sep-06, at 6:23 PM, David Masover wrote:

> Quinn Harris wrote:
>> On Thursday 14 September 2006 13:55, David Masover wrote:
>>> ...
>> That is a good point.  Recording the disk layout before and after  
>> to compare relative fragmentation would be a good idea.  As well  
>> as randomizing the sequence as a sanity check.
>> Also note that during boot I was using readahead on all 3885  
>> files.  So the kernel has a good opportunity to rearrange the  
>> reads.  And the read sequence doesn't necessary match the order  
>> its needed (though I tried to get that).
>
> Speaking of which, did you parallize the boot process at all?

Just off the top of my head, wouldn't that make the access sequence  
asynchronous & thereby less predictable? (Although I'm sure it's a  
net win.)

> I'd estimate my system easily spent more than 50% of its boot time  
> not touching the disk at all before I did that.  Gentoo can do  
> this, I'm not sure what else, as it kind of needs your init system  
> to understand dependencies.
...


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-15  5:15           ` Toby Thain
@ 2006-09-15 21:20             ` Quinn Harris
  2006-09-15 22:27               ` David Masover
  2006-09-18  9:36               ` PFC
  0 siblings, 2 replies; 14+ messages in thread
From: Quinn Harris @ 2006-09-15 21:20 UTC (permalink / raw)
  To: reiserfs-list

On Thursday 14 September 2006 23:15, Toby Thain wrote:
> On 14-Sep-06, at 6:23 PM, David Masover wrote:
> > Quinn Harris wrote:
> >> On Thursday 14 September 2006 13:55, David Masover wrote:
> >>> ...
> >>
> >> That is a good point.  Recording the disk layout before and after
> >> to compare relative fragmentation would be a good idea.  As well
> >> as randomizing the sequence as a sanity check.
> >> Also note that during boot I was using readahead on all 3885
> >> files.  So the kernel has a good opportunity to rearrange the
> >> reads.  And the read sequence doesn't necessary match the order
> >> its needed (though I tried to get that).
> >
> > Speaking of which, did you parallize the boot process at all?
>
> Just off the top of my head, wouldn't that make the access sequence
> asynchronous & thereby less predictable? (Although I'm sure it's a
> net win.)
It could, but the kernel will try to reorder the outstanding block requests to 
reduce seek.  If that is an overall win I don't know.  In addition early in 
the boot, readahead-list or similar will tell the kernel to start reading 
most of the files needed for the complete boot so they are already in memory 
when needed.  Ubuntu does the readahead now and all my tests were with 
readahead.
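
A rough sketch of what such a readahead pass amounts to.  It uses Python's
os.posix_fadvise with POSIX_FADV_WILLNEED as a stand-in; that choice is an
assumption for illustration, and the real readahead-list tool may issue a
different call.  The function name is hypothetical.

```python
import os


def hint_readahead(paths):
    """Ask the kernel to start reading each file; return the paths hinted."""
    hinted = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # file vanished or is unreadable; skip it
        try:
            # Offset 0 with length 0 covers the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted.append(path)
        except OSError:
            pass  # advice not supported for this file
        finally:
            os.close(fd)
    return hinted
```

The call only queues the reads, so later opens of the same files can find
the data already in the page cache.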

>
> > I'd estimate my system easily spent more than 50% of its boot time
> > not touching the disk at all before I did that.  Gentoo can do
> > this, I'm not sure what else, as it kind of needs your init system
> > to understand dependencies.
>
> ...

My first test turned out to be on a heavily fragmented file system.  I 
reinstalled Ubuntu Dapper with a fresh reiserfs file system and it booted in 
1:07 (grub to desktop background appearing).  After extending the time 
readahead-watch monitors files and running the reallocate script it now boots 
in 0:50.

I wrote a little python script that uses the FIBMAP ioctl to check the blocks 
the files are using.  From this I know the relocate script on this fresh file 
system is doing exactly what it was intended to do.  I am also able to 
estimate how much it will improve performance by comparing the fragmentation 
before and after its run.  I have learned that the delays on disk io for 
Ubuntu boot are dominated by rotational latency and not head seeks.  The 
current readahead implementation orders the files by on disk location, 
substantially mitigating head seek time.  But the rotational latency can 
easily double the time needed to load the same data.
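
For readers who want to try the same kind of check, here is a sketch of
reading a file's block map with FIBMAP from Python.  It is not the script
mentioned above.  The function names are hypothetical, and the ioctl
normally needs root and a file system that supports FIBMAP.

```python
import fcntl
import os
import struct

FIBMAP = 1    # _IO(0x00, 1) in <linux/fs.h>
FIGETBSZ = 2  # _IO(0x00, 2) in <linux/fs.h>


def physical_blocks(path):
    """Return the physical block number of every logical block in `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        raw = fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0))
        block_size = struct.unpack("i", raw)[0]
        count = (size + block_size - 1) // block_size
        blocks = []
        for logical in range(count):
            # FIBMAP takes the logical block index and writes back the
            # physical block number (0 when no block is mapped).
            raw = fcntl.ioctl(fd, FIBMAP, struct.pack("i", logical))
            blocks.append(struct.unpack("i", raw)[0])
        return blocks
    finally:
        os.close(fd)


def count_fragments(blocks):
    """Count runs of consecutive physical blocks; a 0 entry ends a run."""
    fragments = 0
    previous = None
    for block in blocks:
        if block == 0:
            previous = None  # unmapped block (for example a hole)
            continue
        if previous is None or block != previous + 1:
            fragments += 1
        previous = block
    return fragments
```

Concatenating the block lists of all files in access order and counting
fragments before and after relocation gives a rough measure of how well the
layout follows that order.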

Subjectively (and objectively by about 6s) relocation and extending 
readahead-watch substantially improved Gnome boot and initial responsiveness.  
But, I need to measure how much of this was caused by just extending how much 
is read ahead vs. the reallocation.

The current Ubuntu boot waits for hardware probing, DHCP and other things 
giving the disk readahead a chance to work.  I think this reallocation might 
help a parallel boot more as the data will be needed sooner.  So I changed my 
mind, I think parallel boot will highlight the reallocate advantage.  Now I 
just need to test the hypothesis.

Not sure if I would be better off trying initng or waiting for upstart 
(Ubuntu's new init) to get scripts that actually parallel boot.  The code for 
upstart is very clean and it has the backing of a major distro, so I have high 
hopes.

Much like before, I was able to improve a 16.5s oowriter cold start to 14s 
with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using 
2.0.3 before).  It is evident to me that the readahead-watch is missing 
something on Open Office startup.  It seems very possible to get OO to cold 
start in under 8s with the use of reallocation and readahead right when it 
starts.


My current scripts are at
http://www.quinnh.org/reallocate.py  (27 line reallocate script, expects 
dir /tmp/refrag to exist and takes the readahead-watch log as a parameter)

http://www.quinnh.org/measure.py (uses FIBMAP to estimate the time needed to 
load the files in the passed readahead-watch log, uses average seek and 
latency for estimate)

http://www.quinnh.org/readahead-watch-time-order.patch (Patch against Ubuntu 
readahead-watch to add an order by access time option.)
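
As an illustration of the kind of estimate described for measure.py (this is
a guess at a simple model, not the script's actual code): charge one access
penalty per fragment and add the pure transfer time.  The function and
parameter names are hypothetical.  The example values are the 25MB/s and
12ms drive figures quoted earlier in the thread; charging the 12ms once per
fragment is an assumption.

```python
def estimate_load_seconds(total_bytes, fragments,
                          throughput_bytes_per_s, access_s):
    """Estimate read time: one access penalty per fragment plus transfer."""
    transfer = total_bytes / throughput_bytes_per_s
    return fragments * access_s + transfer


# 310MB spread over 1000 fragments at 25MB/s with a 12ms access time.
print(estimate_load_seconds(310e6, 1000, 25e6, 0.012))
```

With zero fragments the estimate falls back to the pure transfer time, so
the difference between the two numbers is the cost of fragmentation under
this model.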


I will try to write a nice unified script that will profile, reallocate and do 
readahead for an application to speed it up.  e.g. "# reallocate.py 
oowriter".  Run it once to profile and reallocate.  drop_caches, Run it again 
and oowriter loads faster.

I think Python will be the best language for this because it has become 
relatively universal and it's easy to understand for the uninitiated.  This 
really isn't black magic so transparency is good.  I personally prefer Ruby 
though.


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-15 21:20             ` Quinn Harris
@ 2006-09-15 22:27               ` David Masover
  2006-09-16  0:01                 ` Quinn Harris
  2006-09-18  9:36               ` PFC
  1 sibling, 1 reply; 14+ messages in thread
From: David Masover @ 2006-09-15 22:27 UTC (permalink / raw)
  To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:
> On Thursday 14 September 2006 23:15, Toby Thain wrote:
>> On 14-Sep-06, at 6:23 PM, David Masover wrote:
>>> Quinn Harris wrote:
>>>> On Thursday 14 September 2006 13:55, David Masover wrote:
>>>>> ...
>>>> That is a good point.  Recording the disk layout before and after
>>>> to compare relative fragmentation would be a good idea.  As well
>>>> as randomizing the sequence as a sanity check.
>>>> Also note that during boot I was using readahead on all 3885
>>>> files.  So the kernel has a good opportunity to rearrange the
>>>> reads.  And the read sequence doesn't necessary match the order
>>>> its needed (though I tried to get that).
>>> Speaking of which, did you parallize the boot process at all?
>> Just off the top of my head, wouldn't that make the access sequence
>> asynchronous & thereby less predictable? (Although I'm sure it's a
>> net win.)
> It could, but the kernel will try to reorder the outstanding block requests to 
> reduce seek.  If that is an overall win I don't know.  In addition early in 
> the boot, readahead-list or similar will tell the kernel to start reading 
> most of the files need for the complete boot so they are already in memory 
> when needed.  Ubuntu does the readahead now and all my tests where with 
> readahead.

That's interesting.  I think either parallelizing or a very aggressive 
readahead will perform similarly, except in cases where you have a 
script blocking on something other than disk or CPU, like, say, network.

>>> I'd estimate my system easily spent more than 50% of its boot time
>>> not touching the disk at all before I did that.  Gentoo can do
>>> this, I'm not sure what else, as it kind of needs your init system
>>> to understand dependencies.
>> ...
> 

> The current Ubuntu boot waits for hardware probing, DHCP and other things 
> giving the disk readahead a chance to work.  I think this reallocation might 
> help a parallel boot more as the data will be needed sooner.  So I changed my 
> mind, I think parallel boot will highlight the reallocate advantage.  Now I 
> just need to test the hypothesis.

Hmm.  That's possible.  But again, even with the parallel boot, there 
was still a bit of time spent not touching the disk, so I wouldn't 
expect much more of a speedup than what you already have.  Which also 
means, by the way, that I wouldn't use it much -- my system takes more 
like 20 seconds from Grub to a login prompt, and from then on, the only 
things that take more than 5 seconds to load are games.  Since I know 
Quake 4 uses zipfiles (probably compressed) for its storage, and I 
watched the HD LED while it loads, I don't think I can speed that up at 
all short of buying a faster CPU.

Well, that and the Portage tree, but you say I shouldn't expect much 
from that.  Maybe the portage cache?

> Not sure if I would be better of trying initng or waiting for upstart (Ubuntus 
> new init) to get scripts that actually parallel boot.  The code for upstart 
> is very clean and it has the backing of a major distro, so I have high hopes.

Hmm.  That sounds kind of cool, but I wonder how it compares to Gentoo's 
init scripts?  I guess I'll have to wait till it hits the one Ubuntu box 
I have...

> Much like before, I was able to improve a 16.5s oowriter cold start to 14s 
> with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using 
> 2.0.3 before).

Wait -- cold start is 14s, but it's also 4.8s?  Did you mean warm/hot 
start for that last number?

> I think Python will be the best language for this because its become 
> relatively universal and its easy to understand for the uninitiated.  This 
> really isn't black magic so transparency is good.  I personally prefer Ruby 
> though.

Wait...  Python is more universal than Ruby of Ruby on Rails?

Python is faster, anyway...  I'm waiting for someone to do a decent 
implementation of Ruby on something like .NET before I start using it 
for anything I want to perform well.


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-15 22:27               ` David Masover
@ 2006-09-16  0:01                 ` Quinn Harris
  2006-09-16  8:59                   ` David Masover
  0 siblings, 1 reply; 14+ messages in thread
From: Quinn Harris @ 2006-09-16  0:01 UTC (permalink / raw)
  To: reiserfs-list

On Friday 15 September 2006 16:27, David Masover wrote:

> > Not sure if I would be better off trying initng or waiting for upstart
> > (Ubuntu's new init) to get scripts that actually parallel boot.  The code
> > for upstart is very clean and it has the backing of a major distro, so I
> > have high hopes.
>
> Hmm.  That sounds kind of cool, but I wonder how it compares to Gentoo's
> init scripts?  I guess I'll have to wait till it hits the one Ubuntu box
> I have...
Gentoo's default init doesn't parallelize well, at least not compared to 
initng, which is relatively easy to get working on Gentoo.  The Ubuntu people 
decided initng wasn't powerful enough (let alone the existing sysvinit).  They 
thought it needed a better way to define the boot sequence.  They also wanted 
to integrate running any task, such as handling ACPI events, hotplug and cron, 
into one consistent tool.
http://www.netsplit.com/blog/work/canonical/upstart.html


>
> > Much like before, I was able to improve a 16.5s oowriter cold start to
> > 14s with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was
> > using 2.0.3 before).
>
> Wait -- cold start is 14s, but it's also 4.8s?  Did you mean warm/hot
> start for that last number?
Oops, it's 4.8s warm.  It was initially 16.5s cold, then 14s cold after 
reallocating.
>
> > I think Python will be the best language for this because it's become
> > relatively universal and it's easy to understand for the uninitiated.
> > This really isn't black magic so transparency is good.  I personally
> > prefer Ruby though.
>
> Wait...  Python is more universal than Ruby of Ruby on Rails?
Both Gentoo and Ubuntu install Python by default but not Ruby.  And more 
people, at least in the US, are familiar with Python.  Finally, I might use 
Python inotify code (to replace readahead-watch), whereas the Ruby version is 
a bit alpha and I don't think it is available in Gentoo or Ubuntu packages.

I came to Ruby through RoR.  I think the language has an unmatched pragmatic 
elegance.  This isn't appreciated until one addresses a few problem domains 
with it.  I don't know of anything Python does reasonably well that Ruby 
can't do reasonably well (minus the performance problem).  On the other hand, I 
doubt Python could make for something as slick as Rake 
http://www.martinfowler.com/articles/rake.html.  And Ruby provides a wealth 
of conveniences and shortcuts without being the lexical mess that is Perl.  I 
could be missing something though.  But for this particular problem Python 
isn't bad.

>
> Python is faster, anyway...  I'm waiting for someone to do a decent
> implementation of Ruby on something like .NET before I start using it
> for anything I want to perform well.



* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-16  0:01                 ` Quinn Harris
@ 2006-09-16  8:59                   ` David Masover
  0 siblings, 0 replies; 14+ messages in thread
From: David Masover @ 2006-09-16  8:59 UTC (permalink / raw)
  To: Quinn Harris; +Cc: reiserfs-list

Quinn Harris wrote:
> On Friday 15 September 2006 16:27, David Masover wrote:
> 
>>> Not sure if I would be better off trying initng or waiting for upstart
>>> (Ubuntu's new init) to get scripts that actually parallel boot.  The code
>>> for upstart is very clean and it has the backing of a major distro, so I
>>> have high hopes.
>> Hmm.  That sounds kind of cool, but I wonder how it compares to Gentoo's
>> init scripts?  I guess I'll have to wait till it hits the one Ubuntu box
>> I have...
> Gentoo default init doesn't parallelize well.  Not when compared to initng which 
> is relatively easy to get to work on Gentoo.

I'm not sure what initng is, but the way I parallelize Gentoo is by setting 
a flag in /etc/conf.d/rc:
RC_PARALLEL_STARTUP="yes"

I still don't see a difference between initng and Gentoo's init.  I 
guess I'd have to install them both.  One thing I like about Gentoo's 
init is that they are still just shell scripts, and it would take a 
minimal amount of code to convert them to/from the old init style.

> The Ubuntu people decided 
> initng wasn't powerful enough (let alone the existing sysvinit).  They 
> thought it needed a better way to define the bootup sequence during boot.  In 
> addition to integrate running any task like ACPI events, hotplug, cron into 
> one consistent tool.
> http://www.netsplit.com/blog/work/canonical/upstart.html

Aha, thanks.

Getting offtopic here, but I don't see the comparison I'm looking for. 
I see why it's different than launchd -- mostly, that launchd provides 
no way of knowing whether we want to wait for a script to run or an app 
to start.  But I don't know of a way to know when an app has finished 
starting, unless it daemonizes itself -- which makes it easy to write a 
script that ends when the app has started.

I really don't see the usefulness of making that distinction as far as 
dependencies go.

Finally read up on the "event-based system", and I suspect this kind of 
thing could probably be an extension to a dependency-based system.  I 
guess we'll see if initng pulls that off.

>> Wait...  Python is more universal than Ruby of Ruby on Rails?
> Both Gentoo and Ubuntu install Python by default but not Ruby.  And more 
> people at least in the US are familiar with Python.  Finally I might use 
> Python inotify code (to replace readahead-watch) and the Ruby version is a 
> bit alpha and I don't think available in Gentoo or Ubuntu packages.

Speaking of which, the Perl inotify is broken for me a bit lately, I 
need to figure out what's going on.  Unfortunately, I don't know if it's 
perl or the kernel that's broken...

> I don't know of anything Python does reasonably well that Ruby 
> can't do reasonably well (minus the performance problem).

This might solve the performance problem:

>> I'm waiting for someone to do a decent
>> implementation of Ruby on something like .NET before I start using it
>> for anything I want to perform well.

If they can do it right, well, it seems like MS wants to replace C++ 
with C#, thus .NET should perform decently.  Mono means it's 
cross-platform, or at least, it runs JIT'ed on the platforms I care about.

And hey, once you're on Gentoo or Ubuntu, it doesn't matter much, 
really.  Install a Ruby app and Ruby becomes a dependency.


* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-15 21:20             ` Quinn Harris
  2006-09-15 22:27               ` David Masover
@ 2006-09-18  9:36               ` PFC
  2006-09-18 22:32                 ` Quinn Harris
  1 sibling, 1 reply; 14+ messages in thread
From: PFC @ 2006-09-18  9:36 UTC (permalink / raw)
  To: reiserfs-list


	Windows already does this.

	It has a service which monitors filesystem usage and writes data to disk;
the defragmenter uses this data to lay the files on disk so that boot is
very fast.

	However, I think it optimizes only the time to the login screen, so Windows
boot is extremely fast; of course, once you have logged in, you have to
wait forever until all the crap system tray apps launch themselves and eat
all your RAM...

	My own repacker is very simple, and it handles any filesystem !

- boot from Kanotix CD
- tar cv /mnt/my_disk | lzop -c | ssh -c blowfish other_machine "cat >backup.tar.lzo"
- umount, mkfs, mount
- ssh -c blowfish other_machine "cat backup.tar.lzo" | lzop -cd | tar xv

	You can also use a USB disk, or other disks in the machine. The effects
are pretty visible.
	I do the first part often to make a full disk backup to a USB hard drive.

	Anyway, IMHO the best way to have a super responsive system would be:

Have a daemon which monitors which files, or parts of files, are read, and
in what context:

- boot to fully loaded KDE/Gnome
- launching an application

You already have this apparently...

The repacker would then use this information to lay files on disk (just  
like windows does).

Then, the daemon would trigger readahead, when booting or detecting the  
launch of an application, and read everything in (which would be a nice  
sequential read)...



* Re: Relocating files for faster boot/start-up on reiser(fs/4)
  2006-09-18  9:36               ` PFC
@ 2006-09-18 22:32                 ` Quinn Harris
  0 siblings, 0 replies; 14+ messages in thread
From: Quinn Harris @ 2006-09-18 22:32 UTC (permalink / raw)
  To: reiserfs-list

On Monday 18 September 2006 03:36, PFC wrote:
> 	Windows already does this.
I am familiar with this; it was in part the inspiration for this idea.

>
> 	It has a service which monitors filesystem usage, and writes data to disk;
> the defragmenter uses this data to lay the files on disk so that boot is
> very fast.
Yep, my tests suggest for Ubuntu boot that typical improvements will be on the 
order of 5-15s (substantially dependent on how fragmented the file system is 
and complexity of boot).  The system needs about 70MB of data to boot.  
Typical fragmentation on a fresh filesystem will cause 2-3x slowdown over 
perfect packing.  70MB at 20MB/s is 3.5s, so a 2-3x slowdown costs an extra 
3.5-7s and our gain would be about 5s.  The improvement wasn't as good as I 
was originally expecting but still very measurable.
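
A quick way to check that estimate is to work it through in Python.  This is 
only a sketch of the arithmetic, using the 70MB and 20MB/s figures and the 
2-3x slowdown quoted above; the function name is made up for illustration.

```python
def boot_read_gain(data_mb, throughput_mb_s, slowdown):
    """Seconds saved by perfect packing versus a layout `slowdown` times slower."""
    ideal = data_mb / throughput_mb_s   # sequential read time when perfectly packed
    fragmented = ideal * slowdown       # same data read from a fragmented layout
    return fragmented - ideal


for slowdown in (2, 3):
    print("%dx slowdown: gain %.1fs" % (slowdown, boot_read_gain(70, 20, slowdown)))
```

The ideal read is 70 / 20 = 3.5s, so the gain comes out at 3.5s for a 2x 
slowdown and 7s for 3x, with about 5s in the middle.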

I am not sure this is enough of a win to justify integrating this idea in a 
major distro.  But without a 

>
> 	However, I think it optimizes only the time to the login screen, so Windows
> boot is extremely fast; of course, once you have logged in, you have to
> wait forever until all the crap system tray apps launch themselves and eat
> all your RAM...
In practice, what I am doing also improves system responsiveness.  This is 
because I am measuring all file access up to an idle state, then reallocating 
and preloading all that data.  Gnome still has a bit of data to page in after 
the menus are available (like menu icons).

>
> 	My own repacker is very simple, and it handles any filesystem !
>
> - boot from Kanotix CD
> - tar cv /mnt/my_disk | lzop -c | ssh -c blowfish other_machine "cat >backup.tar.lzo"
> - umount, mkfs, mount
> - ssh -c blowfish other_machine "cat backup.tar.lzo" | lzop -cd | tar xv
>
> 	You can also use an USB disk, or other disks in the machine. The effects
> are pretty visible.
> 	I do the first part often to make a full disk backup to a USB harddrive.
>
> 	Anyway, IMHO the best way to have a super responsive system would be :
>
Are you essentially copying all the files off a file system, recreating a 
fresh fs and writing them back?  This allows the file system to lay the files 
out again in a good defragmented order.  Problem is it takes a long time and 
takes the system down (or at least the fs).  The Reiser4 repacker would have 
the same effect but without any downtime.  Windows defrag does the same.

This doesn't place files used at boot right next to each other on disk like 
what I am doing, but both our approaches reduce fragmentation.

My script can do its thing in about 10s (for just one thing like better 
bootup) and without disrupting the running system.  (Actually I have seen it 
break some things, but I expect this can be mitigated by avoiding repacking 
open files.)
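
For anyone who wants to try it, the copy trick can be sketched in Python 
roughly as follows.  This is not the Ruby script itself, just a minimal 
illustration of the approach described at the start of the thread: copy each 
file into one directory under sequential names in access order, then rename 
the copy over the original.  The function name and the pack directory are 
hypothetical, and it assumes the pack directory is on the same filesystem as 
the files.

```python
import os
import shutil


def relocate(paths, pack_dir):
    """Copy each file into pack_dir under a sequential name (0001, 0002, ...),
    then rename the copy over the original.  Returns the number of files moved."""
    os.makedirs(pack_dir, exist_ok=True)
    moved = 0
    for index, path in enumerate(paths, start=1):
        # Only plain files are handled; symlinks, directories and missing
        # paths from the access log are skipped.
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        packed = os.path.join(pack_dir, "%04d" % index)
        shutil.copy2(path, packed)  # fresh copy; keeps mode and times, not ownership
        os.replace(packed, path)    # rename over the original (same filesystem only)
        moved += 1
    return moved
```

The list of paths would come from the access-ordered log, one path per line.  
Note that a process which already has a file open keeps using the old copy, 
which is one reason to skip open files.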

> Have a daemon which monitors which files, or parts of files, are read, and
> in what context :
>
> - boot to fully loaded KDE/Gnome
> - launching an application
>
> You already have this apparently...
>
> The repacker would then use this information to lay files on disk (just
> like windows does).
The Python script I am using to measure fragmentation shows that what I am 
doing results in packing that is within 1% of ideal in terms of disk read 
time.  This only works well on a fresh filesystem with large contiguous 
unallocated space but not so well on a heavily fragmented fs.

This suggests for reiser (maybe other fs), a special kernel or defrag tool for 
this type of reallocation is unnecessary.
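
To make "within 1% of ideal" concrete, here is one simple way such a 
comparison could be modelled.  This is not the measurement script itself, 
only a sketch under stated assumptions: the extent list (start block, block 
count, in read order) is taken as given, each discontinuity costs one 12ms 
access, and transfer runs at 25MB/s (the drive figures from the first 
message).  The 4096-byte block size and the function names are assumptions.

```python
def estimated_read_time(extents, block_size=4096, throughput=25e6, seek_time=0.012):
    """Estimate seconds to read extents, given as (start_block, block_count)
    pairs in read order.  Every jump to a non-adjacent block costs one seek."""
    total = 0.0
    next_block = None
    for start, count in extents:
        if start != next_block:
            total += seek_time                      # head has to move
        total += count * block_size / throughput    # sequential transfer
        next_block = start + count
    return total


def slowdown_vs_ideal(extents, **model):
    """Ratio of the estimated read time to that of one contiguous extent."""
    total_blocks = sum(count for _, count in extents)
    ideal = estimated_read_time([(0, total_blocks)], **model)
    return estimated_read_time(extents, **model) / ideal
```

A ratio of 1.01 or less from slowdown_vs_ideal would correspond to packing 
within 1% of ideal under this model.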


>
> Then, the daemon would trigger readahead, when booting or detecting the
> launch of an application, and read everything in (which would be a nice
> sequential read)...

I hope in good time to write a Python script that allows the user to do 
something like "reallocate.py oowriter".  It will profile OpenOffice Writer 
on startup, using inotify to generate a readahead log, then reallocate with 
the copy trick.  If the command is run again, it will readahead the files for 
oowriter and launch the app.  I think this could halve cold start times, 
partly from the readahead alone and partly from the reallocation.

It is evident that the inotify file monitoring (using Ubuntu readahead-watch) 
is not catching everything.  If I readahead for oowriter then launch it, it 
takes 6-7s.  But it takes 3.6s to start immediately after closing it.  In 
principle, I would think the readahead should be able to load all files 
oowriter needs into cache, getting the 3.6s time after readahead.
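
The readahead step itself can be approximated very simply.  The sketch below 
is not how the Ubuntu readahead tool works internally; it is a plain Python 
stand-in that reads each logged file in order and throws the data away, 
relying on the kernel page cache to keep it.  The function name is 
hypothetical.

```python
def readahead(paths, chunk_size=1 << 20):
    """Read every file in order and discard the data so it ends up in the
    page cache.  Unreadable paths are skipped.  Returns total bytes read."""
    total = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
        except OSError:
            # Files that vanished, directories, or permission problems.
            continue
    return total
```

If the files were packed in access order beforehand, this should turn into 
one mostly sequential read.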

Also this doesn't monitor access patterns within a file.  How much of the data 
in a file is really needed?  I am not sure I would be able to pack only parts 
of files, but I might with sparse files.



Thread overview: 14+ messages
2006-09-13 20:51 Relocating files for faster boot/start-up on reiser(fs/4) Quinn Harris
2006-09-13 21:10 ` Peter
2006-09-14  3:10   ` Quinn Harris
2006-09-14 19:55     ` David Masover
2006-09-14 22:09       ` Quinn Harris
2006-09-14 22:23         ` David Masover
2006-09-15  5:15           ` Toby Thain
2006-09-15 21:20             ` Quinn Harris
2006-09-15 22:27               ` David Masover
2006-09-16  0:01                 ` Quinn Harris
2006-09-16  8:59                   ` David Masover
2006-09-18  9:36               ` PFC
2006-09-18 22:32                 ` Quinn Harris
2006-09-14 14:01   ` cmaurand