* Request: I/O request recording
@ 2004-01-24 18:10 Felix von Leitner
2004-01-24 18:23 ` Valdis.Kletnieks
` (3 more replies)
0 siblings, 4 replies; 23+ messages in thread
From: Felix von Leitner @ 2004-01-24 18:10 UTC (permalink / raw)
To: linux-kernel
I would like to have a user space program that I could run while I cold
start KDE. The program would then record which I/O pages were read in
which order. The output of that program could then be used to pre-cache
all those pages, but in an order that reduces disk head movement.
Demand Loading unfortunately produces lots of random page I/O scattered
all over the disk.
Having a way to know which pages are accessed in which order at a
typical cold start would be very beneficial, not only for the purpose
described above but it could also be used as input for a linker code
reordering optimization.
What do you think?
Felix
* Re: Request: I/O request recording
From: Valdis.Kletnieks @ 2004-01-24 18:23 UTC
To: Felix von Leitner; +Cc: linux-kernel

On Sat, 24 Jan 2004 19:10:27 +0100, Felix von Leitner <felix-kernel@fefe.de> said:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

The Fedora version of the kernel-utils RPM includes /usr/sbin/readahead,
which gets launched like this:

    start() {
        echo -n $"Starting background readahead: "
        /usr/sbin/readahead /usr/share/icons/Bluecurve/48x48/mimetypes/* &
        /usr/sbin/readahead /usr/share/icons/Bluecurve/24x24/stock/* &
        /usr/sbin/readahead /usr/share/applications/* &
        /usr/sbin/readahead `cat /etc/readahead.files` &
    }

So given that program, you could simply strace your KDE stuff, grep out
all the open calls and the filenames, stick them in /etc/readahead.files,
and be done.
* Re: Request: I/O request recording
From: Arjan van de Ven @ 2004-01-24 18:26 UTC
To: Felix von Leitner; +Cc: linux-kernel

On Sat, 2004-01-24 at 19:10, Felix von Leitner wrote:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

I recently did something like this (and it scared me: it seems a typical
Fedora boot into GNOME opens something like 11,000 files ;) but via a
printk in the kernel....

I experimented with readahead'ing all that stuff while the initscripts
ran, in the hope it would save time... but somehow it doesn't.

Some other things kinda help; if you feel adventurous you could play with
the kernel-utils RPM in rawhide, which does a readahead of the files the
desktop opens while the GDM login window is displayed. If the user isn't
typing his name really fast, that decreases the desktop startup time...
* Re: Request: I/O request recording
From: Ville Herva @ 2004-01-24 19:25 UTC
To: Arjan van de Ven; +Cc: Felix von Leitner, linux-kernel

On Sat, Jan 24, 2004 at 07:26:17PM +0100, you [Arjan van de Ven] wrote:

> I recently did something like this (and it scared me, it seems a typical
> Fedora boot into gnome opens like 11.000 files ;) but via a printk in
> the kernel....
>
> I experimented with readahead'ing all that stuff while the initscripts
> ran in the hope it would save time... but it doesn't somehow.

Did you sort the sectors to be read, or just read the files into the page
cache in more or less random order?

Or do you mean that even after all the files were read into cache, the X
startup time didn't get any better (not counting the cache priming)?

-- v -- v@iki.fi
* Re: Request: I/O request recording
From: Arjan van de Ven @ 2004-01-24 22:43 UTC
To: Ville Herva, Felix von Leitner, linux-kernel

On Sat, Jan 24, 2004 at 09:25:45PM +0200, Ville Herva wrote:

> Did you sort the sectors to be read, or just read the files into page cache
> in randomish order ?

Semi-random order, but mostly submitted in parallel, so the kernel has
lots of freedom to reorder.

> Or do you mean that even after all the files were read into cache, the X
> startup time didn't get any better (not counting the cache priming)?

I mean that the time it takes to prime the cache is just about exactly
the time you then win... i.e. a net gain of about zero.
* Re: Request: I/O request recording
From: Diego Calleja @ 2004-01-24 20:11 UTC
To: Felix von Leitner; +Cc: linux-kernel

On Sat, 24 Jan 2004 19:10:27 +0100, Felix von Leitner <felix-kernel@fefe.de> wrote:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.
>
> Having a way to know which pages are accessed in which order at a
> typical cold start would be very beneficial, not only for the purpose
> described above but it could also be used as input for a linker code
> reordering optimization.
>
> What do you think?

That's exactly what XP does (and Mac OS X, for that matter). And it
really works (i.e. you can notice it).

XP records what the OS does in the first 2 minutes (or so). The next time
it boots, it tries to load the files that it knows are going to be used.
The same for an app that is frequently used: it records what the app
does, and it optimizes the startup of that app. Take a look at these
(search for "prefetch"):

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/appendix/hh/appendix/enhancements5_0qhx.asp
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx

Andrew Morton wrote a patch some time ago for 2.5.64-mm6 which achieves a
similar effect, I think:

"To test the nonlinear mapping code more thoroughly I have arranged for
all executable file-backed mmaps to be treated as nonlinear. This means
that when an executable is first mapped in, the kernel will slurp the
whole thing off disk in one hit. Some IO changes were made to speed this
up. This means that large cache-cold executables start significantly
faster. Launching X11+KDE+mozilla goes from 23 seconds to 16. Starting
OpenOffice seems to be 2x to 3x faster, and starting Konqueror maybe 3x
faster too. Interesting."

(see: http://www.ussg.iu.edu/hypermail/linux/kernel/0303.1/1296.html)

The patches are still available. IIRC, they were dropped "because it
should be done in userspace". It'd be very interesting to write a
userspace program that does what XP does (it looks like a good idea for
desktops).
* Re: Request: I/O request recording
From: Ville Herva @ 2004-01-24 21:09 UTC
To: Diego Calleja; +Cc: Felix von Leitner, linux-kernel

On Sat, Jan 24, 2004 at 09:11:56PM +0100, you [Diego Calleja] wrote:

> That's exactly what XP does (and Mac OS X, for that matter).
> And it really works (ie: you can notice it)
>
> XP records what the OS does in the first 2 minutes (or so). The next
> time it boots, it tries to load the files that he knows that are going
> to be used. The same for an app that is frecuently used: it records
> what the app does, and it optimizes the startup of that app.
> Take a look at: (search prefetch)
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/appendix/hh/appendix/enhancements5_0qhx.asp
> http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx

It's perhaps worth pointing out that XP not only uses the boot (or
application launch) traces to prefetch the data on the next boot
(application launch), but also to reorder the data on disk optimally via
XP's defragmenter. And XP is noticeably faster to boot than (say) W2000.

-- v -- v@iki.fi
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-24 23:35 UTC
To: Felix von Leitner; +Cc: linux-kernel

Felix von Leitner <felix-kernel@fefe.de> wrote:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

I wrote a similar thing in September of 2001. What you do is:

- Reboot the system, wait until everything is steady-state (eg: X has
  started, applications are loaded).

- Load a kernel module which dumps the current contents of the pagecache
  (filename/offset-into-file) into a file.

  (The kernel module writes to modprobe's stdout, so you just do

      modprobe fboot-dump > /tmp/fboot-dump.out

  I'm very proud of this.)

- Post-process the resulting output into a database which is used on the
  next reboot.

- Reboot.

- This time a userspace application cuts in real early, reads the
  database, and preloads all the pagecache using "optimal" I/O patterns,
  so that everything which you will need in the subsequent boot is
  already in memory.

So it's all an attempt to optimise the boot-time I/O patterns. It was
pretty much a waste of time, gaining only 10% or so, from memory. You
could get just as much or more speedup from simply launching all the
initscripts in parallel, although this did tend to break stuff.

Anyway, the code's ancient but might provide some ideas:

http://www.zip.com.au/~akpm/linux/fboot.tar.gz
* Re: Request: I/O request recording
From: Davide Libenzi @ 2004-01-24 23:53 UTC
To: Andrew Morton; +Cc: Felix von Leitner, linux-kernel

On Sat, 24 Jan 2004, Andrew Morton wrote:

> I wrote a similar thing in September of 2001. What you do is:
>
> - Reboot the system, wait until everything is steady-state (eg: X has
>   started, applications are loaded).
>
> - Load a kernel module which dumps the current contents of the pagecache
>   (filename/offset-into-file) into a file.
>
>   (The kernel module writes to modprobe's stdout, so you just do
>
>       modprobe fboot-dump > /tmp/fboot-dump.out
>
>   I'm very proud of this.)
>
> - Post-process the resulting output into a database which is used on the
>   next reboot.
>
> - reboot
>
> - This time a userspace application cuts in real early and reads the
>   database and preloads all the pagecache using "optimal" I/O patterns so
>   that everything which you will need in the subsequent boot is already in
>   memory.
>
> So it's all an attempt to optimise the boot-time I/O patterns. It was
> pretty much a waste of time, gaining only 10% or so, from memory. You
> could get just as much or more speedup from simply launching all the
> initscripts in parallel, although this did tend to break stuff.
>
> Anyway, the code's ancient but might provide some ideas:
>
> http://www.zip.com.au/~akpm/linux/fboot.tar.gz

Warning: I don't know if they have a patent for this, but MS does this
starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
based.

- Davide
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-25 0:03 UTC
To: Davide Libenzi; +Cc: felix-kernel, linux-kernel

Davide Libenzi <davidel@xmailserver.org> wrote:

> > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
>
> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

Did they do it in August 2001?
* Re: Request: I/O request recording
From: Davide Libenzi @ 2004-01-25 0:09 UTC
To: Andrew Morton; +Cc: Davide Libenzi, felix-kernel, linux-kernel

On Sat, 24 Jan 2004, Andrew Morton wrote:

> > Warning. I don't know if they do have a patent for this, but MS does this
> > starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> > based.
>
> Did they do it in August 2001?

Ouch, I don't know. I know for sure that it came with XP, but I'm not
really into MS things ;) This is one of the links that talks about it:

http://msdn.microsoft.com/msdnmag/issues/01/12/XPKernel/default.aspx

- Davide
* Re: Request: I/O request recording
From: Valdis.Kletnieks @ 2004-01-25 0:04 UTC
To: Davide Libenzi; +Cc: Andrew Morton, Felix von Leitner, linux-kernel

On Sat, 24 Jan 2004 15:53:44 PST, Davide Libenzi said:

> > Anyway, the code's ancient but might provide some ideas:
> >
> > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
>
> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

Hmm.. prior art time. ;)

IBM's OS/VS1 and MVS operating systems had the "link pack area", where
frequently loaded modules were loaded at system startup. And there were
numerous third-party optimizers that would analyze the LOAD SVC patterns
on your system and produce a list of which modules should be pre-loaded
in order to get the most bang for the buck. Even a *large* 370/168 or
303x processor might be able to spare a megabyte tops, so optimizing it
was important, and sites would spend $5K on software that would optimize
the memory usage and save them a memory upgrade at $40K a meg...

This was the mid-70s, so definitely pre-XP.
* Re: Request: I/O request recording
From: Davide Libenzi @ 2004-01-25 0:10 UTC
To: Valdis.Kletnieks; +Cc: Andrew Morton, Felix von Leitner, Linux Kernel Mailing List

On Sat, 24 Jan 2004 Valdis.Kletnieks@vt.edu wrote:

> IBM's OS/VS1 and MVS operating systems had the 'link pack area', where
> frequently loaded modules were loaded at system startup. And there were
> numerous 3rd party optimizers that would analyze the LOAD SVC patterns
> on your system and produce a list of which modules should be pre-loaded
> in order to get the most bang for the buck.
>
> This was mid-70s, so definitely pre-XP.

They (MS) work on a page-fault basis, though. It is quite different.

- Davide
* Re: Request: I/O request recording
From: Felipe Alfaro Solana @ 2004-01-25 12:26 UTC
To: Linux Kernel Mailinglist

On Sun, 2004-01-25 at 00:53, Davide Libenzi wrote:

> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

And tomorrow they'll say they have patented the hamburger recipe, or the
Euclidean triangle, or... who knows. C'mon... the world is going crazy!
* Re: Request: I/O request recording
From: Bart Samwel @ 2004-01-25 22:59 UTC
To: Andrew Morton; +Cc: Felix von Leitner, linux-kernel

Andrew Morton wrote:

> Felix von Leitner <felix-kernel@fefe.de> wrote:
>
>> I would like to have a user space program that I could run while I cold
>> start KDE. The program would then record which I/O pages were read in
>> which order. The output of that program could then be used to pre-cache
>> all those pages, but in an order that reduces disk head movement.
>> Demand Loading unfortunately produces lots of random page I/O scattered
>> all over the disk.
>
> I wrote a similar thing in September of 2001. What you do is: [...]

When I saw this thread I fiddled for a bit with the block_dump
functionality that's in the laptop_mode patch. I wanted to see if it
could support a similar thing completely from user space (except for the
block_dump code, of course). I've written a small tool that generates a
file listing (sector, size, device) tuples from the kernel output in
syslog; it parses all "READ block xxx" messages since the last reboot.
Putting this through sort -n -u delivers a nicely sorted file, ready for
optimized reading.

Unfortunately I'm now stuck on the other part: reading the pages back
into memory at the next boot. It's not working, and I was hoping someone
here could take a look and tell me what I'm doing wrong.

Here's what I've tried so far. I've written a program that simply reads
the ranges by opening the device and reading from sector*512 to
sector*512+size. It uses async I/O for efficiency, and to allow the
kernel to merge read requests. It seems to read all the data, but after
that the other programs seem to read most of it *again*! I only go from
8500 down to 7000 reads or so, while most of the 7000 reads that remain
are also in the range that is being prefetched. :(

I was wondering if the pages could have been evicted so soon, so, to make
sure, I mmapped the whole shebang with MAP_LOCKED and PROT_READ, and kept
the mapping process in memory during the whole boot process. This had
exactly the same effect.

So, I thought that I might be reading the wrong blocks. However, when I
feed it something like (160000, 4096, hdb1) I get a block_dump log that
says exactly that (plus some extra, because mmap seems to read in a bit
more than needed). So, that's not it. I'm out of clues.

If someone would be so kind as to take a look at what I'm doing wrong,
I'd very much appreciate it. I've put the code up at
http://www.xs4all.nl/~bsamwel/block_read_replay.tar.gz. How to use it:

1. Patch your kernel with the patch that's included in the tarball. This
   patch modifies the block_dump output slightly, and enables a
   block_dump value of 2 which only reports READ actions. It's against
   2.6.1-mm2, but it should apply fine to any kernel that has laptop_mode
   in it.

2. Record the bootup info. Somewhere at the very beginning, include
   "echo 2 > /proc/sys/vm/block_dump" in an init script. Reboot, and
   after the bootup sequence is complete, do
   "echo 0 > /proc/sys/vm/block_dump".

3. "make" and put brexec (one of the two versions) somewhere your init
   scripts can access it.

4. Run slbrp (SysLog Block Read Parser) to generate a block list file:
   "slbrp /var/log/syslog | sort -n -u > /etc/bootup_blocks".

5. Precede the "echo 2 > /proc/sys/vm/block_dump" at startup with a
   brexec ("block read executor") call, e.g. "brexec /etc/bootup_blocks".
   The mmap version takes an extra parameter N, the number of seconds to
   keep the pages mapped, and must be put in the background because it
   will simply wait for N seconds before exiting. So it should be
   something like "brexec /etc/bootup_blocks 60" and then "sleep 30" to
   give it time to read everything before bootup continues. Yes, it's
   not pretty; it's just used for experimenting, so it doesn't matter.

6. Reboot, and disable block_dump after booting, as in step (2). Now the
   logging of reads only starts _after_ brexec has attempted to load all
   pages, and this gives info on what is still loaded. You'll probably
   see that it loads many things that are also listed in the
   bootup_blocks file.

Now my question is: what am I doing wrong, that it needs to read those
again?

-- Bart
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-25 23:09 UTC
To: Bart Samwel; +Cc: felix-kernel, linux-kernel

Bart Samwel <bart@samwel.tk> wrote:

> When I saw this thread I fiddled for a bit with the block_dump
> functionality that's in the laptop_mode patch. I wanted to see if it
> could support a similar thing completely from user space (except for the
> block_dump code, of course). I've written a small tool that generates a
> file listing (sector, size, device) tuples from the kernel output in
> syslog; it parses all "READ block xxx" messages since the last reboot.
> Putting this through sort -n -u delivers a nicely sorted file, ready for
> optimized reading.
>
> Unfortunately I'm now stuck on the other part: reading the pages back
> into memory at the next boot. It's not working, and I was hoping someone
> here could take a look and tell me what I'm doing wrong.

Linux caches disk data on a per-file basis. So if you preload pagecache
via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
Each one has its own unique pagecache. When reading pages for /etc/passwd
we don't go looking for the same disk blocks in the cache of /dev/hda1.

Which is why the userspace cache preloading needs to know the pathnames
of all the relevant files - it needs to open and read each one, applying
knowledge of disk layout while doing it.
* Re: Request: I/O request recording
From: Bart Samwel @ 2004-01-25 23:29 UTC
To: Andrew Morton; +Cc: felix-kernel, linux-kernel

Andrew Morton wrote:

> Linux caches disk data on a per-file basis. So if you preload pagecache
> via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
> Each one has its own unique pagecache. When reading pages for /etc/passwd
> we don't go looking for the same disk blocks in the cache of /dev/hda1.
>
> Which is why the userspace cache preloading needs to know the pathnames
> of all the relevant files - it needs to open and read each one, applying
> knowledge of disk layout while doing it.

Hmmm, that explains why this didn't work. :( So if I wanted to do this
completely from user space using only block_dump data, I'd probably have
to go through all files and find out if they had any blocks in common
with my preload set -- presuming there is a way to find that out, which
there probably isn't. That makes this idea pretty much useless; I'm sorry
to have bothered you with it.

-- Bart
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-25 23:38 UTC
To: Bart Samwel; +Cc: felix-kernel, linux-kernel

Bart Samwel <bart@samwel.tk> wrote:

> Hmmm, that explains why this didn't work. :( So if I wanted to do this
> completely from user space using only block_dump data, I'd probably have
> to go through all files and find out if they had any blocks in common
> with my preload set -- presuming there is a way to find that out, which
> there probably isn't. That makes this idea pretty much useless; I'm sorry
> to have bothered you with it.

You could certainly do that. Given disk block #N you need to search all
files on the disk asking "who owns this block". The FIBMAP ioctl can be
used on most filesystems (ext2, ext3, others..) to find out which blocks
a file is using. See bmap.c in

http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz

Unfortunately you cannot determine a directory's blocks in this way.
Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
directories each have their own pagecache.
* Re: Request: I/O request recording
From: Diego Calleja García @ 2004-01-26 0:23 UTC
To: Andrew Morton; +Cc: bart, felix-kernel, linux-kernel

On Sun, 25 Jan 2004 15:38:03 -0800, Andrew Morton <akpm@osdl.org> wrote:

> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
> directories each have their own pagecache.

Would it be possible to "hijack" the syscalls at the libc level and look
at what the program is doing?
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-26 0:32 UTC
To: Diego Calleja García; +Cc: bart, felix-kernel, linux-kernel

Diego Calleja García <aradorlinux@yahoo.es> wrote:

> Would it be possible to "hijack" the syscalls at the libc level and look
> at what the program is doing?

That would work. It misses out on pagefaults, which are kind of syscalls
in disguise. So for any files which were mmapped you'd have to either
assume that all of the file's pages are required, or use mincore() to
poke around and find out which pages were really faulted in.
* Re: Request: I/O request recording
  2004-01-25 23:38             ` Andrew Morton
  2004-01-26  0:23               ` Diego Calleja García
@ 2004-01-26 11:50               ` Bart Samwel
  2004-01-26 11:57                 ` Andrew Morton
  2004-01-27 19:13               ` Bart Samwel
  2 siblings, 1 reply; 23+ messages in thread
From: Bart Samwel @ 2004-01-26 11:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: felix-kernel, linux-kernel

Andrew Morton wrote:
> Bart Samwel <bart@samwel.tk> wrote:
>
>>> Linux caches disk data on a per-file basis.  So if you preload pagecache
>>> via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
>>> Each one has its own unique pagecache.  When reading pages for /etc/passwd
>>> we don't go looking for the same disk blocks in the cache of /dev/hda1.
>>>
>>> Which is why the userspace cache preloading needs to know the pathnames of
>>> all the relevant files - it needs to open and read each one, applying
>>> knowledge of disk layout while doing it.
>>
>> Hmmm, that explains why this didn't work. :( So if I wanted to do this
>> completely from user space using only block_dump data I'd probably have
>> to go through all files and find out if they had any blocks in common
>> with my preload set -- presuming there is a way to find that out, which
>> there probably isn't.  That makes this idea pretty much useless, I'm
>> sorry to have bothered you with it.
>
> You could certainly do that.  Given disk block #N you need to search all
> files on the disk asking "who owns this block".  The FIBMAP ioctl can be
> used on most filesystems (ext2, ext3, others..) to find out which blocks a
> file is using.  See bmap.c in
>
>   http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
>
> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway.  ext2's
> directories each have their own pagecache.

I found out two things while trying to do this:

1. Many filesystems in Linux set f_fsid to zero for statfs.  I was trying
   to use this to skip over mount points, but that doesn't work.  Had to
   use the st_dev field from stat instead. :(

2. Swapfiles apparently don't like to be touched.  I did an
   ioctl(FIGETBSZ) on a swapfile, and it would simply block until I did a
   swapoff on the file.  I didn't even get to the FIBMAP part. :(  Is this
   correct behaviour?  And is there any way to detect this so that I can
   work around it?

-- Bart
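The st_dev workaround from point 1 can be sketched as follows (`st_dev` is the real stat field; the helper function is my own). Two paths live on the same mounted filesystem exactly when stat() reports the same device, which lets a recursive scan stop at mount boundaries:

```c
/* Sketch: detect mount-point crossings via st_dev, since f_fsid from
 * statfs() is zero on many Linux filesystems and cannot be used. */
#include <sys/stat.h>

/* Returns 1 if both paths are on the same mounted filesystem, 0 if a
 * scan descending from `a` to `b` would cross a mount point, and -1 if
 * either stat() fails. */
int same_filesystem(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev;
}
```

A directory walker would call this on each subdirectory against the scan root and skip any subtree for which it returns 0.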
* Re: Request: I/O request recording
  2004-01-26 11:50               ` Bart Samwel
@ 2004-01-26 11:57                 ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2004-01-26 11:57 UTC (permalink / raw)
To: Bart Samwel; +Cc: felix-kernel, linux-kernel

Bart Samwel <bart@samwel.tk> wrote:
>
> 2. Swapfiles apparently don't like to be touched.  I did an
> ioctl(FIGETBSZ) on a swapfile, and it would simply block until I did a
> swapoff on the file.  I didn't even get to the FIBMAP part. :(  Is this
> correct behaviour?

yup.

> And is there any way to detect this so that I can work around it?

swapoff -a beforehand, I guess.
* Re: Request: I/O request recording
  2004-01-25 23:38             ` Andrew Morton
  2004-01-26  0:23               ` Diego Calleja García
  2004-01-26 11:50               ` Bart Samwel
@ 2004-01-27 19:13               ` Bart Samwel
  2 siblings, 0 replies; 23+ messages in thread
From: Bart Samwel @ 2004-01-27 19:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: felix-kernel, linux-kernel

Andrew Morton wrote:
> You could certainly do that.  Given disk block #N you need to search all
> files on the disk asking "who owns this block".  The FIBMAP ioctl can be
> used on most filesystems (ext2, ext3, others..) to find out which blocks a
> file is using.  See bmap.c in
>
>   http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
>
> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway.  ext2's
> directories each have their own pagecache.

OK, I've written something that does this (but only correctly for ext3).
I've put it here:

  http://www.xs4all.nl/~bsamwel/bootup_prefetch.tar.gz

I haven't had the opportunity to do good measurements, so I don't really
know if it even increases performance.  If anyone feels like benchmarking
this, I'd be very happy to hear from you.  I don't really expect
performance increases, as the bootup scripts seem to have enough
processing to do to keep the system busy even without disk I/O.  I wonder
if it might make a difference on a faster processor though; my system's
kind of sluggish by today's standards.

-- Bart
end of thread, other threads:[~2004-01-27 19:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-24 18:10 Request: I/O request recording Felix von Leitner
2004-01-24 18:23 ` Valdis.Kletnieks
2004-01-24 18:26 ` Arjan van de Ven
2004-01-24 19:25   ` Ville Herva
2004-01-24 22:43     ` Arjan van de Ven
2004-01-24 20:11 ` Diego Calleja
2004-01-24 21:09   ` Ville Herva
2004-01-24 23:35 ` Andrew Morton
2004-01-24 23:53   ` Davide Libenzi
2004-01-25  0:03     ` Andrew Morton
2004-01-25  0:09       ` Davide Libenzi
2004-01-25  0:04     ` Valdis.Kletnieks
2004-01-25  0:10       ` Davide Libenzi
2004-01-25 12:26   ` Felipe Alfaro Solana
2004-01-25 22:59   ` Bart Samwel
2004-01-25 23:09     ` Andrew Morton
2004-01-25 23:29       ` Bart Samwel
2004-01-25 23:38         ` Andrew Morton
2004-01-26  0:23           ` Diego Calleja García
2004-01-26  0:32             ` Andrew Morton
2004-01-26 11:50           ` Bart Samwel
2004-01-26 11:57             ` Andrew Morton
2004-01-27 19:13           ` Bart Samwel