* Request: I/O request recording
@ 2004-01-24 18:10 Felix von Leitner
2004-01-24 18:23 ` Valdis.Kletnieks
` (3 more replies)
0 siblings, 4 replies; 23+ messages in thread
From: Felix von Leitner @ 2004-01-24 18:10 UTC (permalink / raw)
To: linux-kernel
I would like to have a user space program that I could run while I cold
start KDE. The program would then record which I/O pages were read in
which order. The output of that program could then be used to pre-cache
all those pages, but in an order that reduces disk head movement.
Demand Loading unfortunately produces lots of random page I/O scattered
all over the disk.
Having a way to know which pages are accessed in which order at a
typical cold start would be very beneficial, not only for the purpose
described above but it could also be used as input for a linker code
reordering optimization.
What do you think?
Felix
* Re: Request: I/O request recording
From: Valdis.Kletnieks @ 2004-01-24 18:23 UTC
To: Felix von Leitner; +Cc: linux-kernel

On Sat, 24 Jan 2004 19:10:27 +0100, Felix von Leitner <felix-kernel@fefe.de> said:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

The Fedora version of the kernel-utils RPM includes /usr/sbin/readahead,
which gets launched like this:

    start() {
        echo -n $"Starting background readahead: "
        /usr/sbin/readahead /usr/share/icons/Bluecurve/48x48/mimetypes/* &
        /usr/sbin/readahead /usr/share/icons/Bluecurve/24x24/stock/* &
        /usr/sbin/readahead /usr/share/applications/* &
        /usr/sbin/readahead `cat /etc/readahead.files` &
    }

So given that program, you could simply strace your KDE stuff, grep out
all the open calls and the filenames, stick them in /etc/readahead.files,
and be done.
* Re: Request: I/O request recording
From: Arjan van de Ven @ 2004-01-24 18:26 UTC
To: Felix von Leitner; +Cc: linux-kernel

On Sat, 2004-01-24 at 19:10, Felix von Leitner wrote:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

I recently did something like this (and it scared me: it seems a typical
Fedora boot into GNOME opens something like 11,000 files ;) but via a
printk in the kernel....

I experimented with readahead'ing all that stuff while the initscripts
ran, in the hope it would save time... but somehow it doesn't.

Some other things kinda help; if you feel adventurous you could play with
the kernel-utils RPM in rawhide, which does a readahead of the files the
desktop opens while the GDM login window is displayed. If the user isn't
typing his name really fast, that decreases the desktop startup time...
* Re: Request: I/O request recording
From: Ville Herva @ 2004-01-24 19:25 UTC
To: Arjan van de Ven; +Cc: Felix von Leitner, linux-kernel

On Sat, Jan 24, 2004 at 07:26:17PM +0100, you [Arjan van de Ven] wrote:

> I recently did something like this (and it scared me, it seems a typical
> Fedora boot into gnome opens like 11.000 files ;) but via a printk in
> the kernel....
>
> I experimented with readahead'ing all that stuff while the initscripts
> ran in the hope it would save time... but it doesn't somehow.

Did you sort the sectors to be read, or just read the files into the page
cache in more or less random order?

Or do you mean that even after all the files were read into cache, the X
startup time didn't get any better (not counting the cache priming)?

-- v -- v@iki.fi
* Re: Request: I/O request recording
From: Arjan van de Ven @ 2004-01-24 22:43 UTC
To: Ville Herva, Felix von Leitner, linux-kernel

On Sat, Jan 24, 2004 at 09:25:45PM +0200, Ville Herva wrote:

> Did you sort the sectors to be read, or just read the files into page cache
> in randomish order ?

Semi-random order, but mostly submitted in parallel, so the kernel has
lots of freedom to reorder.

> Or do you mean that even after all the files were read into cache, the X
> startup time didn't get any better (not counting the cache priming)?

I mean that the time it takes to prime the cache is just about exactly
the time you then win... i.e. a net gain of about zero.
* Re: Request: I/O request recording
From: Diego Calleja @ 2004-01-24 20:11 UTC
To: Felix von Leitner; +Cc: linux-kernel

On Sat, 24 Jan 2004 19:10:27 +0100, Felix von Leitner <felix-kernel@fefe.de> wrote:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.
>
> Having a way to know which pages are accessed in which order at a
> typical cold start would be very beneficial, not only for the purpose
> described above but it could also be used as input for a linker code
> reordering optimization.
>
> What do you think?

That's exactly what XP does (and Mac OS X, for that matter). And it
really works (i.e. you can notice it).

XP records what the OS does in the first 2 minutes (or so). The next time
it boots, it tries to load the files that it knows are going to be used.
The same for an app that is frequently used: it records what the app
does, and it optimizes the startup of that app. Take a look at these
(search for "prefetch"):

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/appendix/hh/appendix/enhancements5_0qhx.asp
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx

Andrew Morton wrote a patch some time ago for 2.5.64-mm6 which achieves a
similar effect, I think:

"To test the nonlinear mapping code more thoroughly I have arranged for
all executable file-backed mmaps to be treated as nonlinear. This means
that when an executable is first mapped in, the kernel will slurp the
whole thing off disk in one hit. Some IO changes were made to speed this
up. This means that large cache-cold executables start significantly
faster. Launching X11+KDE+mozilla goes from 23 seconds to 16. Starting
OpenOffice seems to be 2x to 3x faster, and starting Konqueror maybe 3x
faster too. Interesting."

(see: http://www.ussg.iu.edu/hypermail/linux/kernel/0303.1/1296.html)

The patches are still available. IIRC, they were dropped "because it
should be done in userspace". It'd be very interesting to write a
userspace program that does what XP does (it looks like a good idea for
desktops).
* Re: Request: I/O request recording
From: Ville Herva @ 2004-01-24 21:09 UTC
To: Diego Calleja; +Cc: Felix von Leitner, linux-kernel

On Sat, Jan 24, 2004 at 09:11:56PM +0100, you [Diego Calleja] wrote:

> That's exactly what XP does (and Mac OS X, for that matter).
> And it really works (ie: you can notice it)
>
> XP records what the OS does in the first 2 minutes (or so). The next
> time it boots, it tries to load the files that he knows that are going
> to be used. The same for an app that is frecuently used: it records
> what the app does, and it optimizes the startup of that app.
> Take a look at: (search prefetch)
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/appendix/hh/appendix/enhancements5_0qhx.asp
> http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx

It's perhaps worth pointing out that XP not only uses the boot (or
application launch) traces to prefetch the data on the next boot
(application launch), but also to reorder the data on disk optimally via
XP's defragmenter. And XP is noticeably faster to boot than (say) W2000.

-- v -- v@iki.fi
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-24 23:35 UTC
To: Felix von Leitner; +Cc: linux-kernel

Felix von Leitner <felix-kernel@fefe.de> wrote:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

I wrote a similar thing in September of 2001. What you do is:

- Reboot the system, wait until everything is steady-state (eg: X has
  started, applications are loaded).

- Load a kernel module which dumps the current contents of the pagecache
  (filename/offset-into-file) into a file.

  (The kernel module writes to modprobe's stdout, so you just do

      modprobe fboot-dump > /tmp/fboot-dump.out

  I'm very proud of this.)

- Post-process the resulting output into a database which is used on the
  next reboot.

- Reboot.

- This time a userspace application cuts in real early, reads the
  database, and preloads all the pagecache using "optimal" I/O patterns,
  so that everything which you will need in the subsequent boot is
  already in memory.

So it's all an attempt to optimise the boot-time I/O patterns. It was
pretty much a waste of time, gaining only 10% or so, from memory. You
could get just as much or more speedup from simply launching all the
initscripts in parallel, although this did tend to break stuff.

Anyway, the code's ancient but might provide some ideas:

http://www.zip.com.au/~akpm/linux/fboot.tar.gz
* Re: Request: I/O request recording
From: Davide Libenzi @ 2004-01-24 23:53 UTC
To: Andrew Morton; +Cc: Felix von Leitner, linux-kernel

On Sat, 24 Jan 2004, Andrew Morton wrote:

> I wrote a similar thing in September of 2001. What you do is:
>
> - Reboot the system, wait until everything is steady-state (eg: X has
>   started, applications are loaded).
>
> - Load a kernel module which dumps the current contents of the pagecache
>   (filename/offset-into-file) into a file.
>
>   (The kernel module writes to modprobe's stdout, so you just do
>
>       modprobe fboot-dump > /tmp/fboot-dump.out
>
>   I'm very proud of this.)
>
> - Post-process the resulting output into a database which is used on the
>   next reboot.
>
> - reboot
>
> - This time a userspace application cuts in real early and reads the
>   database and preloads all the pagecache using "optimal" I/O patterns so
>   that everything which you will need in the subsequent boot is already in
>   memory.
>
> So it's all an attempt to optimise the boot-time I/O patterns. It was
> pretty much a waste of time, gaining only 10% or so, from memory. You
> could get just as much or more speedup from simply launching all the
> initscripts in parallel, although this did tend to break stuff.
>
> Anyway, the code's ancient but might provide some ideas:
>
> http://www.zip.com.au/~akpm/linux/fboot.tar.gz

Warning: I don't know if they have a patent for this, but MS does this
starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
based.

- Davide
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-25 0:03 UTC
To: Davide Libenzi; +Cc: felix-kernel, linux-kernel

Davide Libenzi <davidel@xmailserver.org> wrote:

> > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
>
> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

Did they do it in August 2001?
* Re: Request: I/O request recording
From: Davide Libenzi @ 2004-01-25 0:09 UTC
To: Andrew Morton; +Cc: Davide Libenzi, felix-kernel, linux-kernel

On Sat, 24 Jan 2004, Andrew Morton wrote:

> > Warning. I don't know if they do have a patent for this, but MS does this
> > starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> > based.
>
> Did they do it in August 2001?

Ouch, I don't know. I know for sure that it came with XP, but I'm not
really into MS things ;) This is one of the links that talks about it:

http://msdn.microsoft.com/msdnmag/issues/01/12/XPKernel/default.aspx

- Davide
* Re: Request: I/O request recording
From: Valdis.Kletnieks @ 2004-01-25 0:04 UTC
To: Davide Libenzi; +Cc: Andrew Morton, Felix von Leitner, linux-kernel

On Sat, 24 Jan 2004 15:53:44 PST, Davide Libenzi said:

> > Anyway, the code's ancient but might provide some ideas:
> >
> > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
>
> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

Hmm.. prior art time. ;)

IBM's OS/VS1 and MVS operating systems had the "link pack area", where
frequently loaded modules were loaded at system startup. And there were
numerous third-party optimizers that would analyze the LOAD SVC patterns
on your system and produce a list of which modules should be pre-loaded
in order to get the most bang for the buck. Even a *large* 370/168 or
303x processor might be able to spare a megabyte tops, so optimizing it
was important, and sites would spend $5K on software that would optimize
the memory usage and save them a memory upgrade at $40K a meg...

This was the mid-70s, so definitely pre-XP.
* Re: Request: I/O request recording
From: Davide Libenzi @ 2004-01-25 0:10 UTC
To: Valdis.Kletnieks; +Cc: Andrew Morton, Felix von Leitner, Linux Kernel Mailing List

On Sat, 24 Jan 2004 Valdis.Kletnieks@vt.edu wrote:

> IBM's OS/VS1 and MVS operating systems had the 'link pack area', where
> frequently loaded modules were loaded at system startup. And there were
> numerous 3rd party optimizers that would analyze the LOAD SVC patterns
> on your system and produce a list of which modules should be pre-loaded
> in order to get the most bang for the buck.
>
> This was mid-70s, so definitely pre-XP.

They (MS) work on a page-fault basis, though. It is quite different.

- Davide
* Re: Request: I/O request recording
From: Felipe Alfaro Solana @ 2004-01-25 12:26 UTC
To: Linux Kernel Mailinglist

On Sun, 2004-01-25 at 00:53, Davide Libenzi wrote:

> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

And tomorrow they'll say they have patented the hamburger recipe, or the
Euclidean triangle, or... who knows. C'mon... the world is going crazy!
* Re: Request: I/O request recording
From: Bart Samwel @ 2004-01-25 22:59 UTC
To: Andrew Morton; +Cc: Felix von Leitner, linux-kernel

Andrew Morton wrote:

> Felix von Leitner <felix-kernel@fefe.de> wrote:
>
>> I would like to have a user space program that I could run while I cold
>> start KDE. The program would then record which I/O pages were read in
>> which order. The output of that program could then be used to pre-cache
>> all those pages, but in an order that reduces disk head movement.
>> Demand Loading unfortunately produces lots of random page I/O scattered
>> all over the disk.
>
> I wrote a similar thing in September of 2001. What you do is: [...]

When I saw this thread I fiddled for a bit with the block_dump
functionality that's in the laptop_mode patch. I wanted to see if it
could support a similar thing completely from user space (except for the
block_dump code, of course). I've written a small tool that generates a
file listing (sector, size, device) tuples from the kernel output in
syslog; it parses all "READ block xxx" messages since the last reboot.
Putting this through sort -n -u delivers a nicely sorted file, ready for
optimized reading.

Unfortunately I'm now stuck on the other part: reading the pages back
into memory at the next boot. It's not working, and I was hoping someone
here could take a look and tell me what I'm doing wrong.

Here's what I've tried so far. I've written a program that simply reads
the ranges by opening the device and reading from sector*512 to
sector*512+size. It uses async I/O for efficiency, and to allow the
kernel to merge read requests. It seems to read all the data, but after
that the other programs seem to read most of it *again*! I only go from
8500 down to 7000 reads or so, while most of the 7000 reads that remain
are also in the range that is being prefetched. :(

I was wondering if the pages could have been evicted so soon, so, to make
sure, I mmapped the whole shebang with MAP_LOCKED and PROT_READ, and kept
the mapping process in memory during the whole boot process. This had
exactly the same effect.

So, I thought that I might be reading the wrong blocks. However, when I
feed it something like (160000, 4096, hdb1) I get a block_dump log that
says exactly that (plus some extra, because mmap seems to read in a bit
more than needed). So, that's not it. I'm out of clues.

If someone would be so kind as to take a look at what I'm doing wrong,
I'd very much appreciate it. I've put the code up at
http://www.xs4all.nl/~bsamwel/block_read_replay.tar.gz. How to use it:

1. Patch your kernel with the patch that's included in the tarball. This
   patch modifies the block_dump output slightly, and enables a
   block_dump value of 2 which only reports READ actions. It's against
   2.6.1-mm2, but it should apply fine to any kernel that has laptop_mode
   in it.

2. Record the bootup info. Somewhere at the very beginning, include
   "echo 2 > /proc/sys/vm/block_dump" in an init script. Reboot, and
   after the bootup sequence is complete, do
   "echo 0 > /proc/sys/vm/block_dump".

3. "make" and put brexec (one of the two versions) somewhere your init
   scripts can access it.

4. Run slbrp (SysLog Block Read Parser) to generate a block list file:
   "slbrp /var/log/syslog | sort -n -u > /etc/bootup_blocks".

5. Precede the "echo 2 > /proc/sys/vm/block_dump" at startup with a
   brexec ("block read executor") call, e.g. "brexec /etc/bootup_blocks".
   The mmap version takes an extra parameter N, the number of seconds to
   keep the pages mapped, and must be put in the background because it
   will simply wait for N seconds before exiting. So it should be
   something like "brexec /etc/bootup_blocks 60" and then "sleep 30" to
   give it time to read everything before bootup continues. Yes, it's
   not pretty; it's just used for experimenting, so it doesn't matter.

6. Reboot, and disable block_dump after booting, as in step (2). Now the
   logging of reads only starts _after_ brexec has attempted to load all
   pages, and this gives info on what is still loaded. You'll probably
   see that it loads many things that are also listed in the
   bootup_blocks file.

Now my question is: what am I doing wrong, that it needs to read those
again?

-- Bart
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-25 23:09 UTC
To: Bart Samwel; +Cc: felix-kernel, linux-kernel

Bart Samwel <bart@samwel.tk> wrote:

> When I saw this thread I fiddled for a bit with the block_dump
> functionality that's in the laptop_mode patch. I wanted to see if it
> could support a similar thing completely from user space (except for the
> block_dump code, of course). I've written a small tool that generates a
> file listing (sector, size, device) tuples from the kernel output in
> syslog; it parses all "READ block xxx" messages since the last reboot.
> Putting this through sort -n -u delivers a nicely sorted file, ready for
> optimized reading.
>
> Unfortunately I'm now stuck on the other part: reading the pages back
> into memory at the next boot. It's not working, and I was hoping someone
> here could take a look and tell me what I'm doing wrong.

Linux caches disk data on a per-file basis. So if you preload pagecache
via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
Each one has its own unique pagecache. When reading pages for /etc/passwd
we don't go looking for the same disk blocks in the cache of /dev/hda1.

Which is why the userspace cache preloading needs to know the pathnames
of all the relevant files - it needs to open and read each one, applying
knowledge of disk layout while doing it.
* Re: Request: I/O request recording
From: Bart Samwel @ 2004-01-25 23:29 UTC
To: Andrew Morton; +Cc: felix-kernel, linux-kernel

Andrew Morton wrote:

> Linux caches disk data on a per-file basis. So if you preload pagecache
> via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
> Each one has its own unique pagecache. When reading pages for /etc/passwd
> we don't go looking for the same disk blocks in the cache of /dev/hda1.
>
> Which is why the userspace cache preloading needs to know the pathnames
> of all the relevant files - it needs to open and read each one, applying
> knowledge of disk layout while doing it.

Hmmm, that explains why this didn't work. :( So if I wanted to do this
completely from user space using only block_dump data, I'd probably have
to go through all files and find out if they had any blocks in common
with my preload set -- presuming there is a way to find that out, which
there probably isn't. That makes this idea pretty much useless; I'm sorry
to have bothered you with it.

-- Bart
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-25 23:38 UTC
To: Bart Samwel; +Cc: felix-kernel, linux-kernel

Bart Samwel <bart@samwel.tk> wrote:

> Hmmm, that explains why this didn't work. :( So if I wanted to do this
> completely from user space using only block_dump data, I'd probably have
> to go through all files and find out if they had any blocks in common
> with my preload set -- presuming there is a way to find that out, which
> there probably isn't. That makes this idea pretty much useless; I'm sorry
> to have bothered you with it.

You could certainly do that. Given disk block #N you need to search all
files on the disk asking "who owns this block". The FIBMAP ioctl can be
used on most filesystems (ext2, ext3, others..) to find out which blocks
a file is using. See bmap.c in

http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz

Unfortunately you cannot determine a directory's blocks in this way.
Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
directories each have their own pagecache.
* Re: Request: I/O request recording
From: Diego Calleja García @ 2004-01-26 0:23 UTC
To: Andrew Morton; +Cc: bart, felix-kernel, linux-kernel

On Sun, 25 Jan 2004 15:38:03 -0800, Andrew Morton <akpm@osdl.org> wrote:

> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
> directories each have their own pagecache.

Would it be possible to "hijack" the syscalls at the libc level and look
at what the program is doing?
* Re: Request: I/O request recording
From: Andrew Morton @ 2004-01-26 0:32 UTC
To: Diego Calleja García; +Cc: bart, felix-kernel, linux-kernel

Diego Calleja García <aradorlinux@yahoo.es> wrote:

> Would it be possible to "hijack" the syscalls at the libc level and look
> at what the program is doing?

That would work. It misses out on pagefaults, which are kind of syscalls
in disguise. So for any files which were mmapped you'd have to either
assume that all of the file's pages are required, or use mincore() to
poke around and find out which pages were really faulted in.
* Re: Request: I/O request recording
  2004-01-25 23:38             ` Andrew Morton
  2004-01-26  0:23               ` Diego Calleja García
@ 2004-01-26 11:50               ` Bart Samwel
  2004-01-26 11:57                 ` Andrew Morton
  2004-01-27 19:13               ` Bart Samwel
  2 siblings, 1 reply; 23+ messages in thread
From: Bart Samwel @ 2004-01-26 11:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: felix-kernel, linux-kernel

Andrew Morton wrote:
> Bart Samwel <bart@samwel.tk> wrote:
>
>>> Linux caches disk data on a per-file basis.  So if you preload pagecache
>>> via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
>>> Each one has its own unique pagecache.  When reading pages for /etc/passwd
>>> we don't go looking for the same disk blocks in the cache of /dev/hda1.
>>>
>>> Which is why the userspace cache preloading needs to know the pathnames of
>>> all the relevant files - it needs to open and read each one, applying
>>> knowledge of disk layout while doing it.
>>
>> Hmmm, that explains why this didn't work. :( So if I wanted to do this
>> completely from user space using only block_dump data I'd probably have
>> to go through all files and find out if they had any blocks in common
>> with my preload set -- presuming there is a way to find that out, which
>> there probably isn't.  That makes this idea pretty much useless, I'm
>> sorry to have bothered you with it.
>
> You could certainly do that.  Given disk block #N you need to search all
> files on the disk asking "who owns this block".  The FIBMAP ioctl can be
> used on most filesystems (ext2, ext3, others..) to find out which blocks a
> file is using.  See bmap.c in
>
>   http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
>
> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway.  ext2's
> directories each have their own pagecache.

I found out two things while trying to do this:

1. Many filesystems in Linux set f_fsid to zero for statfs.  I was trying
   to use this to skip over mount points, but that doesn't work.  Had to
   use the st_dev field from stat instead. :(

2. Swapfiles apparently don't like to be touched.  I did an
   ioctl(FIGETBSZ) on a swapfile, and it would simply block until I did a
   swapoff on the file.  I didn't even get to the FIBMAP part. :(  Is this
   correct behaviour?  And is there any way to detect this so that I can
   work around it?

-- Bart
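The st_dev workaround from point 1 can be sketched as follows (`st_dev` is the real stat field; the helper function is my own). Two paths live on the same mounted filesystem exactly when stat() reports the same device, which lets a recursive scan stop at mount boundaries:

```c
/* Sketch: detect mount-point crossings via st_dev, since f_fsid from
 * statfs() is zero on many Linux filesystems and cannot be used. */
#include <sys/stat.h>

/* Returns 1 if both paths are on the same mounted filesystem, 0 if a
 * scan descending from `a` to `b` would cross a mount point, and -1 if
 * either stat() fails. */
int same_filesystem(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev;
}
```

A directory walker would call this on each subdirectory against the scan root and skip any subtree for which it returns 0.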
* Re: Request: I/O request recording
  2004-01-26 11:50               ` Bart Samwel
@ 2004-01-26 11:57                 ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2004-01-26 11:57 UTC (permalink / raw)
To: Bart Samwel; +Cc: felix-kernel, linux-kernel

Bart Samwel <bart@samwel.tk> wrote:
>
> 2. Swapfiles apparently don't like to be touched.  I did an
> ioctl(FIGETBSZ) on a swapfile, and it would simply block until I did a
> swapoff on the file.  I didn't even get to the FIBMAP part. :(  Is this
> correct behaviour?

yup.

> And is there any way to detect this so that I can work around it?

swapoff -a beforehand, I guess.
* Re: Request: I/O request recording
  2004-01-25 23:38             ` Andrew Morton
  2004-01-26  0:23               ` Diego Calleja García
  2004-01-26 11:50               ` Bart Samwel
@ 2004-01-27 19:13               ` Bart Samwel
  2 siblings, 0 replies; 23+ messages in thread
From: Bart Samwel @ 2004-01-27 19:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: felix-kernel, linux-kernel

Andrew Morton wrote:
> You could certainly do that.  Given disk block #N you need to search all
> files on the disk asking "who owns this block".  The FIBMAP ioctl can be
> used on most filesystems (ext2, ext3, others..) to find out which blocks a
> file is using.  See bmap.c in
>
>   http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
>
> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway.  ext2's
> directories each have their own pagecache.

OK, I've written something that does this (but only correctly for ext3).
I've put it here:

  http://www.xs4all.nl/~bsamwel/bootup_prefetch.tar.gz

I haven't had the opportunity to do good measurements, so I don't really
know if it even increases performance.  If anyone feels like benchmarking
this, I'd be very happy to hear from you.  I don't really expect
performance increases, as the bootup scripts seem to have enough
processing to do to keep the system busy even without disk I/O.  I wonder
if it might make a difference on a faster processor though; my system's
kind of sluggish by today's standards.

-- Bart
end of thread, other threads:[~2004-01-27 19:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-24 18:10 Request: I/O request recording Felix von Leitner
2004-01-24 18:23 ` Valdis.Kletnieks
2004-01-24 18:26 ` Arjan van de Ven
2004-01-24 19:25   ` Ville Herva
2004-01-24 22:43     ` Arjan van de Ven
2004-01-24 20:11 ` Diego Calleja
2004-01-24 21:09   ` Ville Herva
2004-01-24 23:35 ` Andrew Morton
2004-01-24 23:53   ` Davide Libenzi
2004-01-25  0:03     ` Andrew Morton
2004-01-25  0:09       ` Davide Libenzi
2004-01-25  0:04     ` Valdis.Kletnieks
2004-01-25  0:10       ` Davide Libenzi
2004-01-25 12:26   ` Felipe Alfaro Solana
2004-01-25 22:59   ` Bart Samwel
2004-01-25 23:09     ` Andrew Morton
2004-01-25 23:29       ` Bart Samwel
2004-01-25 23:38         ` Andrew Morton
2004-01-26  0:23           ` Diego Calleja García
2004-01-26  0:32             ` Andrew Morton
2004-01-26 11:50           ` Bart Samwel
2004-01-26 11:57             ` Andrew Morton
2004-01-27 19:13           ` Bart Samwel