Re: [RFC] kernel facilities for cache prefetching


From: Linda Walsh <lkml@tlinx.org>
To: Wu Fengguang <wfg@mail.ustc.edu.cn>, linux-kernel@vger.kernel.org
Subject: Re: [RFC] kernel facilities for cache prefetching
Date: Wed, 03 May 2006 14:45:53 -0700
Message-ID: <44592491.4060503@tlinx.org>
In-Reply-To: <346556235.24875@ustc.edu.cn>

Wu Fengguang wrote:
> Pre-caching reloaded ;)
> I/O ANALYSE...
> SCHEME/GOAL(s)...
>   
    Some good analysis and ideas.  I don't know if it is wanted, but I'd
like to add a 'few cents' about the prefetch mechanism in Windows XP,
which addresses both boot and application prefetch and has the
benefit of showing measurable performance improvements (compare the
boot time of an NT4 system to XP; maybe a 5-8x performance boost?).

    1. As you mention, reading files "sequentially" through the file
system is "bad" for several reasons.  Areas of interest:
    a) Don't go through the file system.  Don't waste time doing
directory lookups and following file-allocation maps; instead,
use raw-disk I/O and read sectors in by device & block number.
    b) Be "dynamic": "trace" (record the dev & blockno/range of) blocks
read, starting ASAP after system boot and continuing for some "configurable"
number of seconds past reaching the desired "run-level" (coinciding with
initial disk quiescence).  Save a "configurable" (~6-8?) number of
traces to allow finding the common initial subset of blocks needed.
    c) Allow specification of the max number of blocks and the max number
of "sections" (discontiguous areas on disk).
    d) "Ideally", there would be a way to "defrag" the common set of blocks,
i.e., to move the needed blocks from potentially disparate areas of
files into 1 or 2 contiguous areas, hopefully near the beginning of
the disk (or partition(s)).
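
A rough illustration of points (b) and (c), as a user-space Python sketch rather than anything kernel-ready: intersect several recorded traces to find the common blocks, merge them into contiguous "sections", and apply the two caps.  The (device, block) tuple layout and the names common_blocks, to_extents and limit_extents are assumptions made up for this example.

```python
from typing import Iterable, List, Set, Tuple

Block = Tuple[int, int]        # (device number, block number) -- assumed layout
Extent = Tuple[int, int, int]  # (device, first block, block count)


def common_blocks(traces: Iterable[Iterable[Block]]) -> Set[Block]:
    """Return the blocks that appear in every recorded trace."""
    sets = [set(t) for t in traces]
    if not sets:
        return set()
    return set.intersection(*sets)


def to_extents(blocks: Iterable[Block]) -> List[Extent]:
    """Merge blocks into contiguous (device, start, count) sections."""
    extents: List[Extent] = []
    for dev, blk in sorted(set(blocks)):
        if extents:
            last_dev, start, count = extents[-1]
            if last_dev == dev and start + count == blk:
                # Block directly follows the previous section: extend it.
                extents[-1] = (dev, start, count + 1)
                continue
        extents.append((dev, blk, 1))
    return extents


def limit_extents(extents: List[Extent], max_blocks: int,
                  max_sections: int) -> List[Extent]:
    """Keep the largest sections first, within both caps; return in disk order."""
    kept: List[Extent] = []
    budget = max_blocks
    largest_first = sorted(extents, key=lambda e: -e[2])[:max_sections]
    for dev, start, count in largest_first:
        if budget <= 0:
            break
        take = min(count, budget)  # trim the last section to fit the block cap
        kept.append((dev, start, take))
        budget -= take
    return sorted(kept)
```

Preferring the largest sections is only one possible policy; a real implementation might weigh how early in boot a block is needed instead.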

    That's the area of "boot" pre-caching.

Next is doing something similar for "application" starts.  Start tracing
when an application is loaded & observe what blocks are requested for
that app for the first 20 ("configurable") seconds of execution.  Store
traces on a per-application basis.  Again, it would be ideal if the
different files (blocks, really) needed by an application could be
grouped so that sequentially needed disk-blocks are stored sequentially
on disk (this _could_ imply the containing files are not contiguous).

Essentially, one wants to do for applications the same thing one does
for booting.  On small applications, the benefit would likely be negligible,
but on loading a large app like a windowing system, IDE, or database app,
multiple configuration files could be read into the cache in one large
read.

    That's "application" pre-caching.
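
Replaying a stored trace could look roughly like the following Python sketch, which reads each recorded range once, in ascending on-disk order, from a device node or an ordinary file.  The function name prefetch, the (first_block, block_count) extent format and the 4096-byte block size are assumptions for illustration only.  The sketch does not address whether data read through a raw device is later found by reads that go through the file system.

```python
import os
from typing import Iterable, Tuple


def prefetch(path: str, extents: Iterable[Tuple[int, int]],
             block_size: int = 4096) -> int:
    """Read each (first_block, block_count) extent once, in ascending order.

    Returns the number of bytes actually read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for first, count in sorted(extents):
            offset = first * block_size
            remaining = count * block_size
            while remaining > 0:
                # Read in chunks of at most 1 MiB; the data is discarded,
                # the point is only to pull it through the cache.
                chunk = os.pread(fd, min(remaining, 1 << 20), offset)
                if not chunk:  # reached the end of the device or file
                    break
                total += len(chunk)
                offset += len(chunk)
                remaining -= len(chunk)
    finally:
        os.close(fd)
    return total
```

A per-application trace would then just be a list of such extents, loaded when the application starts and passed to prefetch() before the application issues its own reads.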

    A third area -- one that can't be easily done in the kernel, but would
require a higher skill level on the part of application and library
developers -- is to move towards using "delay-loaded" libraries.  In
Windows, it seems common among system libraries to use this feature.
An obvious benefit -- if certain features of a program are not used,
the associated libraries are never loaded.  Not loading unneeded parts
of a program should speed up initial application load time significantly.

    I don't know where the "cross-over" point is, but moving to
demand-loaded ".so"s can bring large benefits for interactive usage.  In
addition to faster loading, memory is not wasted on unused libraries
and program features.
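
The delay-loading idea can be sketched in a few lines of Python: a proxy object that loads its target only on first use.  In native code the equivalent would be built on dlopen()/dlsym(); the LazyLibrary class and its loader argument are hypothetical names invented for this example, and a Python module stands in for a shared library.

```python
import importlib
from typing import Any, Callable, Optional


class LazyLibrary:
    """Proxy that loads its target on first attribute access, not at start-up."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._lib: Optional[Any] = None

    @property
    def loaded(self) -> bool:
        """True once the underlying library has actually been loaded."""
        return self._lib is not None

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._lib is None:
            self._lib = self._loader()  # the load cost is paid here, once
        return getattr(self._lib, name)


# Nothing is loaded by this line; zlib is imported only if a feature uses it.
compression = LazyLibrary(lambda: importlib.import_module("zlib"))
```

A program that never touches compression never pays for loading it, which is the behavior described above for features that go unused.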

    In looking at the distro I use, many unused libraries are linked in
with commonly used programs.  For a small office or home setup, I rarely
have a need for LDAP, internationalized libraries, & application support
for virtually any hardware & software configuration.

    This isn't a kernel problem -- the kernel has dynamically loadable
modules to allow loading only the hardware & software drivers that are
needed in a particular kernel.  User applications that I have been
exposed to in Linux usually don't have this adaptability -- virtually
everything is loaded at execution time -- not as needed during
program execution.

    I've seen this feature used on Unix systems to dynamically present
feature interfaces depending on the user's software configuration.  On
Linux, I more often see an "everything, including the kitchen sink"
approach, where every possible software configuration is supported
via "static", shared libraries that must be present and loaded into
memory before the program begins execution.

    Delay loading has the potential for greater benefit as the application
environment becomes more complex; consider how the number of
statically loaded, sharable libraries has increased (I have seen the
addition of LDAP, PAM, and most recently, SELinux libraries that
must be loaded before application execution).

    Good luck in speeding these things up.  It might require some
level of cooperation in different areas (kernel, fs utils,
distro-configuration, application design and build...etc).

-linda
