public inbox for linux-kernel@vger.kernel.org
From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: Linus Torvalds <torvalds@osdl.org>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
	Jens Axboe <axboe@suse.de>, Nick Piggin <nickpiggin@yahoo.com.au>,
	Badari Pulavarty <pbadari@us.ibm.com>
Subject: Re: [RFC] kernel facilities for cache prefetching
Date: Wed, 3 May 2006 12:11:06 +0800
Message-ID: <346629445.24665@ustc.edu.cn>
Message-ID: <20060503041106.GC5915@mail.ustc.edu.cn>
In-Reply-To: <Pine.LNX.4.64.0605020832570.4086@g5.osdl.org>

On Tue, May 02, 2006 at 08:55:06AM -0700, Linus Torvalds wrote:
> Doing prefetching on a physical block basis is simply not a valid 
> approach, for several reasons:

Sorry!
I made a misleading introduction. I'll try to explain it in more detail.

DATA ACQUISITION

/proc/filecache provides an interface to query the cached pages of any
file. This information is expressed in tuples of <idx, len>, which
more specifically means <mapping-offset, pages>.
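
To make the tuple format concrete, here is a minimal sketch that turns
<idx, len> tuples into byte ranges. The function name and the 4096-byte
page size are assumptions made for illustration; they are not part of the
proposed interface.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; the real size depends on the architecture


def extents_to_byte_ranges(extents, page_size=PAGE_SIZE):
    """Convert (idx, len) page extents into (byte_offset, byte_length) pairs.

    idx is the page offset within the file's mapping, len is a page count.
    """
    return [(idx * page_size, length * page_size) for idx, length in extents]


# Pages 0-1 and pages 8-11 of a file are cached.
print(extents_to_byte_ranges([(0, 2), (8, 4)]))
# prints [(0, 8192), (32768, 16384)]
```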

Normally one should use 'echo' to set up two parameters before doing
'cat':
        @file
                the filename;
                use 'ALL' to get a list of all cached files
        @mask
                only show the pages with non-zero (page-flags & @mask);
                for simplicity, use '0' to show all present pages (take 0 as ~0)

Normally, one should first get the file list using param 'file ALL',
and then iterate through all the files and pages of interest with
params 'file filename' and 'mask pagemask'.

The param 'mask' acts as a filter for different users: it allows
sysadmins to see where their memory goes, and the prefetcher to ignore
pages from false readahead.

One can use 'mask hex(PG_active|PG_referenced|PG_mapped)', with the value
given in hex, to show only accessed pages (here PG_mapped is a faked flag),
and use 'mask hex(PG_dirty)' to show only dirtied pages.
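
As a rough sketch of how a tool might build such a mask and apply the
filter rule, consider the following. The bit positions are made up for
illustration: the real PG_* values depend on the kernel version, and
PG_mapped exists only as a faked flag in this proposal. The helper names
are hypothetical as well.

```python
# Hypothetical bit positions, for illustration only. Real PG_* bit numbers
# vary between kernel versions, and PG_mapped is a faked flag.
PG_BITS = {"PG_referenced": 2, "PG_dirty": 4, "PG_active": 6, "PG_mapped": 30}


def build_mask(*flag_names):
    """OR together the bits of the named flags and return the mask in hex."""
    mask = 0
    for name in flag_names:
        mask |= 1 << PG_BITS[name]
    return hex(mask)


def page_matches(page_flags, mask):
    """Apply the filter rule: show a page if (page_flags & mask) is non-zero.

    A mask of 0 is taken as "show all present pages".
    """
    if mask == 0:
        return True
    return (page_flags & mask) != 0


print(build_mask("PG_active", "PG_referenced", "PG_mapped"))  # accessed pages
print(build_mask("PG_dirty"))                                 # dirtied pages
```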

One can use 
        $ echo "file /sbin/init" > /proc/filecache
        $ echo "mask 0" > /proc/filecache
        $ cat /proc/filecache
to get an idea of which pages of /sbin/init are currently cached.
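
The same sequence could be driven from a user-land tool. The sketch below
assumes a kernel carrying this patch, since /proc/filecache is only
proposed here. The function names are hypothetical, and writing the mask
as bare hex digits without a "0x" prefix is a guess.

```python
FILECACHE = "/proc/filecache"  # proposed interface; assumed to exist only on a patched kernel


def filecache_commands(filename, mask=0):
    """Return the parameter lines to write before reading the result."""
    # Assumption: the mask is written as bare hex digits, as in 'mask 0'.
    return ["file %s" % filename, "mask %x" % mask]


def query_filecache(filename, mask=0, path=FILECACHE):
    """Mirror the echo/echo/cat sequence: one write per parameter, then a read."""
    for command in filecache_commands(filename, mask):
        with open(path, "w") as f:
            f.write(command + "\n")
    with open(path) as f:
        return f.read()


print(filecache_commands("/sbin/init"))
# prints ['file /sbin/init', 'mask 0']
```

Each parameter gets its own open and write, matching the separate 'echo'
invocations above.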

In the proposal, I used the following example, which proved to be
rather misleading:
        $ echo "file /dev/hda1" > /proc/filecache
        $ cat /proc/filecache
The intention of that example was to show that filesystem dir/inode
buffer status -- which is the key data for user-land pre-caching --
can also be retrieved through this interface.

So the proposed solution is to
        - prefetch normal files on the virtual mapping level
        - prefetch fs dir/inode buffers on a physical block basis
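
For the first item, a user-land prefetcher could replay recorded
<idx, len> tuples as readahead hints. The sketch below uses
posix_fadvise(POSIX_FADV_WILLNEED) for that. This is my choice for
illustration, not something the proposal prescribes, and it does not
cover the fs dir/inode buffers of the second item. The function name and
the 4096-byte page size are assumptions.

```python
import os

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def prefetch_extents(path, extents, page_size=PAGE_SIZE):
    """Hint the kernel to read the given (idx, len) page extents of a file.

    Returns the total number of bytes requested.
    """
    requested = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for idx, length in extents:
            # WILLNEED asks for non-blocking readahead of the byte range.
            os.posix_fadvise(fd, idx * page_size, length * page_size,
                             os.POSIX_FADV_WILLNEED)
            requested += length * page_size
    finally:
        os.close(fd)
    return requested
```

The offsets are mapping offsets within the file, so nothing here needs to
know where the data sits on disk.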

I/O SUBMISSION

How can we avoid unnecessary seeks when prefetching on the virtual mapping
level?  The answer is to leave this job to the i/o elevators. What we
should do is present the elevators with most of the readahead requests
before too many requests are submitted to the disk drivers.

The proposed scheme is to:
        1) (first things first)
           issue all readahead requests for filesystem buffers
        2) (in background, often blocked)
           issue all readahead requests for normal files
        -) make sure the above requests are of really _low_ priority
        3) regular system boot continues
        4) promote the priority of any request that is now demanded by
           legacy programs
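
The low-priority submission and on-demand promotion in the steps above
can be modelled with a toy queue. This only illustrates the ordering
policy in user space; it is not how the kernel elevators would implement
it, and the class and method names are hypothetical.

```python
import heapq
import itertools

LOW, HIGH = 1, 0  # the smaller number is served first


class ReadaheadQueue:
    """Toy request queue: FIFO within a priority level, with promotion."""

    def __init__(self):
        self._heap = []
        self._live = {}                    # request -> its current heap entry
        self._counter = itertools.count()  # tie-breaker that keeps FIFO order

    def submit(self, request, priority=LOW):
        entry = [priority, next(self._counter), request, True]
        self._live[request] = entry
        heapq.heappush(self._heap, entry)

    def promote(self, request):
        """Raise a pending request to HIGH once a program demands it."""
        old = self._live.get(request)
        if old is None or old[0] == HIGH:
            return
        old[3] = False  # mark the stale entry so pop() skips it
        self.submit(request, HIGH)

    def pop(self):
        """Return the next request to serve, or None if nothing is pending."""
        while self._heap:
            _, _, request, valid = heapq.heappop(self._heap)
            if valid:
                del self._live[request]
                return request
        return None


q = ReadaheadQueue()
for name in ["a", "b", "c"]:  # placeholder request names
    q.submit(name)
q.promote("c")                # "c" is now demanded by a running program
print([q.pop() for _ in range(3)])
# prints ['c', 'a', 'b']
```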

In the scheme, most work is done by user-land tools. The required
kernel support is minimal and general-purpose:
        - a /proc/filecache interface
        - the ability to promote I/O priority on demanded pages

With this approach, we avoid the complicated OSX bootcache solution,
a physical-block-based design with special handling in the kernel,
which is exactly what Linus is against.

Thanks,
Wu
