Re: Heuristic readahead for filesystems

From: jw schultz <jw@pegasys.ws>
To: linux-kernel@vger.kernel.org
Subject: Re: Heuristic readahead for filesystems
Date: Thu, 12 Sep 2002 06:35:47 -0700
Message-ID: <20020912063547.A5033@pegasys.ws>
In-Reply-To: <Pine.LNX.3.95.1020912072949.2700A-100000@chaos.analogic.com>; from root@chaos.analogic.com on Thu, Sep 12, 2002 at 07:41:09AM -0400

On Thu, Sep 12, 2002 at 07:41:09AM -0400, Richard B. Johnson wrote:
> On Wed, 11 Sep 2002, jw schultz wrote:
> 
> > On Wed, Sep 11, 2002 at 03:21:37PM -0400, Richard B. Johnson wrote:
> > > On Wed, 11 Sep 2002, Oliver Neukum wrote:
> > > > No, it won't. But it would solve the issue of reading ahead.
> > > > Stating needs a kernel implementation of 'stat ahead'
> > > > -
> > > 
> > > I think this is discussed in the future. Write-ahead is the
> > > next problem solved. ?;)
> > 
> > Getting back to the original issue which was "readahead" of
> > stat() info...
> > 
> > The userland open of a directory could trigger an advance
> > reading of the directory data and of the inode structs of
> > all its immediate members.  Almost all instances of a
> > usermode open on a directory will be doing fstats.  Even a
> > command line ls often has options (colour, -F, etc) turned on
> > by default that require fstat on all the entries.
> > The question would be how far ahead of the user app would
> > the kernel be.
> > 
> > I could possibly see having a fcntl() for directories to
> > pre-read just the first block of each file to accelerate
> > file-managers that use magic and perhaps forestall readahead
> > pulling in more than magic will use.
> 
> Then you are tuning a file-system for a single program
> like `ls`. Most real-world I/O to file-systems are not done
> by `ls` or even `make`. The extra read-ahead overhead is

Most real-world filesystem I/O doesn't open(2) directories.
Most filesystem I/O is stat, open and unlink of files with
full paths, with no open(2) of the directory.  Notice that I
refer to the system call, not the internal functions that path
lookup invokes.  The list of things that open(2) directories
is very short, and almost all of them stat the majority
of the directory's contents.
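
To make the access pattern concrete, here is a rough sketch of what
such a program does after opening a directory.  The helper name
stat_all_entries() is made up for illustration, and it uses the
present-day POSIX calls fstatat() and dirfd() for brevity; it is not
taken from ls or any other real program.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: stat every entry in a directory.
 * Returns the number of entries successfully stat'ed, or -1 if the
 * directory itself could not be opened. */
long stat_all_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip the self and parent links. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        struct stat st;
        /* One inode lookup per entry, relative to the open directory. */
        if (fstatat(dirfd(dir), ent->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
            count++;
    }

    closedir(dir);
    return count;
}
```

Each pass through the loop needs the inode of the next entry, which is
exactly the data a directory-triggered readahead would be trying to
have in memory already.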

> just that, 'overhead'. Since the cost of disk I/O is expensive,
> you certainly do not want to read any more than is absolutely
> necessary. There had been a lot of studies about this in the
> 70's when disks were very, very, slow. The disk-to-RAM speed
> ratio hasn't changed much even though both are much faster.
> Therefore, the conclusions of these studies, made by persons
> from DEC and IBM, should not have changed. From what I recall,
> all studies showed that read-ahead always reduced performance,
> but keeping what was already read in RAM always increased
> performance.

I'm sure there will be others that can show you the numbers.
Things have changed since those studies.  Disks are still
slooooooooow.  However the OS doesn't have to nursemaid the
transfer.  In most cases we queue requests and receive an
interrupt when the data is in memory.  _IF_ the disk
isn't otherwise kept busy, readahead reduces latency.
Most of the associated blocks are in close proximity, so there
is a good chance of doing the readahead in a minimum number of
requests if they are fed to the queues in a batch.
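
As a user-space analogue of feeding the requests in a batch, a program
can hand the kernel all of its hints up front and only then start
reading.  The sketch below is an illustration under assumptions, not a
proposal from this thread: hint_first_blocks() and PREFETCH_BYTES are
invented names, the 4096-byte "first block" is an assumed size, and
posix_fadvise() with POSIX_FADV_WILLNEED is the present-day POSIX
interface, which is only advisory.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define PREFETCH_BYTES 4096 /* assumed size of the "first block" */

/* Hypothetical helper: ask the kernel to start reading the first
 * PREFETCH_BYTES of every regular file in a directory.
 * Returns the number of files hinted, or -1 if the directory could
 * not be opened. */
long hint_first_blocks(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long hinted = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* O_NONBLOCK keeps open from stalling on FIFOs;
         * O_NOFOLLOW makes symlinks fail instead of being followed. */
        int fd = openat(dirfd(dir), ent->d_name,
                        O_RDONLY | O_NONBLOCK | O_NOFOLLOW);
        if (fd < 0)
            continue;

        struct stat st;
        /* Only regular files are hinted; posix_fadvise returns 0 on
         * success and does not wait for the I/O to finish. */
        if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) &&
            posix_fadvise(fd, 0, PREFETCH_BYTES, POSIX_FADV_WILLNEED) == 0)
            hinted++;

        close(fd);
    }

    closedir(dir);
    return hinted;
}
```

Whether this helps depends on the same condition as above: the hints
only reduce latency if the disk would otherwise have been idle.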

> > The question would be how far ahead of the user app would
> > the kernel be.

I repeat this because I think it is a central point.  Much
of the I/O associated with directory scanning is in tight
loops that would mirror the kernel's behavior.  I have
doubts that it would produce a performance boost because it
might just be a synchronized duplication of effort.

I am not advocating doing this.  I was just exploring the
idea and bringing the thread back to the original question.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
