public inbox for linux-kernel@vger.kernel.org
From: Daniel Phillips <phillips@bonn-fries.net>
To: Richard Gooch <rgooch@ras.ucalgary.ca>,
	Linus Torvalds <torvalds@transmeta.com>
Cc: Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Getting FS access events
Date: Mon, 14 May 2001 15:04:31 +0200
Message-ID: <01051415043103.02742@starship>
In-Reply-To: <200105140224.f4E2OiE08257@vindaloo.ras.ucalgary.ca> <Pine.LNX.4.21.0105132139590.21224-100000@penguin.transmeta.com> <200105140515.f4E5FwP10245@vindaloo.ras.ucalgary.ca>
In-Reply-To: <200105140515.f4E5FwP10245@vindaloo.ras.ucalgary.ca>

On Monday 14 May 2001 07:15, Richard Gooch wrote:
> Linus Torvalds writes:
> > But sure, you can use bmap if you want. It would be interesting to
> > hear whether it makes much of a difference..
>
> I doubt bmap() would make any difference if there is a way of
> controlling when the I/O starts.
>
> However, this still doesn't address the issue of indirect blocks. If
> the indirect block has a higher bnum than the data blocks it points
> to, you've got a costly seek. This is why I'm still attracted to the
> idea of doing this at the block device layer. It's easy to capture
> *all* accesses and then warm the buffer cache.
>
> So, why can't the page cache check if a block is in the buffer cache?

That's not quite what you want, if only because there won't be anything 
in the buffer cache pretty soon.  What we really want is a block cache, 
tightly integrated with the page cache.  Readahead with a block cache 
would be more effective than our current file-based readahead.  For 
example, it handles the case where blocks of two files are interleaved.
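
To make the interleaving case concrete, here is a toy user-space model, not kernel code. The layout, the two file names, and both helper functions are made up for the illustration, and it ignores any merging the elevator might do. It only counts how many contiguous requests each style of readahead would issue when the blocks of two files alternate on disk.

```python
# Toy model: two files whose blocks alternate on disk.
# layout[physical_block] = (file_name, file_block_index)
layout = [("a", 0), ("b", 0), ("a", 1), ("b", 1), ("a", 2), ("b", 2)]


def file_readahead_runs(layout, name):
    """Physical blocks of one file, grouped into contiguous runs.

    Each run stands for one request a per-file readahead could issue.
    """
    blocks = [p for p, (f, _) in enumerate(layout) if f == name]
    runs = []
    for p in blocks:
        if runs and runs[-1][-1] + 1 == p:
            runs[-1].append(p)   # extends the current contiguous run
        else:
            runs.append([p])     # gap on disk: start a new request
    return runs


def block_readahead_runs(layout, start, count):
    """A block-level readahead reads one physical range in one request."""
    return [list(range(start, min(start + count, len(layout))))]


per_file = file_readahead_runs(layout, "a") + file_readahead_runs(layout, "b")
per_block = block_readahead_runs(layout, 0, len(layout))
print(len(per_file), "requests per-file vs", len(per_block), "block-level")
```

In this model the per-file approach never sees two adjacent blocks of the same file, so every block becomes its own request, while the block-level approach covers the whole range at once.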

Since we know that the page cache maps each block at most once, the 
optimal thing to do would be to just move a pointer from the block 
cache to the page cache whenever we can.  Unfortunately the layering in 
the VFS as it stands isn't friendly to this: typically we allocate a 
page in generic_file_read long before we ask the filesystem to map it.  
To test this zero-copy idea we'd need to replace generic_file_read and, 
for mmap, filemap_nopage.

But we don't need anything so fancy to try out your idea; we just need 
an lvm-like device that can:

  - Maintain a block cache
  - Remap logical to physical blocks
  - Record the block accesses
  - Physically reorder the blocks according to the recorded order
  - Load a given region of disk into the block cache on command

None of this has to be particularly general to get to the benchmarking 
stage.  E.g., the 'block cache' only needs to cache one physical region.
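
The five capabilities listed above can be sketched in user space roughly like this. It is only a model to pin down the bookkeeping, assuming a disk that is just a list of blocks. The class name RemapDevice and all of its methods are hypothetical and do not correspond to anything in lvm or md.

```python
class RemapDevice:
    """Toy user-space model of the lvm-like device described above."""

    def __init__(self, disk):
        self.disk = list(disk)                # physical blocks
        self.remap = list(range(len(disk)))   # logical -> physical
        self.trace = []                       # logical blocks, first-access order
        self.cache = {}                       # physical block -> data (one region)
        self.physical_reads = 0               # reads that missed the cache

    def read(self, logical):
        if logical not in self.trace:         # record the block access
            self.trace.append(logical)
        phys = self.remap[logical]
        if phys not in self.cache:
            self.physical_reads += 1
            self.cache[phys] = self.disk[phys]
        return self.cache[phys]

    def reorder(self):
        """Move recorded blocks to the front of the disk, in recorded order."""
        seen = set(self.trace)
        order = self.trace + [l for l in range(len(self.disk)) if l not in seen]
        # Build the new physical layout from the old mapping, then remap.
        new_disk = [self.disk[self.remap[l]] for l in order]
        for new_phys, logical in enumerate(order):
            self.remap[logical] = new_phys
        self.disk = new_disk
        self.cache.clear()
        return len(self.trace)                # size of the hot region at block 0

    def load_region(self, start, count):
        """Load one contiguous physical region into the block cache."""
        self.cache = {p: self.disk[p] for p in range(start, start + count)}


dev = RemapDevice([f"data{i}" for i in range(8)])
for logical in (5, 2, 7):                     # pretend this is the first boot
    dev.read(logical)
hot = dev.reorder()                           # boot blocks now sit at 0..hot-1
dev.load_region(0, hot)                       # one sequential read on command
dev.physical_reads = 0
boot = [dev.read(l) for l in (5, 2, 7)]       # second boot: served from cache
```

The filesystem above the device keeps using the same logical block numbers throughout; only the remap table and the physical layout change, which is why the approach does not care which filesystem is on top.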

The central idea here is that you obviously can't do any better than to 
have all the blocks you want to read at boot physically together on 
disk.

The advantage of using this lvm-style remapping is, it will work for 
any filesystem.  The disadvantage is that the ordering is then cast in 
stone - after the system is up it might not like the ordering you chose 
for the boot, and the elevator will be completely confused ;-)  But the 
thing is, everything you need to measure the boot performance is 
together in one place, just one device driver to write.  Then once you 
know what the perfect result is you have a yardstick to measure the 
effectiveness of other, less intrusive approaches.

I took a look at the lvm and md code to see if there's a quick way to 
press them into service for this test, and there probably is, but the 
complexity there is daunting.  I think starting with a clean sheet and 
writing a new driver would be easier.

--
Daniel

