public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@linux-foundation.org>
To: bert hubert <bert.hubert@netherlabs.nl>
Cc: Rik van Riel <riel@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: userspace pagecache management tool
Date: Sat, 3 Mar 2007 15:45:41 -0800
Message-ID: <20070303154541.70aed9df.akpm@linux-foundation.org>
In-Reply-To: <20070303230155.GA475@outpost.ds9a.nl>

On Sun, 4 Mar 2007 00:01:55 +0100 bert hubert <bert.hubert@netherlabs.nl> wrote:

> On Sat, Mar 03, 2007 at 02:26:09PM -0800, Andrew Morton wrote:
> > > > It is *not* a global instruction.  It uses setenv, so the user's policy
> > > > affects only the target process and its forked children.
> > > 
> > > ... and all other processes accessing the same file(s)!
> > > 
> > > Your library and the system calls may be limited to one process,
> > > but the consequences are global.
> > 
> > Yes.  So what?  If the user wants to go and evict libc.so from pagecache
> > then he can do so - the kernel has provided syscalls with which this can be
> > done for at least seven years.  Bad user, shouldn't do that.
> 
> While I agree with your sentiments that userspace can have a good idea on
> how to deal with the page cache, your program does more than it claims to
> do - because of how linux implements posix_fadvise.
> 
> I don't think anybody expects or desires your program to actually *evict*
> the stuff from the cache you are trying to access, which happens in case the
> data was in the cache prior to starting your program.
> 
> What people expect is that a solution such as you wrote it simply won't
> *add* anything to the cache. They don't expect it will actually globally
> *remove* stuff from the cache.
> 
> Making a backup this way would hurt even worse than usual with your
> pagecache management tool if the file being backed up was still being read.
> 
> This is not your fault, but in practice, it makes your program less useful
> than it could be.

yup.  As I said, it's a proof-of-concept.  It's a project.  And I have about one
free femtosecond per fortnight :(

> One could conceivably fix that up using mincore and simply not fadvise if a
> page was in core already.

Yes.  Let's flesh out the backup program policy some more:

- Unconditionally invalidate output files

- on entry to read(), probe pagecache, record which pages in the range are present

- on entry to next read(), shoot down those pages from the previous read
  which weren't in pagecache before that read.

- But we can do better!  LRU the file's pages up to a certain number of pages.

- Once that point is exceeded, we need to reclaim some pages.  Which
  ones?  Well, we've been observing all reads, so we can record which pages
  were referenced once, and which ones were referenced multiple times so we
  can do arbitrarily complex page aging in there.

- On close(), nuke all pages which weren't in core during open(), even if
  this app referenced them multiple times.

- If the backup program decided to read its input files with mmap we're
  rather screwed.  We can't intercept pagefaults so the best we can do is
  to restore the file's pagecache to its previous state on close().

  Or if it's really a problem, get control in there somehow,
  periodically poll the pagecache occupancy via mincore(), and use
  madvise() then fadvise() to trim it back.
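
The read-side part of the list above (probe, read, then shoot down only
what this read brought in) could be sketched in C roughly as follows.  This
is a minimal sketch under assumptions, not the tool's actual code:
probe_resident(), drop_new_pages() and polite_pread() are hypothetical
names, the offset is assumed to be page-aligned, and the pages are dropped
right after the read rather than on entry to the next one.  Some later
kernels restrict what mincore() reports for files the caller cannot write,
so the probe may be less informative there.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set vec[i] to 1 if page i of [off, off+len) is
 * currently in pagecache, else 0.  off must be page-aligned and vec must
 * hold one byte per page.  Returns 0 on success, -1 on failure. */
static int probe_resident(int fd, off_t off, size_t len, unsigned char *vec)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t npages = (len + (size_t)psz - 1) / (size_t)psz;

    /* Mapping the range does not fault any pages in by itself. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
    if (map == MAP_FAILED)
        return -1;

    int rc = mincore(map, len, vec);
    munmap(map, len);
    if (rc != 0)
        return -1;

    for (size_t i = 0; i < npages; i++)
        vec[i] &= 1;    /* only the low bit carries residency */
    return 0;
}

/* Hypothetical helper: drop the pages that were NOT resident at probe
 * time.  Returns the number of pages for which the advice was accepted. */
static int drop_new_pages(int fd, off_t off, const unsigned char *vec,
                          size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);
    int dropped = 0;

    for (size_t i = 0; i < npages; i++) {
        if (vec[i])
            continue;   /* was already cached: someone else may want it */
        if (posix_fadvise(fd, off + (off_t)i * psz, psz,
                          POSIX_FADV_DONTNEED) == 0)
            dropped++;
    }
    return dropped;
}

/* Hypothetical wrapper: probe, read, then undo only our own additions.
 * Falls back to a plain pread() if the probe fails. */
static ssize_t polite_pread(int fd, void *buf, size_t len, off_t off)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t npages = (len + (size_t)psz - 1) / (size_t)psz;
    unsigned char *vec = npages ? malloc(npages) : NULL;
    int probed = vec && probe_resident(fd, off, len, vec) == 0;

    ssize_t n = pread(fd, buf, len, off);

    if (probed && n > 0)
        drop_new_pages(fd, off, vec, npages);
    free(vec);
    return n;
}
```

An LD_PRELOAD-style wrapper would call something like polite_pread() from
its read() interposer.  POSIX_FADV_DONTNEED is only a hint and will not
discard dirty pages, so this is meant for the input side of the backup.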

That all sounds reasonably doable.  It'd be pretty complex to do it
in-kernel but we could do it there too.  Problem is, of course, that the
above strategy is explicitly optimised for the backup program and if it's
in-kernel it becomes applicable to all other workloads.
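
For the userspace variant, the "referenced once vs. referenced multiple
times" bookkeeping from the list above might look something like the sketch
below.  It is only an illustration under assumptions: struct page_tracker,
tracker_touch(), tracker_pick_victim() and TRACK_MAX are hypothetical
names, the table is a plain array with linear search, and the aging rule
(evict once-referenced pages first, oldest first) is just one simple choice
among the arbitrarily complex ones possible.

```c
#include <stddef.h>
#include <sys/types.h>

#define TRACK_MAX 1024          /* hypothetical table capacity */

struct tracked_page {
    off_t index;                /* page index within the file */
    unsigned refs;              /* how many reads touched this page */
    unsigned long last_use;     /* logical clock value of the last touch */
};

struct page_tracker {
    struct tracked_page pages[TRACK_MAX];
    size_t count;               /* entries currently tracked */
    size_t budget;              /* pages we allow ourselves to keep */
    unsigned long clock;        /* bumped on every touch */
};

/* Record that a read touched the given page.  If the table is full,
 * new pages are simply not tracked. */
static void tracker_touch(struct page_tracker *t, off_t index)
{
    t->clock++;
    for (size_t i = 0; i < t->count; i++) {
        if (t->pages[i].index == index) {
            t->pages[i].refs++;
            t->pages[i].last_use = t->clock;
            return;
        }
    }
    if (t->count < TRACK_MAX) {
        t->pages[t->count].index = index;
        t->pages[t->count].refs = 1;
        t->pages[t->count].last_use = t->clock;
        t->count++;
    }
}

/* If we are over budget, choose a page to reclaim: pages referenced only
 * once go first, and among equals the least recently used one.  Stores
 * the page index in *victim, forgets the page, and returns 1.  Returns 0
 * when the tracker is within budget. */
static int tracker_pick_victim(struct page_tracker *t, off_t *victim)
{
    if (t->count <= t->budget)
        return 0;

    size_t best = 0;
    for (size_t i = 1; i < t->count; i++) {
        const struct tracked_page *a = &t->pages[i];
        const struct tracked_page *b = &t->pages[best];
        int a_once = (a->refs == 1);
        int b_once = (b->refs == 1);

        if (a_once != b_once ? a_once : a->last_use < b->last_use)
            best = i;
    }

    *victim = t->pages[best].index;
    t->pages[best] = t->pages[--t->count];  /* swap-remove */
    return 1;
}
```

The caller would pass each victim page to posix_fadvise() with
POSIX_FADV_DONTNEED, skipping pages that were already in core before the
program touched them.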


