From: Andrew Morton <akpm@linux-foundation.org>
To: bert hubert <bert.hubert@netherlabs.nl>
Cc: Rik van Riel <riel@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: userspace pagecache management tool
Date: Sat, 3 Mar 2007 15:45:41 -0800
Message-ID: <20070303154541.70aed9df.akpm@linux-foundation.org>
In-Reply-To: <20070303230155.GA475@outpost.ds9a.nl>
On Sun, 4 Mar 2007 00:01:55 +0100 bert hubert <bert.hubert@netherlabs.nl> wrote:
> On Sat, Mar 03, 2007 at 02:26:09PM -0800, Andrew Morton wrote:
> > > > It is *not* a global instruction. It uses setenv, so the user's policy
> > > > affects only the target process and its forked children.
> > >
> > > ... and all other processes accessing the same file(s)!
> > >
> > > Your library and the system calls may be limited to one process,
> > > but the consequences are global.
> >
> > Yes. So what? If the user wants to go and evict libc.so from pagecache
> > then he can do so - the kernel has provided syscalls with which this can be
> > done for at least seven years. Bad user, shouldn't do that.
>
> While I agree with your sentiments that userspace can have a good idea on
> how to deal with the page cache, your program does more than it claims to
> do - because of how linux implements posix_fadvise.
>
> I don't think anybody expects or desires your program to actually *evict*
> the stuff from the cache you are trying to access, which happens in case the
> data was in the cache prior to starting your program.
>
> What people expect is that a solution such as you wrote it simply won't
> *add* anything to the cache. They don't expect it will actually globally
> *remove* stuff from the cache.
>
> Making a backup this way would hurt even worse than usual with your
> pagecache management tool if the file being backed up was still being read.
>
> This is not your fault, but in practice, it makes your program less useful
> than it could be.
yup. As I said, it's a proof-of-concept. It's a project. And I have about one
free femtosecond per fortnight :(
> One could conceivably fix that up using mincore and simply not fadvise if a
> page was in core already.
Yes. Let's flesh out the backup program policy some more:

- Unconditionally invalidate output files.
- On entry to read(), probe pagecache and record which pages in the range
  are present.
- On entry to the next read(), shoot down those pages from the previous read
  which weren't already in pagecache.
- But we can do better! LRU the file's pages up to a certain number of pages.
- Once that point is exceeded, we need to reclaim some pages. Which
  ones? Well, we've been observing all reads, so we can record which pages
  were referenced once and which ones were referenced multiple times, so we
  can do arbitrarily complex page aging in there.
- On close(), nuke all pages which weren't in core during open(), even if
  this app referenced them multiple times.
- If the backup program decided to read its input files with mmap we're
  rather screwed. We can't intercept pagefaults so the best we can do is
  to restore the file's pagecache to its previous state on close().
  Or if it's really a problem, get control in there somehow and
  periodically poll the pagecache occupancy via mincore(), then use madvise()
  followed by fadvise() to trim it back.
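The probe and shoot-down steps above could look roughly like this in C. This is a minimal sketch, not the tool's actual code: probe_resident() and drop_new_pages() are hypothetical names, the offset is assumed to be page-aligned, and mincore() is assumed to report honest residency for the file (newer kernels restrict what it reveals about files the caller cannot write).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: record which pages of [off, off + len) are already in
 * pagecache. vec needs one byte per page; each is set to 1 (resident) or 0.
 * Returns the number of pages probed, or -1 on error.
 */
static long probe_resident(int fd, off_t off, size_t len, unsigned char *vec)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t npages, i;
	void *map;
	int rc;

	assert(psz > 0 && off % psz == 0);	/* mmap() needs an aligned offset */
	if (len == 0)
		return 0;
	npages = (len + psz - 1) / psz;

	/* mincore() works on mappings, so map the range without touching it. */
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
	if (map == MAP_FAILED)
		return -1;
	rc = mincore(map, len, vec);
	munmap(map, len);
	if (rc != 0)
		return -1;

	for (i = 0; i < npages; i++)
		vec[i] &= 1;			/* only the low bit is defined */
	return (long)npages;
}

/*
 * Hypothetical helper: drop every page that was NOT resident before the read,
 * leaving pages that someone else had already cached alone. Adjacent pages
 * are coalesced into one posix_fadvise() call. Returns 0 or -1.
 */
static int drop_new_pages(int fd, off_t off, const unsigned char *before,
			  size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t i = 0;

	while (i < npages) {
		size_t start;

		if (before[i]) {		/* was cached already: keep it */
			i++;
			continue;
		}
		start = i;
		while (i < npages && !before[i])
			i++;
		if (posix_fadvise(fd, off + (off_t)start * psz,
				  (off_t)(i - start) * psz,
				  POSIX_FADV_DONTNEED) != 0)
			return -1;
	}
	return 0;
}
```

A read() wrapper would call probe_resident() before the real read() and keep the vector around, then call drop_new_pages() for that range on entry to the next read().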
That all sounds reasonably doable. It'd be pretty complex to do it
in-kernel but we could do it there too. Problem is of course that the
above strategy is explicitly optimised for the backup program and if it's
in-kernel it becomes applicable to all other workloads.
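
To make the LRU and page-aging bullets concrete, here is one possible userspace bookkeeping scheme, again in C. It is only a sketch under assumptions of my own: the names (page_lru, lru_touch) and the tiny LRU_BUDGET are invented for illustration, and the aging rule (evict the oldest page referenced once, else the oldest page overall) is just one of the "arbitrarily complex" choices available. The caller would pass the evicted page index to posix_fadvise(POSIX_FADV_DONTNEED).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LRU_BUDGET 3	/* hypothetical cap on tracked pages; tiny for clarity */

struct tracked_page {
	uint64_t index;	/* page index within the file */
	unsigned refs;	/* how many reads have touched this page */
};

struct page_lru {
	struct tracked_page slot[LRU_BUDGET];	/* slot[0] is the oldest */
	size_t used;
};

/*
 * Record one reference to page `index`. Returns 1 and stores the victim's
 * page index in *evicted if a page had to be reclaimed to stay within the
 * budget, 0 otherwise.
 */
static int lru_touch(struct page_lru *l, uint64_t index, uint64_t *evicted)
{
	size_t i, victim = 0;
	int did_evict = 0;

	assert(l != NULL && evicted != NULL);

	/* Seen before: bump the reference count and move to the young end. */
	for (i = 0; i < l->used; i++) {
		if (l->slot[i].index == index) {
			struct tracked_page hit = l->slot[i];

			hit.refs++;
			memmove(&l->slot[i], &l->slot[i + 1],
				(l->used - i - 1) * sizeof(hit));
			l->slot[l->used - 1] = hit;
			return 0;
		}
	}

	/* Over budget: prefer the oldest use-once page, else the oldest page. */
	if (l->used == LRU_BUDGET) {
		for (i = 0; i < l->used; i++) {
			if (l->slot[i].refs == 1) {
				victim = i;
				break;
			}
		}
		*evicted = l->slot[victim].index;
		memmove(&l->slot[victim], &l->slot[victim + 1],
			(l->used - victim - 1) * sizeof(l->slot[0]));
		l->used--;
		did_evict = 1;
	}

	l->slot[l->used].index = index;
	l->slot[l->used].refs = 1;
	l->used++;
	return did_evict;
}
```

With a budget of 3, touching pages 10, 11, 10, 12 and then 13 evicts page 11: page 10 is older in absolute terms but was referenced twice, so it survives.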