public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: Dave Hansen <haveblue@us.ibm.com>, linux-kernel@vger.kernel.org
Subject: Re: RFC: CONFIG_PAGE_SHIFT (aka software PAGE_SIZE)
Date: Tue, 24 Jul 2007 21:44:18 +0200
Message-ID: <20070724194418.GH19559@v2.random>
In-Reply-To: <20070718133222.GM11781@holomorphy.com>

On Wed, Jul 18, 2007 at 06:32:22AM -0700, William Lee Irwin III wrote:
> Actually I'd worked on what was called MPSS (Multiple Page Size Support)
> before I ever started on pgcl. Some large portion of the pgcl proposal
> as I presented it internally was to reduce the order of large page
> allocations and provide a promotion and demotion mechanism enabling
> different processes to have different sized translations for the same
> large page, and hence no out-of-context pagetable/TLB updates during
> promotion and demotion, essentially by making the TLB translation to
> page relation M:N. ISTR describing this in a KS presentation for which
> IIRC you were present. But that's neither here nor there.

Well, the whole difference between you back then and SGI now is that
your stuff wasn't being pushed very hard for merging (it was proposed,
but IIRC more as a research topic; the large PAGE_SIZE also fell into
that same research area). Now see the emails from the SGI fs folks
about variable order page size: they badly want it merged.

My whole point is that the moment the variable order page size stops
being pure research like MPSS was, CONFIG_PAGE_SHIFT isn't research
anymore either, and neither is tail packing in the pagecache with
kmalloc.

About the fs deciding the pagecache granularity: I totally dislike
that design. There's no reason why the fs should control that;
whatever clever algorithm decides which pagecache granularity to use
should live outside fs/xfs. I'd like the pagecache layer to be in
charge of everything. The fs should stay a simple remapper from
logical inode offset to physical disk offset. That can take into
account raid or other stuff, it's still a logical->raid->physical
translation, but the high-level intelligence of deciding which
granularity the pagecache should use would better be in the
pagecache/vfs layer, to benefit everyone. And anyway I prefer to keep
PAGE_SIZE big, and allocate fragments for small files with kmalloc
down to 32 bytes granularity, and memcpy them away if you mmap the
file. After the first time we move from a kmalloc fragment to real
PAGE_SIZE pagecache, we add a bitflag to the inode somewhere to be
sure we never use the kmalloc fragment again later, even if the page
is evicted from pagecache (inodes may well live longer than pagecache,
so a bitflag is going to be worth it).
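
To make the fragment idea concrete, here is a rough userspace sketch
of it, not kernel code and not a patch. Everything in it is made up
for illustration: struct sim_inode, the sim_* helpers, the 64k
SIM_PAGE_SIZE, and malloc() standing in for kmalloc(). It only models
a single page per file. The point is the sticky no_fragment bit: it
lives in the inode, so it survives eviction of the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 65536u /* hypothetical large software PAGE_SIZE */
#define FRAG_GRANULE  32u    /* fragment granularity named in the mail */

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct sim_inode {
    size_t size;        /* file size in bytes */
    bool   no_fragment; /* sticky bit: never use a fragment again */
    void  *frag;        /* small-file data; malloc() stands in for kmalloc() */
    void  *page;        /* full-page copy, NULL if not cached */
};

/* Round a file size up to the fragment granularity (at least one granule). */
static size_t frag_alloc_size(size_t size)
{
    size_t n = (size + FRAG_GRANULE - 1) / FRAG_GRANULE * FRAG_GRANULE;
    return n ? n : FRAG_GRANULE;
}

/* Drop the cached data; the no_fragment bit stays in the inode. */
static void sim_evict(struct sim_inode *ino)
{
    free(ino->frag);
    free(ino->page);
    ino->frag = NULL;
    ino->page = NULL;
}

/* Cache file data as if it had just been read from disk. */
static int sim_cache_fill(struct sim_inode *ino, const void *data, size_t size)
{
    if (size > SIM_PAGE_SIZE)
        return -1; /* the sketch models a single page only */
    sim_evict(ino);
    ino->size = size;
    if (!ino->no_fragment && size < SIM_PAGE_SIZE) {
        /* Small file that was never mmapped: a fragment is enough. */
        ino->frag = malloc(frag_alloc_size(size));
        if (!ino->frag)
            return -1;
        memcpy(ino->frag, data, size);
        return 0;
    }
    ino->page = calloc(1, SIM_PAGE_SIZE);
    if (!ino->page)
        return -1;
    memcpy(ino->page, data, size);
    return 0;
}

/* mmap needs a real page: memcpy the fragment away and set the sticky bit. */
static void *sim_mmap(struct sim_inode *ino)
{
    assert(ino->frag || ino->page); /* data must already be cached */
    if (!ino->page) {
        ino->page = calloc(1, SIM_PAGE_SIZE);
        if (!ino->page)
            return NULL;
        memcpy(ino->page, ino->frag, ino->size);
        free(ino->frag);
        ino->frag = NULL;
    }
    ino->no_fragment = true;
    return ino->page;
}
```

With this, a 5 byte file filled through sim_cache_fill() sits in a 32
byte fragment until the first sim_mmap(); after that, even a
sim_evict() followed by a refill goes straight to a full page.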
