public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Daniel Phillips <phillips@arcor.de>
To: Linus Torvalds <torvalds@osdl.org>,
	William Lee Irwin III <wli@holomorphy.com>
Cc: mfedyk@matchmail.com, "Eric W. Biederman" <ebiederm@xmission.com>,
	Anton Ertl <anton@mips.complang.tuwien.ac.at>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Subpages (was: Page Colouring)
Date: Mon, 29 Dec 2003 15:02:20 -0500	[thread overview]
Message-ID: <200312291501.32541.phillips@arcor.de> (raw)
In-Reply-To: <Pine.LNX.4.58.0312282000390.11299@home.osdl.org>

On Sunday 28 December 2003 23:09, Linus Torvalds wrote:
> On Sun, 28 Dec 2003, William Lee Irwin III wrote:
> > Doubtful. I suspect he may be referring to pgcl (sometimes called
> > "subpages"), though Dan Phillips hasn't been involved in it. I guess
> > we'll have to wait for Linus to respond to know for sure.
>
> I didn't see the patch itself, but I spent some time talking to Daniel
> after your talk at the kernel summit. At least I _think_ it was him I was
> talking to - my memory for names and faces is basically zero.
>
> Daniel claimed to have it working back then, and that it actually shrank
> the kernel source code. The basic approach is to just make PAGE_SIZE
> larger, and handle temporary needs for smaller subpages by just
> dynamically allocating "struct page" entries for them. The size reduction
> came from getting rid of the "struct buffer_head", because it ends up
> being just another "small page".
>
> Daniel, details?

Hi Linus,

Your description is accurate.  Another reason for code size shrinkage is 
getting rid of the loops across buffers in the block IO library, e.g., 
block_read_full_page.

Subpages only make sense for file-backed memory, which conveniently lets the 
page cache keep track of subpages.  Each address_space has pages of all the 
same size, which may be smaller, larger or the same as PAGE_CACHE_SIZE.  The 
first case, "subpages", is the interesting one.

An address_space with subpages has base pages of PAGE_CACHE_SIZE for its 
"even" entries and up to N-1 dynamically allocated struct pages for the "odd"  
entries where N is PAGE_CACHE_SIZE divided by the subpage size.  Base pages 
are normal members of mem_map.  Subpages are not referenced by mem_map, but 
only by the page cache.  They are created by operations such as 
find_or_create_page, which first creates a base page if necessary.  A counter 
field in the page flags of the base page keeps track of how many subpages 
share a base page's physical memory; when this field goes to zero the base 
page may be removed from the page cache.

Subpages always have a ->virtual field regardless of whether mem_map pages do.  
This is used for virt_to_phys and to locate the base page when a subpage is 
freed.

Page fault handling doesn't change much if at all, since the faulting address 
is rounded down to a physical page, which will be a base page.

Most of the changes for subpages are in the buffer.c page cache operations and 
are largely transparent to the VMM, though PAGE_CACHE_SHIFT becomes 
mapping->page_shift, which touches a lot of files.  As you noted, buffer_head 
functionality can be taken over by struct page and buffers become expendible.  
However it is not necessary to cross that bridge immediately; page buffer 
lists continue to work though the buffer list is never longer than one.

With a little more work, subpages can be used to shrink mem_map: implement a 
larger PAGE_CACHE_SIZE then use subpages to handle ABI problems.  In this 
case faults on subpages are possible and the fault path probably needs to 
know something about it.  With a larger-than-physical PAGE_CACHE_SIZE we can 
finally have large buffers, though the kernel would have to be compiled for 
it.  Some more work to allowing mapping->page_shift to be larger than 
PAGE_CACHE_SIZE would complete the process of generalizing the page size.  My 
impression is, this isn't too messy, most of the impact is on faulting.  Bill 
and others are already familiar with this I think.  The work should dovetail.

I took a stab at implementing subpages some time ago in 2.4 and got it mostly 
working but not quite bootable.  I did find out roughly how invasive the 
patch is, which is: not very, unless I've overlooked something major.  I'll 
get busy on a 2.6 prototype, and of course I'll listen attentively for 
reasons why this plan won't work.

Regards,

Daniel


  parent reply	other threads:[~2003-12-29 20:04 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <179fV-1iK-23@gated-at.bofh.it>
     [not found] ` <179IS-1VD-13@gated-at.bofh.it>
2003-12-27 20:21   ` Page Colouring (was: 2.6.0 Huge pages not working as expected) Anton Ertl
2003-12-27 20:56     ` Linus Torvalds
2003-12-27 23:31       ` Eric W. Biederman
2003-12-27 23:50         ` William Lee Irwin III
2003-12-28  1:09         ` David S. Miller
2003-12-28  4:53         ` Linus Torvalds
2003-12-28 16:39           ` William Lee Irwin III
2003-12-29  0:36             ` Mike Fedyk
2003-12-29  2:55               ` William Lee Irwin III
2003-12-29  4:09                 ` Linus Torvalds
2003-12-29  6:52                   ` William Lee Irwin III
2003-12-29  9:14                     ` Linus Torvalds
2003-12-29  9:22                       ` William Lee Irwin III
2003-12-29  9:33                         ` Linus Torvalds
2003-12-29 10:23                           ` William Lee Irwin III
2003-12-29 10:59                             ` Mike Fedyk
2003-12-29 11:14                               ` William Lee Irwin III
2003-12-30  2:00                             ` Rusty Russell
2003-12-30  4:59                               ` William Lee Irwin III
     [not found]                     ` <20031229084304.GA31630@elte.hu>
2003-12-29 12:09                       ` Ingo Molnar
2003-12-29 12:49                         ` William Lee Irwin III
2003-12-29 20:02                   ` Daniel Phillips [this message]
2003-12-29 20:15                     ` Subpages (was: Page Colouring) Linus Torvalds
2003-12-29 21:11           ` Page Colouring (was: 2.6.0 Huge pages not working as expected) Eric W. Biederman
2003-12-29 21:35             ` Linus Torvalds
     [not found]       ` <17tHK-3K6-21@gated-at.bofh.it>
2003-12-28 17:17         ` Anton Ertl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200312291501.32541.phillips@arcor.de \
    --to=phillips@arcor.de \
    --cc=anton@mips.complang.tuwien.ac.at \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mfedyk@matchmail.com \
    --cc=torvalds@osdl.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox