public inbox for linux-kernel@vger.kernel.org
From: William Lee Irwin III <wli@holomorphy.com>
To: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: linux-kernel@vger.kernel.org
Subject: Re: 64GB NUMA-Q after pgcl
Date: Fri, 28 Mar 2003 02:14:33 -0800
Message-ID: <20030328101433.GQ1350@holomorphy.com>
In-Reply-To: <Pine.LNX.4.50.0303280303190.2884-100000@montezuma.mastecende.com>

On Thu, 27 Mar 2003, William Lee Irwin III wrote:
>> Sure. On NUMA-Q mem_map[] is not allocated using bootmem except for
>> node 0. Various other bootmem allocations are also proportional to
>> memory as measured in units of PAGE_SIZE, but not all.
>> So all we're seeing here is node 0's mem_map[] with "miscellaneous"
>> bootmem allocations thrown in, whether reduced or increased.
>> This is not very reflective of what's going on as the majority of mem_map[]
>> is allocated through a custom reservation mechanism as opposed to bootmem.

On Fri, Mar 28, 2003 at 03:05:42AM -0500, Zwane Mwaikambo wrote:
> Thanks, nice work btw, although the core guts of this stuff is somewhat of 
> a mystery to some of us ;)

The code is still very much of prototype quality, so I'm actually being
somewhat deliberately obscure, so that those who aren't specifically
interested in hacking on it or in very early testing don't accidentally
burn themselves or come away with the impression of a patchkit gone
horribly wrong. And, even worse than that, so that no one reviews the
code before I've cleaned it up.

The concept is really very simple, although the consequences are far-
reaching. The kernel ties its basic unit of allocation and accounting,
the PAGE_SIZE area and its associated struct page, to the notion of a
pagetable entry and the size of the area mapped by a pagetable entry
(also called PAGE_SIZE in mainline; the patch makes it a distinct
notion, MMUPAGE_SIZE). Page clustering unties the two.
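
A minimal sketch of the two units once they are separated, assuming
4KB hardware pages and a cluster of 8 hardware pages per software
page. MMUPAGE_SIZE is the name used above; PAGE_MMUSHIFT,
PAGE_MMUCOUNT and split_physaddr() are hypothetical names for
illustration, not necessarily what the patch uses.

```python
# Two page units: the hardware unit (what one PTE maps) and the
# software unit (what one struct page accounts for).
MMUPAGE_SHIFT = 12                          # 4KB hardware page, as on i386
PAGE_MMUSHIFT = 3                           # hypothetical cluster order
PAGE_SHIFT = MMUPAGE_SHIFT + PAGE_MMUSHIFT  # software page shift

MMUPAGE_SIZE = 1 << MMUPAGE_SHIFT           # 4096 bytes
PAGE_SIZE = 1 << PAGE_SHIFT                 # 32768 bytes, one struct page each
PAGE_MMUCOUNT = PAGE_SIZE // MMUPAGE_SIZE   # hardware pages per software page


def split_physaddr(paddr: int) -> tuple[int, int]:
    """Split a physical address into (software page frame number,
    index of the hardware page within that software page)."""
    pfn = paddr >> PAGE_SHIFT
    subpage = (paddr >> MMUPAGE_SHIFT) & (PAGE_MMUCOUNT - 1)
    return pfn, subpage


if __name__ == "__main__":
    print(split_physaddr(0x9000))  # (1, 1)
```

With PAGE_MMUSHIFT = 0 the two units coincide and this degenerates to
the mainline arrangement.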

Page clustering is named for the view that a set of hardware pages
forms a "cluster" represented by the software accounting unit. In
truth the relationship is closer to symmetric, apart from the
constraint that the software unit must be larger than the hardware
unit. The net result is that you go through the code figuring out
which of the two units each piece of it really meant, and pagetable
walks and the like must be taught either that they refer to only a
piece of a software page, or to hand callers the piece they need when
they need it.
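
As an illustration of handing a caller "the piece it needs": a sketch
of a file-backed fault, assuming the pagecache is indexed in software
pages while each PTE must point at one hardware page. pte_target() and
page_phys_base (the physical address of the start of the software
page) are hypothetical names, and the cluster order is assumed.

```python
MMUPAGE_SHIFT = 12                          # 4KB hardware page
PAGE_MMUSHIFT = 3                           # hypothetical cluster order
PAGE_SHIFT = MMUPAGE_SHIFT + PAGE_MMUSHIFT
MMUPAGE_SIZE = 1 << MMUPAGE_SHIFT
PAGE_MMUCOUNT = 1 << PAGE_MMUSHIFT


def pte_target(file_offset: int, page_phys_base: int) -> tuple[int, int]:
    """For a faulting file offset, return (pagecache index of the
    software page, physical address the PTE should map)."""
    index = file_offset >> PAGE_SHIFT               # lookup key: software page
    piece = (file_offset >> MMUPAGE_SHIFT) & (PAGE_MMUCOUNT - 1)
    # The software page is physically contiguous, so the hardware
    # page is found by offsetting from its base.
    return index, page_phys_base + piece * MMUPAGE_SIZE


if __name__ == "__main__":
    idx, phys = pte_target(0x13000, 0x100000)
    print(idx, hex(phys))  # 2 0x103000
```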

The fact that it resolves the horror of mem_map[] overrunning kernel
virtual address space on i386 PAE is really an obscure coincidence.
AIUI Hugh's 2.4.x patch was actually intended to enable larger
filesystem block sizes, and the BSD implementation for the VAX was
simply meant to deal with the fact that even 16B for every 512B
hardware page (about 3% of RAM) is too large a fraction of physical
memory (not virtual) for page-granularity accounting to be
memory-efficient. For BSD's purposes a relatively small constant
factor sufficed; i386 needs a much larger one for workloads to remain
feasible, since kernel virtual address space is not much larger than
the fraction of physical memory that the coremap would otherwise
consume.
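
The arithmetic can be sketched as follows. The VAX figures (16B per
512B page) come from the paragraph above; STRUCT_PAGE = 40 bytes and
the cluster factor of 8 are assumptions for illustration only, since
the real sizeof(struct page) depends on kernel version and config.

```python
GiB = 1 << 30
MiB = 1 << 20


def coremap_bytes(phys_mem: int, page_size: int, struct_size: int) -> int:
    """Bytes of per-page descriptors needed to cover phys_mem."""
    return (phys_mem // page_size) * struct_size


# VAX-style accounting: 16 bytes for every 512-byte hardware page.
vax_fraction = 16 / 512                     # 1/32 of physical memory

# i386 PAE with 64GB; STRUCT_PAGE is an assumed descriptor size.
STRUCT_PAGE = 40
flat = coremap_bytes(64 * GiB, 4096, STRUCT_PAGE)           # 4KB pages
clustered = coremap_bytes(64 * GiB, 8 * 4096, STRUCT_PAGE)  # 32KB pages

if __name__ == "__main__":
    print(f"VAX coremap: {vax_fraction:.3%} of RAM")
    print(f"64GB, 4KB pages:  {flat // MiB} MiB of mem_map[]")
    print(f"64GB, 32KB pages: {clustered // MiB} MiB of mem_map[]")
```

Under these assumptions a flat mem_map[] needs 640 MiB, most of the
kernel's roughly 1GB of virtual address space on i386, while the
clustered one needs 80 MiB.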

Various other odd goodnesses are supposed to come of it: for instance,
prefaulting benefits as a side effect of trying to utilize the entire
software page in fault handlers, and I/O throughput benefits from
increased physical contiguity. My codebase is not ready for
performance analysis yet, as the fragmentation issues are only
partially resolved. The real point of the posting is to show that this
thing actually makes 64GB work and, of course, to get the first post
on 64GB i386 PAE. =)
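
The prefaulting side effect can be sketched like this: on a fault,
install PTEs for every hardware page of the faulting software page
that lies inside the mapping, not just the one that faulted. This is a
hypothetical model, not the patch's fault handler; it assumes the VMA
bounds and its starting file offset are MMUPAGE_SIZE-aligned, and
prefault_ptes() and its parameters are invented names.

```python
MMUPAGE_SHIFT = 12                          # 4KB hardware page
PAGE_MMUSHIFT = 3                           # hypothetical cluster order
PAGE_SHIFT = MMUPAGE_SHIFT + PAGE_MMUSHIFT
MMUPAGE_SIZE = 1 << MMUPAGE_SHIFT
PAGE_MMUCOUNT = 1 << PAGE_MMUSHIFT


def prefault_ptes(fault_vaddr: int, vma_start: int, vma_end: int,
                  vma_file_start: int,
                  page_phys_base: int) -> list[tuple[int, int]]:
    """Return (vaddr, phys) pairs for each hardware page of the faulting
    software page that falls inside [vma_start, vma_end)."""
    fault_off = vma_file_start + (fault_vaddr - vma_start)
    page_off = (fault_off >> PAGE_SHIFT) << PAGE_SHIFT  # software page start
    ptes = []
    for piece in range(PAGE_MMUCOUNT):
        off = page_off + piece * MMUPAGE_SIZE
        vaddr = vma_start + (off - vma_file_start)
        if vma_start <= vaddr < vma_end:        # skip pieces outside the VMA
            ptes.append((vaddr, page_phys_base + piece * MMUPAGE_SIZE))
    return ptes
```

A fault at 0x401000 in a five-page VMA starting at 0x400000 (file
offset 0x2000) yields PTEs for all five hardware pages at once, since
they all live in the same software page.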

With this in hand, we can say "Yes, this solves the problem without
turning critical userspace apps into doorstops by stealing address
space from them" and I can resume coding up the final stretch of
functionality and move on to cleanups and maintenance of the patch
until the devel cycle comes to the point where it's ready for a merge.
I'd not be surprised if some vendor and/or distro interest is provoked,
and I'll do my best to help them along (if desired) once the patch is in
good enough shape wrt. functionality and clean enough to deliver to them.


-- wli
