public inbox for linux-kernel@vger.kernel.org
From: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: William Lee Irwin III <wli@holomorphy.com>,
	"M. Edward Borasky" <znmeb@aracnet.com>,
	linux-kernel@vger.kernel.org, riel@surriel.com,
	torvalds@transmeta.com, akpm@zip.com.au
Subject: Re: Have the 2.4 kernel memory management problems on large machines been fixed?
Date: Wed, 22 May 2002 11:18:58 -0700
Message-ID: <366180000.1022091538@flay>
In-Reply-To: <20020522172157.GK21164@dualathlon.random>

>> >> 	Persistent kmap sucks, and the global systemwide TLB flushes
>> >> 	scale as O(1/N^2) with the number of CPUs. Enlarging the kmap 
>> >> 	area helps a little, but really we need to stop doing this to
>> >> 	ourselves. I will have a patch (hopefully within a week) to do 
>> >> 	per-task kmap, based on the UKVA patch that Dave McCracken has
>> >> 	already implemented.
>> > 
>> > O(1/N^2)? wouldn't that get progressively better as the number of cpu's
> 
> 1/N^2 is less than O(1), no-way.

Sorry, typo - O(N^2). The cost of each systemwide flush is N times as much, and
we do them N times more often (fixed size kmap pool, due to fixed size KVA).
In a quick test, Keith found that increasing the size of the kmap pool from 1024
to 4096 entries (4MB to 16MB of KVA consumed) reduces the number of flushes by a
factor of 10 (due to the static overhead).
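
To make the arithmetic above concrete, here is a rough sketch in Python. It is a toy model, not measured data. The function names are made up for illustration, and the 4KB page size is the usual i386 value rather than something stated in this mail. The last function is only one possible reading of "static overhead": it assumes some fixed number of pool entries is effectively unavailable for recycling and that the flush count is inversely proportional to the rest.

```python
# Toy model of the kmap flush scaling argument; illustrative names only.
PAGE_SIZE = 4096  # bytes; assumed i386 page size


def kmap_pool_kva_bytes(entries, page_size=PAGE_SIZE):
    """Kernel virtual address space consumed by a pool of `entries` page slots."""
    return entries * page_size


def relative_flush_cost(n_cpus):
    """Total flush cost relative to a single CPU.

    Each systemwide flush costs roughly N times as much, and with a fixed
    size pool they happen roughly N times as often, hence N * N.
    """
    cost_per_flush = n_cpus
    flush_rate = n_cpus
    return cost_per_flush * flush_rate


def implied_static_entries(small, large, reduction):
    """Hypothetical: solve (large - s) / (small - s) = reduction for s.

    Assumes flush count is inversely proportional to (pool size - s),
    where s entries are a fixed overhead. This is an assumption, not
    something the original measurement states.
    """
    return (reduction * small - large) / (reduction - 1)


if __name__ == "__main__":
    for entries in (1024, 4096):
        print(entries, "entries ->", kmap_pool_kva_bytes(entries) // (1 << 20), "MB of KVA")
    for cpus in (1, 2, 4, 8, 16):
        print(cpus, "CPUs -> relative flush cost", relative_flush_cost(cpus))
    print("implied fixed overhead:", implied_static_entries(1024, 4096, 10), "entries")
```

A purely proportional model would predict only a 4x drop in flushes for a 4x larger pool. Under the fixed-overhead assumption, the observed 10x would correspond to roughly two thirds of the 1024-entry pool being unavailable, which is a guess about the mechanism and nothing more.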

> Anyways this is only a matter of implementing the
> persistent-and-atomic-kmap, I'm pretty sure they're the right solution
> for this problem, then the whole pool in highmem.c will go away and even
> the pagecache will stop blocking on the kmaps.

Working on the first stage of it as we speak ...
 
> I look forward to see the patch (just the kmap-atomic-and-persistent,
> not the constantly mapped pte that is more likely to be a regression
> than current linux way IMHO), so we can possibly cleanup and then
> integrate it in 2.5 :).

We have a breakoff of the UKVA infrastructure now (thanks to Dave McCracken),
and once we've kicked its tires a little, we'll pass it across for inspection.

> Other things like managing 63G of highmem with only 850M of direct
> mapping they're almost unsolvable in a generic manner, however
> configuration options and arch-ifdefs can be used here. If the
> computation always stays in kernel or always in userspace then 4G KVA is
> a solution (as slow as 2.0, the first bigmem for 2.2 and PTX I guess).

I'm more worried about 32GB than 64GB for the moment. I don't know
of any machines anyone is actually selling that will take 64GB - the
NUMA-Q will if we want to work on it, but 16GB and 32GB are the
real points right now.

> But even CONFIG_2G may not be ok if you want 1.7G of
> shm constantly mapped in all tasks.

Exactly. Sometimes I hate databases ;-)

> So at the end the closest to a
> generic solution may be to rewrite the whole kernel MM API to use pfn
> instead of page structures and to kmap the mem_map to get the struct
> page, so you don't shrink the user address space, you move the huge
> mem_map to highmem and the slowdown ""should"" be minor than the 4G KVA
> probably (not obvious though), 

Page clustering may be an easier solution for now, and you're right, this is only
a "bridge" to the new world ... that'd give us an effective 16KB page size, with
probably much less pain than the kmap'ed mem_map, and might even *improve*
performance ;-) Taming the beast into something workable rather than killing it
totally is good enough ...
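
For a feel of why a larger effective page size helps, here is a back-of-the-envelope sketch in Python of the mem_map footprint. The 64-byte struct page is an assumption for illustration (the real size depends on kernel version and config), the function name is made up, and the 850MB direct-mapping figure is taken from the quoted text above.

```python
# Back-of-the-envelope mem_map sizing; all constants are assumptions.
PAGE_SIZE = 4096           # bytes; assumed hardware page size on i386
STRUCT_PAGE_BYTES = 64     # assumed size of one struct page
DIRECT_MAP_MB = 850        # direct mapping figure quoted above
GB = 1 << 30
MB = 1 << 20


def mem_map_bytes(ram_bytes, cluster_factor=1,
                  page_size=PAGE_SIZE, struct_page_bytes=STRUCT_PAGE_BYTES):
    """Bytes of mem_map needed to describe `ram_bytes` of RAM.

    `cluster_factor` is the number of hardware pages grouped under one
    struct page (4 gives an effective 16KB page with 4KB hardware pages).
    """
    tracked_pages = ram_bytes // (page_size * cluster_factor)
    return tracked_pages * struct_page_bytes


if __name__ == "__main__":
    for ram_gb in (16, 32, 64):
        for factor in (1, 4):
            size_mb = mem_map_bytes(ram_gb * GB, factor) // MB
            print(f"{ram_gb}GB RAM, {PAGE_SIZE * factor // 1024}KB pages: "
                  f"mem_map ~{size_mb}MB of ~{DIRECT_MAP_MB}MB direct mapping")
```

Under these assumptions, 32GB of RAM needs about 512MB of mem_map with 4KB pages and about 128MB with 16KB clusters, and 64GB with 4KB pages would not fit in the direct mapping at all.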

M.

