From: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: William Lee Irwin III <wli@holomorphy.com>,
"M. Edward Borasky" <znmeb@aracnet.com>,
linux-kernel@vger.kernel.org, riel@surriel.com,
torvalds@transmeta.com, akpm@zip.com.au
Subject: Re: Have the 2.4 kernel memory management problems on large machines been fixed?
Date: Wed, 22 May 2002 11:18:58 -0700
Message-ID: <366180000.1022091538@flay>
In-Reply-To: <20020522172157.GK21164@dualathlon.random>
>> >> Persistent kmap sucks, and the global systemwide TLB flushes
>> >> scale as O(1/N^2) with the number of CPUs. Enlarging the kmap
>> >> area helps a little, but really we need to stop doing this to
>> >> ourselves. I will have a patch (hopefully within a week) to do
>> >> per-task kmap, based on the UKVA patch that Dave McCracken has
>> >> already implemented.
>> >
>> > O(1/N^2)? wouldn't that get progressively better as the number of cpu's
>
> 1/N^2 is less than O(1), no-way.
Sorry, typo - O(N^2). The cost of each systemwide flush is N times as much, and
we do them N times more often (fixed size kmap pool, due to fixed size KVA).
In a quick test, Keith found that increasing the size of the kmap pool from 1024
to 4096 entries (4MB to 16MB of KVA consumed) reduces the number of flushes by a
factor of 10 (due to the static overhead).
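
The scaling argument above can be sketched as a toy model. This is only an
illustration, not a measurement: the function names are made up for this
example, the cost units are arbitrary, and the 4KB page size is an assumption
(it is what makes 1024 pool entries come out to 4MB of KVA).

```python
# Toy model of the O(N^2) flush cost; names and units are illustrative only.
PAGE_SIZE_KB = 4  # assumption: 4KB pages, consistent with 1024 entries = 4MB


def pool_kva_mb(entries):
    """KVA consumed (in MB) by a kmap pool of page-sized entries."""
    return entries * PAGE_SIZE_KB // 1024


def relative_flush_cost(ncpus):
    """Total TLB flush cost relative to a single-CPU machine.

    Each systemwide flush has to reach every CPU (cost grows with N), and a
    fixed-size pool shared by N CPUs wraps N times as often (rate grows
    with N), so the product grows as N^2.
    """
    cost_per_flush = ncpus
    flushes_per_unit_time = ncpus
    return cost_per_flush * flushes_per_unit_time


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} CPUs -> relative flush cost {relative_flush_cost(n)}")
    print("pool KVA (MB):", pool_kva_mb(1024), "->", pool_kva_mb(4096))
```

Doubling the CPU count quadruples the modelled cost, which is the O(N^2)
behaviour described. The model deliberately says nothing about the factor
of 10 Keith measured, since that depends on the static overhead.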
> Anyways this is only a matter of implementing the
> persistent-and-atomic-kmap, I'm pretty sure they're the right solution
> for this problem, then the whole pool in highmem.c will go away and even
> the pagecache will stop blocking on the kmaps.
Working on the first stage of it as we speak ...
> I look forward to see the patch (just the kmap-atomic-and-persistent,
> not the constantly mapped pte that is more likely to be a regression
> than current linux way IMHO), so we can possibly cleanup and then
> integrate it in 2.5 :).
We have a breakoff of the UKVA infrastructure now (thanks to Dave McCracken),
and once we've kicked its tires a little, we'll pass it across for inspection.
> Other things like managing 63G of highmem with only 850M of direct
> mapping they're almost unsolvable in a generic manner, however
> configuration options and arch-ifdefs can be used here. If the
> computation always stays in kernel or always in userspace then 4G KVA is
> a solution (as slow as 2.0, the first bigmem for 2.2 and PTX I guess).
I'm more worried about 32GB than 64GB for the moment. I don't know
of any machines anyone is actually selling that will take 64GB - the
NUMA-Q will if we want to work on it, but 16GB and 32GB are the
real points right now.
> But even CONFIG_2G may not be ok if you want 1.7G of
> shm constantly mapped in all tasks.
Exactly. Sometimes I hate databases ;-)
> So at the end the closest to a
> generic solution may be to rewrite the whole kernel MM API to use pfn
> instead of page structures and to kmap the mem_map to get the struct
> page, so you don't shrink the user address space, you move the huge
> mem_map to highmem and the slowdown "should" be minor than the 4G KVA
> probably (not obvious though),
Page clustering may be an easier solution for now, and you're right that this is only
a "bridge" to the new world ... that'd give us an effective 16KB page size, with
probably much less pain than the kmap'ed mem_map, and might even *improve*
performance ;-) Taming the beast to something workable rather than killing it totally
is good enough ...
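
To make the arithmetic concrete, here is a rough sketch of how clustering
shrinks mem_map. It assumes a 4KB hardware page (so a 16KB effective page
means clustering four pages) and takes the size of struct page as a
parameter; the 64 bytes used below is a placeholder for illustration, not a
figure from this thread, and the function names are made up for the example.

```python
# Rough mem_map sizing under page clustering; all sizes here are assumptions.
BASE_PAGE_KB = 4  # assumption: 4KB hardware page size


def effective_page_kb(cluster_factor):
    """Effective page size when cluster_factor hardware pages act as one."""
    return BASE_PAGE_KB * cluster_factor


def mem_map_mb(ram_gb, page_kb, struct_page_bytes):
    """Size of mem_map in MB: one struct page per (effective) page of RAM."""
    pages = ram_gb * 1024 * 1024 // page_kb
    return pages * struct_page_bytes / (1024 * 1024)


if __name__ == "__main__":
    STRUCT_PAGE_BYTES = 64  # hypothetical placeholder, not from the thread
    for ram_gb in (16, 32, 64):
        small = mem_map_mb(ram_gb, effective_page_kb(1), STRUCT_PAGE_BYTES)
        big = mem_map_mb(ram_gb, effective_page_kb(4), STRUCT_PAGE_BYTES)
        print(f"{ram_gb}GB RAM: mem_map {small:.0f}MB at 4KB pages, "
              f"{big:.0f}MB at 16KB pages")
```

Under that 64-byte placeholder, a 32GB machine needs 512MB of mem_map with
4KB pages and 128MB with 16KB pages, which shows why a fourfold reduction
matters when only about 850M is directly mapped.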
M.