From: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: William Lee Irwin III <wli@holomorphy.com>,
"M. Edward Borasky" <znmeb@aracnet.com>,
linux-kernel@vger.kernel.org, riel@surriel.com,
torvalds@transmeta.com, akpm@zip.com.au
Subject: Re: Have the 2.4 kernel memory management problems on large machines been fixed?
Date: Wed, 22 May 2002 11:18:58 -0700
Message-ID: <366180000.1022091538@flay>
In-Reply-To: <20020522172157.GK21164@dualathlon.random>
>> >> Persistent kmap sucks, and the global systemwide TLB flushes
>> >> scale as O(1/N^2) with the number of CPUs. Enlarging the kmap
>> >> area helps a little, but really we need to stop doing this to
>> >> ourselves. I will have a patch (hopefully within a week) to do
>> >> per-task kmap, based on the UKVA patch that Dave McCracken has
>> >> already implemented.
>> >
>> > O(1/N^2)? wouldn't that get progressively better as the number of cpu's
>
> 1/N^2 is less than O(1), no-way.
Sorry, typo - O(N^2). Each systemwide flush costs N times as much, and
we do them N times more often (the kmap pool has a fixed size, because the KVA does).
In a quick test, Keith found that increasing the size of the kmap pool from 1024
to 4096 entries (4MB to 16MB of KVA consumed) reduces the number of flushes by a
factor of 10 (due to the static overhead).
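The O(N^2) argument above can be made concrete with a toy cost model. This is a sketch under stated assumptions, not a measurement: the function names, the per-CPU mapping rate, and the `pinned` count of long-held pool slots are hypothetical. It shows only that a per-flush cost growing with N, multiplied by a flush rate that also grows with N, gives N^2, and that a fixed static overhead makes a larger pool pay off more than proportionally.

```python
# Hypothetical cost model for global TLB flushes from a fixed-size kmap pool.
# Assumptions (not measured data):
#   * one system-wide flush interrupts every CPU, so it costs `ncpus` units;
#   * each CPU performs `maps_per_cpu` kmaps per interval, so N CPUs use up
#     the pool N times faster;
#   * `pinned` slots are held long-term (static overhead), and a flush
#     recycles only the remaining `pool - pinned` slots.

def flushes(ncpus: int, maps_per_cpu: int, pool: int, pinned: int = 0) -> float:
    """Global flushes per interval: total kmaps / slots recycled per flush."""
    return ncpus * maps_per_cpu / (pool - pinned)


def flush_cost(ncpus: int, maps_per_cpu: int, pool: int, pinned: int = 0) -> float:
    """Flushes per interval times the per-flush cost (one unit per CPU)."""
    return flushes(ncpus, maps_per_cpu, pool, pinned) * ncpus


if __name__ == "__main__":
    base = flush_cost(1, 100_000, 1024)
    for n in (1, 2, 4, 8, 16, 32):
        # Doubling the CPU count quadruples the cost: O(N^2).
        print(f"{n:2d} CPUs: relative cost {flush_cost(n, 100_000, 1024) / base:g}")
    for pinned in (0, 256, 512):
        # With pinned slots, a 4x larger pool cuts flushes by more than 4x.
        ratio = flushes(1, 100_000, 1024, pinned) / flushes(1, 100_000, 4096, pinned)
        print(f"pinned={pinned}: 1024 -> 4096 slots cuts flushes {ratio:g}x")
```

With no pinned slots the reduction is exactly the 4x size ratio; the hypothetical values of 256 and 512 pinned slots give 5x and 7x. No attempt is made here to reproduce Keith's measured factor.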
> Anyway, this is only a matter of implementing the
> persistent-and-atomic kmap. I'm pretty sure they're the right solution
> for this problem; then the whole pool in highmem.c will go away and even
> the pagecache will stop blocking on the kmaps.
Working on the first stage of it as we speak ...
> I look forward to seeing the patch (just the kmap-atomic-and-persistent,
> not the constantly mapped pte that is more likely to be a regression
> than the current linux way IMHO), so we can possibly clean it up and then
> integrate it in 2.5 :).
We have a breakoff of the UKVA infrastructure now (thanks to Dave McCracken),
and once we've kicked its tires a little, we'll pass it across for inspection.
> Other things, like managing 63G of highmem with only 850M of direct
> mapping, are almost unsolvable in a generic manner; however,
> configuration options and arch-ifdefs can be used here. If the
> computation always stays in kernel or always in userspace, then 4G KVA is
> a solution (as slow as 2.0, the first bigmem for 2.2 and PTX I guess).
I'm more worried about 32GB than 64GB for the moment; I don't know
of any machines anyone is actually selling that will take 64GB. The
NUMA-Q will if we want to work on it, but 16GB and 32GB are the
real points right now.
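To see why 63G of highmem behind roughly 850M of direct mapping is so hard, it helps to size mem_map, which needs one struct page per 4KB page and has to live in the direct-mapped region. The 64-byte struct page below is an assumption chosen for round numbers; the real size depends on kernel version and configuration. The 850MB figure is the one quoted in the thread.

```python
PAGE_SIZE = 4096          # i386 base page size in bytes
STRUCT_PAGE_BYTES = 64    # assumption: real size varies by version/config
LOWMEM_MIB = 850          # direct-mapping figure quoted in the thread


def mem_map_mib(ram_gib: int, page_size: int = PAGE_SIZE,
                struct_page: int = STRUCT_PAGE_BYTES) -> float:
    """MiB of direct-mapped memory consumed by mem_map for ram_gib GiB of RAM."""
    pages = ram_gib * 2**30 // page_size     # one struct page per page
    return pages * struct_page / 2**20


if __name__ == "__main__":
    for gib in (16, 32, 64):
        used = mem_map_mib(gib)
        print(f"{gib:2d}GB RAM: mem_map {used:.0f}MB, "
              f"{100 * used / LOWMEM_MIB:.0f}% of {LOWMEM_MIB}MB lowmem")
```

Under these assumptions mem_map takes about 30% of lowmem at 16GB and 60% at 32GB. At 64GB it would not fit at all, before a single dentry, inode or buffer head is allocated.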
> But even CONFIG_2G may not be OK if you want 1.7G of
> shm constantly mapped in all tasks.
Exactly. Sometimes I hate databases ;-)
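The CONFIG_2G concern comes down to subtraction. This sketch assumes that CONFIG_2G means a 2GB user / 2GB kernel split. That is an assumption, since the thread does not spell it out. The sketch compares that split with the default i386 3GB/1GB split and reports how much user address space is left once 1.7GB of shm is mapped into every task.

```python
SHM_GB = 1.7  # shm mapped in every task, per the thread

# User address space per split, in GB. The CONFIG_2G entry is an assumed
# 2GB/2GB split; 3G/1G is the stock i386 layout.
USER_SPACE_GB = {"3G/1G default": 3.0, "CONFIG_2G": 2.0}


def user_left_gb(user_space_gb: float, shm_gb: float = SHM_GB) -> float:
    """User address space left for text, heap, stacks and shared libraries."""
    return round(user_space_gb - shm_gb, 2)


if __name__ == "__main__":
    for name, size in USER_SPACE_GB.items():
        print(f"{name:14s}: {user_left_gb(size):.1f}GB left after {SHM_GB}GB shm")
```

Giving the kernel more room with a 2GB split leaves a database process about 0.3GB of user address space for everything other than the shm segment.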
> So in the end the closest to a
> generic solution may be to rewrite the whole kernel MM API to use pfns
> instead of page structures and to kmap the mem_map to get the struct
> page, so you don't shrink the user address space; you move the huge
> mem_map to highmem, and the slowdown "should" be smaller than with the 4G KVA
> probably (not obvious though),
Page clustering may be an easier solution for now, and you're right, this is only
a "bridge" to the new world ... that'd give us an effective 16KB page size, with
probably much less pain than the kmap'ed mem_map, and it might even *improve*
performance ;-) Taming the beast into something workable rather than killing it totally
is good enough ...
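Page clustering relieves the same mem_map pressure. If the kernel tracks memory in 16KB clusters of four hardware pages, it needs a quarter as many struct pages. This sketch assumes one struct page per cluster and a hypothetical 64-byte struct page. It ignores every other cost or benefit of clustering, such as internal fragmentation or fewer page faults.

```python
STRUCT_PAGE_BYTES = 64   # assumption: real size varies by version/config
LOWMEM_MIB = 850         # direct-mapping figure quoted in the thread


def mem_map_mib(ram_gib: int, cluster_bytes: int) -> float:
    """MiB of lowmem for mem_map, assuming one struct page per cluster."""
    clusters = ram_gib * 2**30 // cluster_bytes
    return clusters * STRUCT_PAGE_BYTES / 2**20


if __name__ == "__main__":
    for gib in (16, 32):
        plain = mem_map_mib(gib, 4096)       # ordinary 4KB pages
        clustered = mem_map_mib(gib, 16384)  # effective 16KB pages
        print(f"{gib}GB: mem_map {plain:.0f}MB -> {clustered:.0f}MB, "
              f"{LOWMEM_MIB - clustered:.0f}MB of lowmem left")
```

At 32GB, under these assumptions, mem_map drops from 512MB to 128MB. That leaves most of the direct mapping for other kernel data, without the extra kmap traffic that a highmem mem_map would add.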
M.