From: Andrea Arcangeli <andrea@suse.de>
To: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>,
Daniel Phillips <phillips@bonn-fries.net>,
Russell King <rmk@arm.linux.org.uk>,
linux-kernel@vger.kernel.org
Subject: Re: Bug: Discontigmem virt_to_page() [Alpha,ARM,Mips64?]
Date: Fri, 3 May 2002 10:38:13 +0200
Message-ID: <20020503103813.K11414@dualathlon.random>
In-Reply-To: <20020503080433.R11414@dualathlon.random> <4023859403.1020382422@[10.10.2.3]>
On Thu, May 02, 2002 at 11:33:43PM -0700, Martin J. Bligh wrote:
> into kernel address space for ever. That's a fundamental scalability
> problem for a 32 bit machine, and I think we need to fix it. If we
> map only the pages the process is using into the user-kernel address
> space area, rather than the global KVA, we get rid of some of these
> problems. Not that that plan doesn't have its own problems, but ... ;-)
:) As I said, every workaround has a significant drawback at this point.
Flooding the tlb with invlpg and walking pagetables every time we need
to do a set_bit, clear_bit, test_bit or an unlock_page is overkill at
runtime, and managing those kernel pools in user memory is overcomplex
on the software side too.
Just assume we do that and that you're ok with paying for the hit in
general purpose usage. Then next year, how do you plan to work around
the limitation of 64G of physical ram? Are you going to multiplex
another 64G of ram via a pci register so you can handle 128G of ram on
x86, just not simultaneously? (That's ok in theory: the cpu won't notice
you're swapping the ram under it, and you cannot keep more than 4G
mapped in virtual memory simultaneously anyway, so it doesn't matter if
some ram isn't visible on the physical side either.)
I mean, in theory there's no limit, but in practice there is one. 64G
is just over the limit for general purpose x86 IMHO: it's at a point
where every workaround for something has a significant performance (or
memory) drawback. It's still very fine for custom apps that need that
much ram, but 32G is the practical limit of general purpose x86 IMHO.
Ah, and of course you could also use 2M pages by default to make it
more usable, but you would still run into some huge ram wastage in
certain usages with small files, and into huge pageins and reads,
swapouts and swapins, plus it wouldn't be guaranteed to be transparent
to the userspace binaries (for instance mmap offset fields would break
backwards compatibility on the required alignment, that's probably the
last problem though). Despite its significant drawbacks and the
complexity of the change, the 2M pages would probably be the saner
approach to manage 64G more efficiently with only an 800M kernel window.
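
To put rough numbers on that trade-off, here is a back-of-the-envelope
sketch of how much of the kernel window a one-descriptor-per-page array
would take for 64G of ram, at 4K page granularity versus 2M (the large
page size when PAE is enabled). The 48-byte descriptor size is an
assumption chosen for illustration, not a figure from this thread, and
the 800M window is the round number used above.

```python
# Back-of-the-envelope: per-page bookkeeping cost for 64G of RAM.
# ASSUMPTION: STRUCT_PAGE_BYTES is an illustrative value, not taken from
# the thread; the real size depends on kernel version and config.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

RAM_BYTES = 64 * GIB             # 36-bit physical address space
KERNEL_WINDOW_BYTES = 800 * MIB  # round figure used in the message
STRUCT_PAGE_BYTES = 48           # hypothetical per-page descriptor size


def mem_map_bytes(ram_bytes, page_bytes,
                  descriptor_bytes=STRUCT_PAGE_BYTES):
    """Bytes needed to hold one descriptor per page of RAM."""
    return (ram_bytes // page_bytes) * descriptor_bytes


def window_share(page_bytes):
    """Percentage of the kernel window consumed by the descriptors."""
    used = mem_map_bytes(RAM_BYTES, page_bytes)
    return 100.0 * used / KERNEL_WINDOW_BYTES


if __name__ == "__main__":
    for label, page_bytes in (("4K", 4 * KIB), ("2M", 2 * MIB)):
        used = mem_map_bytes(RAM_BYTES, page_bytes)
        print(f"{label} pages: {used / MIB:.1f} MiB of descriptors, "
              f"{window_share(page_bytes):.1f}% of the kernel window")
```

Under these assumptions the 4K case needs 768 MiB of descriptors, nearly
the whole window, while the 2M case needs 1.5 MiB. The exact figures
move with the descriptor size, but the factor of 512 between the two
page sizes does not.
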
> Bear in mind that we've successfully used 64Gb of ram in a 32 bit
> virtual addr space a long time ago with Dynix/PTX.
You can use 64G "successfully" right now with 2.4.19pre8 too. As said
in the earlier email, there are many applications that don't care if
there's only a few meg of zone_normal, and for them 2.4.19pre8 is just
fine (actually -aa is much better because of the bounce buffers and
other vm fixes in that area). If all the load is in userspace, current
2.4 is just optimal and you'll take advantage of all the ram without
problems (let's assume it's not a numa machine; with numa you'd be
better off with the fixes I included in my tree). But if you need the
kernel to do some amount of work, like vfs caching, blkdev cache, lots
of bh on pagecache, lots of vma, lots of kiobufs, skb etc., then you'd
probably be faster if you boot with mem=32G, or at least you should take
actions like recompiling the kernel with CONFIG_2G, which would then
break things like a 1.7G SGA etc...
> > So at the end you'll be left with
> > only say 5/10M per node of zone_normal that will be filled immediately as
> > soon as you start reading some directory from disk. a few hundred mbyte
> > of vfs cache is the minimum for those machines, this doesn't even take
> > into account bh headers for the pagecache, physical address space
> > pagecache for the buffercache, kiobufs, vma, etc...
>
> Bufferheads are another huge problem right now. For a P4 machine, they
> round off to 128 bytes per data structure. I was just looking at a 16Gb
> machine that had completely wedged itself by filling ZONE_NORMAL with
Go ahead, use -aa or the vm-33 update. I fixed that problem a few days
after hearing about it for the first time (with due credit to Rik in a
comment for showing me the problem btw, I had never noticed it before).
> unfreeable overhead - 440Mb of bufferheads alone. Globally mapping the
> bufferheads is probably another thing that'll have to go.
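
As a rough check of the figures quoted above: at 128 bytes per
bufferhead, 440Mb of bufferheads is a few million objects. The sketch
below does the division. The 4K block size used to estimate how much
cached data those bufferheads describe is an assumption, not something
stated in the thread.

```python
# Sanity check of the quoted bufferhead overhead.
MIB = 1024 * 1024
GIB = 1024 * MIB

BH_BYTES = 128              # rounded per-bufferhead size quoted for a P4
BH_TOTAL_BYTES = 440 * MIB  # bufferhead overhead quoted for the machine
BLOCK_BYTES = 4096          # ASSUMPTION: one bufferhead per 4K block


def bufferhead_count(total_bytes=BH_TOTAL_BYTES, bh_bytes=BH_BYTES):
    """Number of bufferheads that fit in the quoted overhead."""
    return total_bytes // bh_bytes


def data_described(count, block_bytes=BLOCK_BYTES):
    """Bytes of cached data covered by that many bufferheads."""
    return count * block_bytes


if __name__ == "__main__":
    count = bufferhead_count()
    covered = data_described(count)
    print(f"{count} bufferheads covering {covered / GIB:.2f} GiB of data")
```

With a 4K block size this comes to 3,604,480 bufferheads describing
13.75 GiB of data, which is most of the ram on a 16Gb machine, while
the bufferheads themselves all sit in ZONE_NORMAL.
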
>
> > It's just that 1G of
> > virtual address space reserved for kernel is too low to handle
> > efficiently 64G of physical ram, this is a fact and you can't
> > workaround it.
>
> Death to global mappings! ;-)
>
> I'd agree that a 64 bit vaddr space makes much more sense, but we're
This is my whole point yes :)
> stuck with the chips we've got for a little while yet. AMD were a few
> years too late for the bleeding edge Intel arch people amongst us.
Andrea