public inbox for linux-kernel@vger.kernel.org
* Re: Virtual address space exhaustion (was  Discontigmem virt_to_page() )
@ 2002-05-03 18:37 Tony Luck
  2002-05-03 19:01 ` Richard B. Johnson
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Luck @ 2002-05-03 18:37 UTC (permalink / raw)
  To: linux-kernel

Richard B. Johnson wrote:
> One of the Unix characteristics is that the kernel
> address space is shared with each of the process
> address space.

This hasn't been an absolute requirement. There have
been 32-bit Unix implementations that gave separate
4G address spaces to the kernel and to each user
process.  The only real downside is that
copyin()/copyout() become more complex. Some processors
provided special instructions to access user-mode
addresses from kernel mode to mitigate this complexity.

-Tony



* Re: Bug: Discontigmem virt_to_page() [Alpha,ARM,Mips64?]
@ 2002-05-03  8:38 Andrea Arcangeli
  2002-05-03 15:17 ` Virtual address space exhaustion (was Discontigmem virt_to_page() ) Martin J. Bligh
  0 siblings, 1 reply; 22+ messages in thread
From: Andrea Arcangeli @ 2002-05-03  8:38 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: William Lee Irwin III, Daniel Phillips, Russell King,
	linux-kernel

On Thu, May 02, 2002 at 11:33:43PM -0700, Martin J. Bligh wrote:
> into kernel address space for ever. That's a fundamental scalability
> problem for a 32 bit machine, and I think we need to fix it. If we
> map only the pages the process is using into the user-kernel address
> space area, rather than the global KVA, we get rid of some of these
> problems. Not that that plan doesn't have its own problems, but ... ;-)

:) As said, every workaround has a significant drawback at this point.
Flooding the tlb with invlpg and pagetable walking every time we need
to do a set_bit, clear_bit, test_bit or an unlock_page is overkill at
runtime, and managing those kernel pools in user memory is overcomplex
on the software side too.
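
The per-access cost being described can be put in toy form (the
counters and names below are invented for illustration; this is not
kernel code): if struct page lived in memory without a permanent
kernel mapping, every flag operation would need a temporary mapping
window, roughly like kmap_atomic does for highmem data:

```c
#include <assert.h>

/* Toy cost model, invented for illustration.  With struct page kept
 * permanently mapped, a flag operation is one store.  Without a
 * permanent mapping it becomes: write a window pte, invlpg the
 * window, do the store, tear the window down. */
static unsigned long tlb_invalidations;
static unsigned long pte_writes;

static void set_bit_direct(unsigned long *flags, int nr)
{
    *flags |= 1UL << nr;        /* permanently mapped: one store */
}

static void set_bit_windowed(unsigned long *flags, int nr)
{
    pte_writes++;               /* map the page into a window */
    tlb_invalidations++;        /* invlpg on the window address */
    *flags |= 1UL << nr;        /* the actual work */
    pte_writes++;               /* unmap the window again */
}
```

Every set_bit/test_bit/unlock_page turns from one store into two pte
writes plus a tlb invalidation, which is the flood being objected to.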

Just assume we do that, and that you're ok to pay the hit in general
purpose usage: then next year, how will you work around the limitation
of 64G of physical ram? Are you going to multiplex another 64G of ram
via a pci register so you can handle 128G of ram on x86, just not
simultaneously? (In theory that's ok: the cpu won't notice you're
swapping the ram under it, and you cannot keep more than 4G mapped in
virtual mem simultaneously anyway, so it doesn't matter if some ram
isn't visible on the physical side either.)

I mean, in theory there's no limit, but in practice there is one. 64G
is just over the limit for general purpose x86 IMHO: it's at the point
where every workaround has a significant performance (or memory)
drawback. It's still very fine for custom apps that need that much
ram, but 32G is the practical limit of general purpose x86 IMHO.

Ah, and of course you could also use 2M pagetables by default to make
it more usable, but you would still run into some huge ram wastage in
certain usages with small files, huge pageins and reads, swapouts and
swapins; plus it wouldn't be guaranteed to be transparent to the
userspace binaries (for instance the mmap offset fields would break
backwards compatibility on the required alignment, though that's
probably the least of the problems). Despite its also significant
drawbacks and the complexity of the change, the 4M pagetables would
probably be the saner approach to manage 64G more efficiently with
only an 800M kernel window.
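
Back-of-envelope arithmetic for why 64G strains an ~800M kernel window
(the 64-byte struct page and 8-byte PAE pte sizes are my assumptions
for illustration; the exact 2.4 numbers differ a bit but not enough to
change the conclusion):

```c
#include <assert.h>

/* Assumed sizes, for illustration only. */
static const unsigned long long RAM         = 64ULL << 30;  /* 64G box */
static const unsigned long long PAGE_4K     = 4ULL << 10;
static const unsigned long long PAGE_2M     = 2ULL << 20;
static const unsigned long long PAE_PTE     = 8;    /* bytes per pte */
static const unsigned long long STRUCT_PAGE = 64;   /* assumed size  */

/* mem_map[]: one struct page per physical page, wired in lowmem. */
static unsigned long long mem_map_size(unsigned long long page_size)
{
    return (RAM / page_size) * STRUCT_PAGE;
}

/* Pagetables needed to map one fully-populated 4G address space. */
static unsigned long long pagetable_size(unsigned long long page_size)
{
    return ((4ULL << 30) / page_size) * PAE_PTE;
}
```

With 4k pages, mem_map alone comes to ~1G, already larger than the
whole ~800M window, before any vfs cache or bh; a 4M soft page size
shrinks it to ~1M and cuts per-process pagetables from ~8M to ~16k per
fully-mapped 4G space, at the price of the ram wastage on small files
mentioned above.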

> Bear in mind that we've successfully used 64Gb of ram in a 32 bit 
> virtual addr space a long time ago with Dynix/PTX.

You can use 64G "successfully" right now too with 2.4.19pre8: as said
in the earlier email, there are many applications that don't care if
there's only a few meg of zone_normal, and for them 2.4.19pre8 is just
fine (actually -aa is much better, for the bounce buffers and other vm
fixes in that area). If all the load is in userspace, current 2.4 is
just optimal and you'll take advantage of all the ram without problems
(let's assume it's not a numa machine; with numa you'd be better off
with the fixes I included in my tree).  But if you need the kernel to
do some amount of work, like vfs caching, blkdev cache, lots of bh on
pagecache, lots of vma, lots of kiobufs, skb etc., then you'd probably
be faster if you boot with mem=32G, or at least you should take
actions like recompiling the kernel as CONFIG_2G, which would then
break a large 1.7G SGA etc...
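
The mem=32G / CONFIG_2G trade-off can be put in rough numbers (the
128M vmalloc/fixmap reserve below is approximate, chosen only for
illustration):

```c
#include <assert.h>

/* The 4G virtual space is split between user and kernel; the kernel
 * part direct-maps lowmem, minus a reserve for vmalloc/fixmaps (the
 * 128M figure is approximate). */
static const unsigned long long MB = 1ULL << 20;
static const unsigned long long GB = 1ULL << 30;
static const unsigned long long VMALLOC_RESERVE_MB = 128;  /* approx */

/* Direct-mapped low memory left for a given user-space size. */
static unsigned long long lowmem_mb(unsigned long long user_bytes)
{
    return (4 * GB - user_bytes) / MB - VMALLOC_RESERVE_MB;
}
```

Moving the split from 3G/1G to 2G/2G roughly doubles lowmem (~896M to
~1.9G), but a process that must fit a 1.7G SGA plus its code,
libraries and stack under a 2G user space no longer can.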

> > So at the end you'll be left with
> > only say 5/10M per node of zone_normal that will be filled immediately as
> > soon as you start reading some directory from disk. a few hundred mbyte
> > of vfs cache is the minimum for those machines, this doesn't even take
> > into account bh headers for the pagecache, physical address space
> > pagecache for the buffercache, kiobufs, vma, etc... 
> 
> Bufferheads are another huge problem right now. For a P4 machine, they
> round off to 128 bytes per data structure. I was just looking at a 16Gb
> machine that had completely wedged itself by filling ZONE_NORMAL with 

Go ahead, use -aa or the vm-33 update; I fixed that problem a few
days after hearing about it the first time (with due credit to Rik in
a comment for showing me the problem, btw; I never noticed it before).
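
The quoted figures are easy to sanity-check (the one-buffer_head-per-
block model below is my simplification; the 128 bytes per bh is the
figure quoted above for a P4):

```c
#include <assert.h>

/* Worst case: one buffer_head per block of cached data, all of it
 * pinned unfreeable in ZONE_NORMAL. */
static const unsigned long long KB = 1ULL << 10;
static const unsigned long long MB = 1ULL << 20;
static const unsigned long long GB = 1ULL << 30;

static unsigned long long bh_overhead_mb(unsigned long long cached_bytes,
                                         unsigned long long block_size)
{
    const unsigned long long BH_SIZE = 128;   /* bytes, as quoted */
    return (cached_bytes / block_size) * BH_SIZE / MB;
}
```

So a 16Gb box with 4k blocks can pin up to ~512M of buffer_heads in a
~896M ZONE_NORMAL (the 440Mb seen below is close to that cap), and
with 1k blocks the worst case is 2G, more than lowmem itself.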

> unfreeable overhead - 440Mb of bufferheads alone. Globally mapping the
> bufferheads is probably another thing that'll have to go.
> 
> > It's just that 1G of
> > virtual address space reserved for kernel is too low to handle
> > efficiently 64G of physical ram, this is a fact and you can't 
> > workaround it. 
> 
> Death to global mappings! ;-)
> 
> I'd agree that a 64 bit vaddr space makes much more sense, but we're

This is my whole point yes :)

> stuck with the chips we've got for a little while yet. AMD were a few
> years too late for the bleeding edge Intel arch people amongst us.

Andrea


end of thread, other threads:[~2002-05-06  9:49 UTC | newest]

Thread overview: 22+ messages
2002-05-03 18:37 Virtual address space exhaustion (was Discontigmem virt_to_page() ) Tony Luck
2002-05-03 19:01 ` Richard B. Johnson
2002-04-27  1:15   ` Pavel Machek
2002-05-03 19:09   ` Christoph Hellwig
2002-05-03 19:17     ` Richard B. Johnson
2002-05-03 19:24       ` Christoph Hellwig
2002-05-03 19:38   ` Matti Aarnio
2002-05-03 19:50   ` Tony Luck
2002-05-03 20:22   ` Jeff Dike
2002-05-03 19:30     ` Richard B. Johnson
2002-05-03 22:35       ` Martin J. Bligh
2002-05-05  0:49         ` Denis Vlasenko
2002-05-05 17:59           ` Martin J. Bligh
  -- strict thread matches above, loose matches on Subject: below --
2002-05-03  8:38 Bug: Discontigmem virt_to_page() [Alpha,ARM,Mips64?] Andrea Arcangeli
2002-05-03 15:17 ` Virtual address space exhaustion (was Discontigmem virt_to_page() ) Martin J. Bligh
2002-05-03 15:58   ` Andrea Arcangeli
2002-05-03 16:10     ` Martin J. Bligh
2002-05-03 16:25       ` Andrea Arcangeli
2002-05-03 16:02   ` Daniel Phillips
2002-05-03 16:20     ` Andrea Arcangeli
2002-05-03 16:41       ` Daniel Phillips
2002-05-03 16:58         ` Andrea Arcangeli
2002-05-03 18:08           ` Daniel Phillips
