The 3G (or nG) Kernel Memory Space Offset

public inbox for linux-kernel@vger.kernel.org

* The 3G (or nG) Kernel Memory Space Offset
@ 2006-08-29 14:15 Dong Feng
  2006-08-29 14:32 ` Andi Kleen
  2006-08-29 14:36 ` Jan Engelhardt
  0 siblings, 2 replies; 14+ messages in thread
From: Dong Feng @ 2006-08-29 14:15 UTC
  To: Andi Kleen, Nick Piggin, Arjan van de Ven, Dong Feng,
	Paul Mackerras, Christoph Lameter, David Howells
  Cc: linux-kernel

The Linux kernel permanently maps 3-4G linear memory space to 0-4G
physical memory space. My question is: what is the rationale
behind this counterintuitive mapping? Is this just some personal
choice of the earlier kernel developers?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 14:15 The 3G (or nG) Kernel Memory Space Offset Dong Feng
@ 2006-08-29 14:32 ` Andi Kleen
  2006-08-29 14:36 ` Jan Engelhardt
  1 sibling, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2006-08-29 14:32 UTC
  To: Dong Feng
  Cc: Nick Piggin, Arjan van de Ven, Paul Mackerras, Christoph Lameter,
	David Howells, linux-kernel

On Tuesday 29 August 2006 16:15, Dong Feng wrote:
> The Linux kernel permanently maps 3-4G linear memory space to 0-4G

i386 Mainline kernel doesn't, no.

-Andi

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 14:15 The 3G (or nG) Kernel Memory Space Offset Dong Feng
  2006-08-29 14:32 ` Andi Kleen
@ 2006-08-29 14:36 ` Jan Engelhardt
  2006-08-29 16:01   ` Dong Feng
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Engelhardt @ 2006-08-29 14:36 UTC
  To: Dong Feng
  Cc: Andi Kleen, Nick Piggin, Arjan van de Ven, Paul Mackerras,
	Christoph Lameter, David Howells, linux-kernel

>
> The Linux kernel permanently maps 3-4G linear memory space to 0-4G
> physical memory space.

"3-4G linear memory space" is usually the "kernel space", i.e. 0xc0000000 
upwards. Mostly, the kernel is loaded here (on x86).

"0-4G physical memory space" denotes RAM. Since kernelspace is resident, it 
only seems logical to map it to 0G (that is, the start of RAM), because the 
end of RAM can be flexible.

IOW, you cannot map kernelspace to the physical location 0xc0000000 because 
there might not be that much RAM.

(Also note the PCI memory hole which is near the end of the 4G range.)
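A minimal sketch of the fixed offset described above, assuming the default i386 3G/1G split (kernel virtual space starting at 0xC0000000) and an 896 MiB direct-mapped region. The kernel's own __pa()/__va() macros perform essentially this subtraction and addition for directly mapped memory; the helper names below are hypothetical stand-ins, not kernel API. With this layout, a kernel image loaded at physical 1 MiB is reached through virtual 0xC0100000.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default i386 3G/1G split: kernel virtual space starts at 3 GiB. */
#define SKETCH_PAGE_OFFSET 0xC0000000u
/* Assumed size of the direct-mapped ("lowmem") region: 1 GiB minus 128 MiB. */
#define SKETCH_LOWMEM_SIZE (896u << 20)

/* Hypothetical stand-in for __pa(): kernel virtual -> physical. */
static uint32_t sketch_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= SKETCH_PAGE_OFFSET &&
           vaddr - SKETCH_PAGE_OFFSET < SKETCH_LOWMEM_SIZE);
    return vaddr - SKETCH_PAGE_OFFSET;
}

/* Hypothetical stand-in for __va(): physical -> kernel virtual. */
static uint32_t sketch_phys_to_virt(uint32_t paddr)
{
    assert(paddr < SKETCH_LOWMEM_SIZE);
    return paddr + SKETCH_PAGE_OFFSET;
}
```

The offset is a single constant, so translation in either direction is one add or subtract, which is why the kernel can afford to do it everywhere.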

> My question is: what is the rationale
> behind this counterintuitive mapping? Is this just some personal
> choice of the earlier kernel developers?

Jan Engelhardt
-- 

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 14:36 ` Jan Engelhardt
@ 2006-08-29 16:01   ` Dong Feng
  2006-08-29 16:05     ` Arjan van de Ven
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dong Feng @ 2006-08-29 16:01 UTC
  To: Jan Engelhardt
  Cc: Andi Kleen, Nick Piggin, Arjan van de Ven, Paul Mackerras,
	Christoph Lameter, David Howells, linux-kernel

2006/8/29, Jan Engelhardt <jengelh@linux01.gwdg.de>:
>
> "0-4G physical memory space" denotes RAM. Since kernelspace is resident, it
> only seems logical to map it to 0G (that is, the start of RAM), because the
> end of RAM can be flexible.
>
> IOW, you cannot map kernelspace to the physical location 0xc0000000 because
> there might not be that much RAM.
>
> (Also note the PCI memory hole which is near the end of the 4G range.)
>
>
> Jan Engelhardt
> --
>

Sorry for my typo. I actually meant "0-1G physical memory space." My
question is actually why there is a 3G offset from linear kernel to
physical kernel. Why not simply have kernel memory linear space
located at 0-1G linear address, and therefore the physical kernel and
linear kernel just coincide?

Or perhaps this offset is just some personal favor. Say if the first
kernel designer decided to locate kernel at 2-3G linear address, then
2G offset would have appeared in code. Is this the case?

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 16:01   ` Dong Feng
@ 2006-08-29 16:05     ` Arjan van de Ven
  2006-08-29 16:30       ` Jan Engelhardt
  2006-08-29 16:12     ` Christoph Lameter
  2006-08-29 16:42     ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 14+ messages in thread
From: Arjan van de Ven @ 2006-08-29 16:05 UTC
  To: Dong Feng
  Cc: Jan Engelhardt, Andi Kleen, Nick Piggin, Paul Mackerras,
	Christoph Lameter, David Howells, linux-kernel

On Wed, 2006-08-30 at 00:01 +0800, Dong Feng wrote:
> 2006/8/29, Jan Engelhardt <jengelh@linux01.gwdg.de>:
> >
> > "0-4G physical memory space" denotes RAM. Since kernelspace is resident, it
> > only seems logical to map it to 0G (that is, the start of RAM), because the
> > end of RAM can be flexible.
> >
> > IOW, you cannot map kernelspace to the physical location 0xc0000000 because
> > there might not be that much RAM.
> >
> > (Also note the PCI memory hole which is near the end of the 4G range.)
> >
> >
> > Jan Engelhardt
> > --
> >
> 
> 
> Sorry for my typo. I actually meant "0-1G physical memory space." My
> question is actually why there is a 3G offset from linear kernel to
> physical kernel. Why not simply have kernel memory linear space
> located at 0-1G linear address, and therefore the physical kernel and
> linear kernel just coincide?


the price for that would be that you would have to flush all the TLBs
on each syscall. That's seen as a quite hefty price by many kernel
developers.



* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 16:05     ` Arjan van de Ven
@ 2006-08-29 16:30       ` Jan Engelhardt
  2006-08-29 16:44         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Engelhardt @ 2006-08-29 16:30 UTC
  To: Arjan van de Ven
  Cc: Dong Feng, Andi Kleen, Nick Piggin, Paul Mackerras,
	Christoph Lameter, David Howells, linux-kernel

>> 
>> Sorry for my typo. I actually meant "0-1G physical memory space." My
>> question is actually why there is a 3G offset from linear kernel to
>> physical kernel. Why not simply have kernel memory linear space
>> located at 0-1G linear address, and therefore the physical kernel and
>> linear kernel just coincide?
>
>the price for that would be that you would have to flush all the TLBs
>on each syscall. That's seen as a quite hefty price by many kernel
>developers.

Since it's all just virtual addresses, is the TLB flush really that much 
different when kernelspace runs from (virtual) 0x00000000-0x3FFFFFFF rather 
than (virtual) 0xC0000000-0xFFFFFFFF?


Jan Engelhardt
-- 

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 16:30       ` Jan Engelhardt
@ 2006-08-29 16:44         ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-29 16:44 UTC
  To: Jan Engelhardt
  Cc: Arjan van de Ven, Dong Feng, Andi Kleen, Nick Piggin,
	Paul Mackerras, Christoph Lameter, David Howells, linux-kernel

Jan Engelhardt wrote:
> Since it's all just virtual addresses, is the TLB flush really that much 
> different when kernelspace runs from (virtual) 0x00000000-0x3FFFFFFF rather 
> than (virtual) 0xC0000000-0xFFFFFFFF?
>   

If kernel and userspace are disjoint, they can be in the same address 
space, so there's no need for a TLB flush at all.
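The point reduces to an interval test: if the user and kernel virtual ranges are disjoint, one set of page tables can map both, and entering the kernel needs no address-space switch. A minimal sketch; the struct, helper names and example ranges are illustrative assumptions, not kernel definitions, and "switch" here means reloading CR3, which on x86 flushes the non-global TLB entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB (1ull << 30)

/* Half-open virtual address range [start, end); 64-bit so 4 GiB fits. */
struct vrange {
    uint64_t start;
    uint64_t end;
};

/* True if the two ranges share at least one address. */
static bool ranges_overlap(struct vrange a, struct vrange b)
{
    assert(a.start <= a.end && b.start <= b.end);
    return a.start < b.end && b.start < a.end;
}

/*
 * Disjoint ranges: user and kernel share one page table, so a syscall
 * needs no page-table switch. Overlapping ranges: the kernel needs its
 * own page tables, so each user->kernel->user round trip switches twice.
 */
static int page_table_switches_per_syscall(struct vrange user,
                                           struct vrange kernel)
{
    return ranges_overlap(user, kernel) ? 2 : 0;
}
```

With user space at 0-3G, a kernel at 3-4G gives zero switches per system call, while a kernel at 0-1G gives two (in and out).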

    J

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 16:01   ` Dong Feng
  2006-08-29 16:05     ` Arjan van de Ven
@ 2006-08-29 16:12     ` Christoph Lameter
  2006-08-29 16:16       ` Dong Feng
  2006-08-29 16:42     ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2006-08-29 16:12 UTC
  To: Dong Feng
  Cc: Jan Engelhardt, Andi Kleen, Nick Piggin, Arjan van de Ven,
	Paul Mackerras, David Howells, linux-kernel

On Wed, 30 Aug 2006, Dong Feng wrote:

> Or perhaps this offset is just some personal favor. Say if the first
> kernel designer decided to locate kernel at 2-3G linear address, then
> 2G offset would have appeared in code. Is this the case?

Well this is the second time that you suggest that the reasons for
technical decisions have to do with personal favors. Are you trying to 
provoke us into answering your question?


* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 16:12     ` Christoph Lameter
@ 2006-08-29 16:16       ` Dong Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Dong Feng @ 2006-08-29 16:16 UTC
  To: Christoph Lameter
  Cc: Jan Engelhardt, Andi Kleen, Nick Piggin, Arjan van de Ven,
	Paul Mackerras, David Howells, linux-kernel

No, please do not get me wrong. Or perhaps please tolerate my poor English.

My intention is just to understand whether there is an absolute
rationale for a design choice, or whether a choice simply has to be
made and can be made arbitrarily.

Sorry again if my English caused any misunderstanding.


2006/8/30, Christoph Lameter <clameter@sgi.com>:
> On Wed, 30 Aug 2006, Dong Feng wrote:
>
> > Or perhaps this offset is just some personal favor. Say if the first
> > kernel designer decided to locate kernel at 2-3G linear address, then
> > 2G offset would have appeared in code. Is this the case?
>
> Well this is the second time that you suggest that the reasons for
> technical decisions have to do with personal favors. Are you trying to
> provoke us into answering your question?
>
>

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 16:01   ` Dong Feng
  2006-08-29 16:05     ` Arjan van de Ven
  2006-08-29 16:12     ` Christoph Lameter
@ 2006-08-29 16:42     ` Jeremy Fitzhardinge
  2006-08-29 18:37       ` Peter Grandi
  2 siblings, 1 reply; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-29 16:42 UTC
  To: Dong Feng
  Cc: Jan Engelhardt, Andi Kleen, Nick Piggin, Arjan van de Ven,
	Paul Mackerras, Christoph Lameter, David Howells, linux-kernel

Dong Feng wrote:
> Sorry for my typo. I actually meant "0-1G physical memory space." My
> question is actually why there is a 3G offset from linear kernel to
> physical kernel. Why not simply have kernel memory linear space
> located at 0-1G linear address, and therefore the physical kernel and
> linear kernel just coincide?

If kernel virtual addresses were low, you would either need to do an 
address-space switch (=TLB flush) on every user-kernel switch, or 
require userspace to be at some high address.  The former would be very 
expensive, and the latter very strange (the standard x86 ABI requires 
low addresses).  The clean solution is to map the kernel to the high 
part of the address space, but it is easier to load the kernel into low 
physical memory at boot, thus leading to a physical-virtual offset.  The 
selection of 3G is a reasonable tradeoff of physical memory size vs user 
virtual address space size, but of course it can be adjusted, or you can 
use highmem.
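The tradeoff can be made concrete with a little arithmetic. Assuming the i386 arrangement discussed later in this thread (kernel space starts at the split point, and roughly 128 MiB at the top is reserved for vmalloc and other dynamic mappings), the RAM the kernel can map directly is whatever remains of its share. The 128 MiB reserve and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1ull << 20)
#define GIB (1ull << 30)
/* Assumed reserve for vmalloc/fixmap/kmap at the top of kernel space. */
#define SKETCH_VMALLOC_RESERVE (128 * MIB)

/*
 * Given the user/kernel split point (the kernel's PAGE_OFFSET) on a
 * 32-bit machine, return how much physical RAM can be mapped directly
 * ("lowmem"); RAM beyond this has to be handled as highmem.
 */
static uint64_t direct_map_limit(uint64_t page_offset)
{
    assert(page_offset < 4 * GIB);
    uint64_t kernel_space = 4 * GIB - page_offset;
    assert(kernel_space > SKETCH_VMALLOC_RESERVE);
    return kernel_space - SKETCH_VMALLOC_RESERVE;
}
```

With the default 3G/1G split this gives 896 MiB; moving the split to 2G/2G raises it to 1920 MiB at the cost of 1 GiB of per-process address space.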

    J

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 16:42     ` Jeremy Fitzhardinge
@ 2006-08-29 18:37       ` Peter Grandi
  2006-08-29 21:15         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Grandi @ 2006-08-29 18:37 UTC
  To: Linux kernel

[ ... ]

df> My question is actually why there is a 3G offset from linear
df> kernel to physical kernel. Why not simply have kernel memory
df> linear space located on 0-1G linear address, and therefore the
df> physical kernel and linear kernel just coincide?

First of all there are _three_ mapping regions:

* for the per-process address space	(x86 default 3GiB at address 0);
* the kernel address space		(x86 default 128MiB at address 3GiB);
* the real memory address space		(x86 default the last 896MiB).

The kernel address space is small and does not matter much in this
discussion, except for stealing 128MiB. What really matters are the
other two. Note also that the memory resident pages of a process
are necessarily mapped twice, once in the per-process address
space and once in the real memory space.

There are actually three possible cases:

1) per-process mapped low, real memory mapped high (e.g. 3GiB+128MiB+896MiB).
2) real memory mapped low, per-process mapped high (e.g. 896MiB+128MiB+3GiB).
3) both per-process and real memory mapped low (e.g. 3.9GiB+128MiB), with 
   real memory/per process flipping or something else.

jeremy> If kernel virtual addresses were low, you would either
jeremy> need to do an address-space switch (=TLB flush)

This is case #3, which was the norm on many platforms, e.g. UNIX
PDP-11. To be practical it requires special instructions to
load/store from unmapped address spaces. Linus prefers to map
both kernel and physical memory in every address space.

jeremy> on every user-kernel switch,

Not on every user-kernel switch, because there are two (or
three) possibilities:

* Only the real-memory address space has the 128MiB kernel
  address space map, which seems what this phrase assumes.

* Each address space, including both per-process ones and the
  real memory one, has a 128MiB mapping for the kernel address
  space.

If the 128MiB kernel address space were still to be mapped in
every process address space *and* the real memory space, one would
need a switch only when the kernel wants to access per-process
space and real memory within a short time. Unless there is some special
way that allows the kernel to address real memory directly, but
then that gets a bit cumbersome.

jeremy> or require userspace to be at some high address.

This is case #2.

jeremy> and the latter very strange (the standard x86 ABI
jeremy> requires low addresses).

Strange does not matter a lot; but it is somewhat surprising
that the x86 ABI, which includes shared libs all over the place,
does require low addresses. But if that is the case it must have
been an important point in the past, when layout compatibility
might have mattered for iBCS (anybody remembers that? :->).

What I suspect is more likely as a reason to avoid mapping
per-process address space at high addresses is that would have
broken many incorrect programs... :-)

jeremy> The clean solution is to map the kernel to the high part
jeremy> of the address space,

Not necessarily clean, but perhaps required by ABI compatibility.

jeremy> but it is easier to load the kernel into low physical
jeremy> memory at boot, thus leading to a physical-virtual
jeremy> offset.

Odd, because this is an argument to have case #2 or #3: because
then one loads the kernel code at low physical addresses, and
then maps them 1-1 onto virtual addresses.

jeremy> The selection of 3G is a reasonable tradeoff of physical
jeremy> memory size vs user virtual address space size, but of
jeremy> course it can be adjusted, or you can use highmem.

Probably this was a bit dumb, because of the missing 128MiB
syndrome. I would set the default for per-process space to
2GiB-128MiB, leaving only those users who want more per-process
address space and don't want to move to 64-bit to move the
boundary back. Some reflections on this in:

  http://WWW.sabi.co.UK/Notes/#060821c

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 18:37       ` Peter Grandi
@ 2006-08-29 21:15         ` Jeremy Fitzhardinge
  2006-08-30  6:11           ` Jan Engelhardt
  0 siblings, 1 reply; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-29 21:15 UTC
  To: Peter Grandi; +Cc: Linux kernel

Peter Grandi wrote:
> [ ... ]
>
> df> My question is actually why there is a 3G offset from linear
> df> kernel to physical kernel. Why not simply have kernel memory
> df> linear space located on 0-1G linear address, and therefore the
> df> physical kernel and linear kernel just coincide?
>
> First of all there are _three_ mapping regions:
>
> * for the per-process address space	(x86 default 3GiB at address 0);
> * the kernel address space		(x86 default 128MiB at address 3GiB);
> * the real memory address space		(x86 default the last 896MiB).
>
> The kernel address space is small and does not matter much in this
> discussion, except for stealing 128MiB. What really matters are the
> other two. Note also that the memory resident pages of a process
> are necessarily mapped twice, once in the per-process address
> space and once in the real memory space.
>
> There are actually three possible cases:
>
> 1) per-process mapped low, real memory mapped high (e.g. 3GiB+128MiB+896MiB).
> 2) real memory mapped low, per-process mapped high (e.g. 896MiB+128MiB+3GiB).
> 3) both per-process and real memory mapped low (e.g. 3.9GiB+128MiB), with 
>    real memory/per process flipping or something else.
>
> jeremy> If kernel virtual addresses were low, you would either
> jeremy> need to do an address-space switch (=TLB flush)
>
> This is case #3, which was the norm on many platforms, e.g. UNIX
> PDP-11. To be practical it requires special instructions to
> load/store from unmapped address spaces. Linus prefers to map
> both kernel and physical memory in every address space.
>   

Linux used to use segmentation to do something like this; %fs was set up 
to point to the user address space in the kernel, and accesses to 
userspace used an %fs segment override.  This history is still visible 
in the naming of get/set_fs (which has nothing to do with filesystems).
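After segmentation was dropped, the same idea survived on i386 roughly as a per-thread address limit: checks on user pointers compare against a limit that is normally the start of kernel space (USER_DS), and set_fs(KERNEL_DS) temporarily raises it so kernel buffers pass the check. The sketch below only simulates that check in user space; the names and limit values are hypothetical and this is not the kernel's access_ok() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed i386-style limits: user pointers must lie below the 3 GiB split. */
#define SKETCH_USER_DS   0xC0000000ull
#define SKETCH_KERNEL_DS 0x100000000ull /* the whole 4 GiB space */

/* Hypothetical per-thread limit, analogous to the old addr_limit. */
static uint64_t current_limit = SKETCH_USER_DS;

static uint64_t sketch_get_fs(void) { return current_limit; }
static void sketch_set_fs(uint64_t limit) { current_limit = limit; }

/* Does [addr, addr + size) lie entirely below the current limit? */
static bool sketch_access_ok(uint32_t addr, uint32_t size)
{
    uint64_t end = (uint64_t)addr + size; /* computed in 64 bits: no wrap */
    return end <= current_limit;
}
```

The usual pattern is to save the old limit with get_fs(), raise it with set_fs(), make the call, and restore the saved limit afterwards.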

> * Only the real-memory address space has the 128MiB kernel
>   address space map, which seems what this phrase assumes.
>
> * Each address space, including both per-process ones and the
>   real memory one, has a 128MiB mapping for the kernel address
>   space.
>   

By "real" I assume you mean "physical".  What you're suggesting is 
something akin to highmem, but applied to all memory.  With highmem, the 
kernel can't assume it has direct access to all physical memory, and you 
must explicitly map it in with kmap() to use it.  You could do this with 
all memory all the time, but with an obvious performance (and 
complexity) overhead.
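A toy model of the kmap() idea, assuming a non-PAE i386 layout: pages below the lowmem boundary already have a permanent kernel address, while a highmem page must first be given a temporary slot in a small mapping window. The window base, its size and all names here are illustrative assumptions; the real kmap() also writes page-table entries, flushes TLBs, keeps per-page reference counts, and sleeps when the window is full.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_OFFSET 0xC0000000u
/* Assumed lowmem boundary: 896 MiB of directly mapped RAM, in page frames. */
#define SKETCH_MAX_LOW_PFN ((896u << 20) >> SKETCH_PAGE_SHIFT)
/* Assumed small window for temporary highmem mappings. */
#define SKETCH_PKMAP_BASE  0xFF800000u
#define SKETCH_PKMAP_SLOTS 1024

/* Which page frame each window slot currently maps (0 = free). */
static uint32_t slot_pfn[SKETCH_PKMAP_SLOTS];

/* Return a kernel virtual address for page frame `pfn`, or 0 if none. */
static uint32_t sketch_kmap(uint32_t pfn)
{
    if (pfn < SKETCH_MAX_LOW_PFN) /* lowmem: permanent direct mapping */
        return SKETCH_PAGE_OFFSET + (pfn << SKETCH_PAGE_SHIFT);

    for (int i = 0; i < SKETCH_PKMAP_SLOTS; i++) {
        if (slot_pfn[i] == 0) { /* highmem: claim a free window slot */
            slot_pfn[i] = pfn;
            return SKETCH_PKMAP_BASE + ((uint32_t)i << SKETCH_PAGE_SHIFT);
        }
    }
    return 0; /* window full */
}

/* Release the window slot at `vaddr` (no-op for lowmem addresses). */
static void sketch_kunmap(uint32_t vaddr)
{
    if (vaddr >= SKETCH_PKMAP_BASE) {
        uint32_t i = (vaddr - SKETCH_PKMAP_BASE) >> SKETCH_PAGE_SHIFT;
        assert(i < SKETCH_PKMAP_SLOTS);
        slot_pfn[i] = 0;
    }
}
```

Lowmem costs nothing to access, while every highmem access pays for a slot lookup and a mapping; doing this for all memory is the overhead described above.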

> Strange does not matter a lot; but it is somewhat surprising
> that the x86 ABI, which includes shared libs all over the place,
> does require low addresses. But if that is the case it must have
> been an important point in the past, when layout compatibility
> might have mattered for iBCS (anybody remembers that? :->).
>   

Well, if you're mapping kernel+physical memory at low addresses, it 
means that userspace moves about depending on where you want to put the 
user/kernel split.  That's a lot harder to deal with than just moving 
around the limit.

The load address for ET_EXEC executables is defined as 0x08048000; you 
can use ET_DYN if you want to load them elsewhere.  Using lower 
addresses allows the use of instructions with smaller pointers and 
offsets (though this might be less important on x86).  x86-64's normal 
compilation model requires non-relocatable code to be in the lower 
2Gbytes, for example.
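Whether a binary is ET_EXEC (fixed load address, traditionally 0x08048000 on i386) or ET_DYN (relocatable) is recorded in the e_type field of its ELF header. A small Linux-only sketch that reads the header of the running program through /proc/self/exe; it relies on e_type sitting right after the 16-byte e_ident in both the 32- and 64-bit headers, and on the file matching the host byte order. The function name is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the ELF e_type of the running executable, or -1 on error. */
static int self_elf_type(void)
{
    unsigned char hdr[EI_NIDENT + 2];
    uint16_t type;
    FILE *f = fopen("/proc/self/exe", "rb");

    if (f == NULL)
        return -1;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n != sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;

    /* e_type follows e_ident in both Elf32_Ehdr and Elf64_Ehdr. */
    memcpy(&type, hdr + EI_NIDENT, sizeof type);
    return type;
}
```

On a distribution that builds position-independent executables by default this typically reports ET_DYN; a program linked with -no-pie typically reports ET_EXEC.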

> Odd, because this is an argument to have case #2 or #3: because
> then one loads the kernel code at low physical addresses, and
> then maps them 1-1 onto virtual addresses.
>   

I'm pointing out that the existing design has a reasonable technical 
justification, and is not somebody's arbitrary personal choice.  There 
are certainly other possible designs, with their own pros and cons.

    J

* Re: The 3G (or nG) Kernel Memory Space Offset
  2006-08-29 21:15         ` Jeremy Fitzhardinge
@ 2006-08-30  6:11           ` Jan Engelhardt
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Engelhardt @ 2006-08-30  6:11 UTC
  To: Jeremy Fitzhardinge; +Cc: Peter Grandi, Linux kernel

>
> The load address for ET_EXEC executables is defined as 0x08048000;
> you can use ET_DYN if you want to load them elsewhere.  Using lower
> addresses allows the use of instructions with smaller pointers and
> offsets (though this might be less important on x86).

Less on x86. HTE tells me there are only two ways (EB and E9):

      EB ??                 jmp OFFSET8     for 16/32/64
      E9 ?? ??              jmp OFFSET16    for 16-bit mode
      E9 ?? ?? ?? ??        jmp OFFSET32    for 32/64-bit mode
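The table can be checked with a tiny encoder for 32-bit mode: a near jmp is either EB with an 8-bit displacement or E9 with a 32-bit one, and both displacements are relative to the end of the instruction, so only the distance between the jump and its target matters, not how low the addresses are. A sketch only; the function name is hypothetical and 16-bit mode is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "jmp to" placed at address `from` (32-bit mode) into `out`.
 * Returns the instruction length: 2 for EB rel8, 5 for E9 rel32.
 */
static size_t encode_jmp32(uint32_t from, uint32_t to, uint8_t out[5])
{
    /* rel8 is measured from the end of the 2-byte short form. */
    int64_t short_disp = (int64_t)to - ((int64_t)from + 2);
    if (short_disp >= INT8_MIN && short_disp <= INT8_MAX) {
        out[0] = 0xEB;
        out[1] = (uint8_t)short_disp; /* two's-complement byte */
        return 2;
    }

    /* rel32 is measured from the end of the 5-byte form (mod 2^32). */
    uint32_t disp = to - (from + 5u);
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++) /* little-endian displacement */
        out[1 + i] = (uint8_t)(disp >> (8 * i));
    return 5;
}
```

For example, jumping 16 bytes ahead encodes as EB 0E whether the code sits at 0x1000 or at 0xC0001000.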



Jan Engelhardt
-- 

* Re: The 3G (or nG) Kernel Memory Space Offset
@ 2006-08-30  4:32 linux
  0 siblings, 0 replies; 14+ messages in thread
From: linux @ 2006-08-30  4:32 UTC
  To: middle.fengdong; +Cc: linux-kernel

Just to answer the question in elementary terms:

This is because:
- On x86, the user and kernel share the available 4G virtual address space,
- User space gets first choice, and so takes the low 3G.
- The kernel thus has to use the high 1G, and if it wants a copy
  of physical memory, that's the only place it can go.

In somewhat more detail:

1) In standard x86 Linux, the user and kernel address spaces share the 4
   GB virtual address space of the x86 processor.  There are other ways
   to do it (see the 4G+4G patch for an example), but they're slower.

   x86 processors only support one set of page tables at a time, and
   changing is a slow operation.  Other processors let you have separate
   user and kernel page tables active simultaneously, but x86 does not.

   So for speed, you don't want to change page tables to make a system
   call.  Also, many system calls are passed pointers to buffers in user
   memory, so the kernel needs to access user memory.  It's fastest and easiest to do
   this if user memory is in the address space when executing kernel code.

   Fortunately, x86 page tables have a "user" bit in each page table
   entry that can make pages accessible only from the kernel.  They are
   still in the user's virtual address space, but can't be accessed from
   user mode.
   Thus, it is possible for the user and kernel to share the address space.

   So, given all of this, Linux (as well as most other operating systems)
   on x86 has decided to divide the 4 GB virtual address space into "user"
   and "kernel" parts.  As far as the user is concerned, the kernel part
   is just "missing", so it's made as small as reasonably possible.

2) The division chosen is that the user gets the low 3G of the address
   space, and the kernel gets the high 1G.  x86 ABI standards require
   that user space gets low addresses, and in any case, the kernel exists
   to make user-space programs happy.

3) The kernel finds it convenient to have a copy of physical memory in its
   address space, so it maps one.  If there's more RAM than will fit in the
   kernel address space, the HIGHMEM patches provide an alternative.
   Since this is an elementary explanation, I won't describe how that works.

Thus, the physical memory map used in the kernel ends up offset by 3G.
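The "user" bit from point 1 can be sketched as a permission check on a classic 32-bit x86 page-table entry (present = bit 0, read/write = bit 1, user/supervisor = bit 2). This simplified model only covers reads and ignores the page-directory entry's own flags and later features such as NX or SMEP; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Classic 32-bit x86 PTE flag bits. */
#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u
#define PTE_USER    0x004u

/* May code running in user mode (or kernel mode) read this page? */
static bool can_read(uint32_t pte, bool user_mode)
{
    if (!(pte & PTE_PRESENT))
        return false; /* not mapped: page fault */
    if (user_mode && !(pte & PTE_USER))
        return false; /* supervisor-only page, e.g. kernel memory above 3G */
    return true;      /* the kernel may read any present page */
}
```

Kernel pages stay in every process's page tables but lack PTE_USER, so user code faults on them while the kernel reaches them without switching page tables.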

end of thread

Thread overview: 14+ messages
2006-08-29 14:15 The 3G (or nG) Kernel Memory Space Offset Dong Feng
2006-08-29 14:32 ` Andi Kleen
2006-08-29 14:36 ` Jan Engelhardt
2006-08-29 16:01   ` Dong Feng
2006-08-29 16:05     ` Arjan van de Ven
2006-08-29 16:30       ` Jan Engelhardt
2006-08-29 16:44         ` Jeremy Fitzhardinge
2006-08-29 16:12     ` Christoph Lameter
2006-08-29 16:16       ` Dong Feng
2006-08-29 16:42     ` Jeremy Fitzhardinge
2006-08-29 18:37       ` Peter Grandi
2006-08-29 21:15         ` Jeremy Fitzhardinge
2006-08-30  6:11           ` Jan Engelhardt
  -- strict thread matches above, loose matches on Subject: below --
2006-08-30  4:32 linux