public inbox for linux-kernel@vger.kernel.org
* i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
@ 2001-12-29 18:18 Harald Holzer
  2001-12-29 18:45 ` Alan Cox
  2002-01-02 17:30 ` Timothy D. Witham
  0 siblings, 2 replies; 37+ messages in thread
From: Harald Holzer @ 2001-12-29 18:18 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org

Are there any i686 SMP systems with more than 12 GB of RAM out there ?

Is there a known problem with the 2.4.x kernel on such systems ?

I have the following problem:

An Intel SRPM8 server with 32 GB of RAM, running RH 7.2 and kernel 2.4.17.
After a lot of disc access the system slows down and the OOM killer
begins its work (only after running some cp processes), because the
system is running out of low memory.

Disabling the OOM killer has no effect; low memory still goes to 0
and the system dies.

It looks as if the buffer_heads fill up low memory regardless of how
much memory is available overall, as long as there is enough high
memory for caching.

Any ideas ?

Here is some information from /proc/meminfo and /proc/slabinfo, taken
a few seconds before the machine dies:

        total:    used:    free:  shared: buffers:  cached:
Mem:  33781227520 10787348480 22993879040        0 21807104 10447962112
Swap:        0        0        0
MemTotal:     32989480 kB
MemFree:      22454960 kB
MemShared:           0 kB
Buffers:         21296 kB
Cached:       10203088 kB
SwapCached:          0 kB
Active:          27248 kB
Inact_dirty:  10206312 kB
Inact_clean:         0 kB
Inact_target:  2046712 kB
HighTotal:    32636928 kB
HighFree:     22424128 kB
LowTotal:       352552 kB
LowFree:         30832 kB
SwapTotal:           0 kB
SwapFree:            0 kB

slabinfo - version: 1.1 (statistics) (SMP)
kmem_cache           112    112    284    8    8    1 :    112     112     8    0    0 :  124   62 :     12      9      0      0
ip_fib_hash          145    145     24    1    1    1 :    145     145     1    0    0 :  252  126 :      8      3      0      0
urb_priv               0      0     56    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
journal_head         755  11591     56   49  173    1 :  11591 1255589   173    0    0 :  252  126 : 1271927  10219 1272001   9959
revoke_table         169    169     20    1    1    1 :    169     169     1    0    0 :  252  126 :      0      3      0      0
revoke_record        126    145     24    1    1    1 :    126     126     1    0    0 :  252  126 :      2      2      3      0
clip_arp_cache         0      0    124    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
ip_mrt_cache           0      0     84    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
tcp_tw_bucket          0      0    128    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
tcp_bind_bucket      561    580     24    4    4    1 :    561     561     4    0    0 :  252  126 :      4     11      0      0
tcp_open_request     252    252     92    6    6    1 :    252     252     6    0    0 :  252  126 :      0     12      6      0
inet_peer_cache        0      0     48    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
ip_dst_cache         176    176    176    8    8    1 :    176     176     8    0    0 :  252  126 :     40     16     22      0
arp_cache             64     64    120    2    2    1 :     64      64     2    0    0 :  252  126 :      0      4      1      0
blkdev_requests     1056   1056     88   24   24    1 :   1056    1056    24    0    0 :  252  126 :   1128     48    256      0
dnotify cache          0      0     28    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
file lock cache      312    312    100    8    8    1 :    312     312     8    0    0 :  252  126 :  19870     16  19876      0
fasync cache           0      0     24    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
uid_cache            339    339     32    3    3    1 :    339     339     3    0    0 :  252  126 :      1      6      1      0
skbuff_head_cache   2438   2438    168  106  106    1 :   2438    3950   106    0    0 :  252  126 :   3943    224   3035     12
sock                 129    129   1280   43   43    1 :    129     189    43    0    0 :   60   30 :    435     87    443      2
sigqueue             252    252    140    9    9    1 :    252     252     9    0    0 :  252  126 :   1111     18   1118      0
cdev_cache           702    702     48    9    9    1 :    702     702     9    0    0 :  252  126 :    145     18      5      0
bdev_cache           354    354     64    6    6    1 :    354     354     6    0    0 :  252  126 :     20     12     23      0
mnt_cache            224    224     68    4    4    1 :    224     224     4    0    0 :  252  126 :      7      7      2      0
inode_cache         2224   2224    488  278  278    1 :   3872    4638   486    0    0 :  124   62 :  73933    984  72421     23
dentry_cache        2732   3828    116  116  116    1 :   5181    7701   157    0    0 :  252  126 :  76421    333  74349     21
dquot                  0      0    112    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
filp                 578    578    112   17   17    1 :    578     578    17    0    0 :  252  126 :    397     34      0      0
names_cache           18     18   4096   18   18    1 :     18      18    18    0    0 :   60   30 :  66979     36  66997      0
buffer_head       2557740 2559882    104 69151 69186    1 : 2559882 3919926 69186    0    0 :  252  126 : 5029666 149166 2542612  10811
mm_struct            216    216    144    8    8    1 :    216     216     8    0    0 :  252  126 :   1345     16   1317      0
vm_area_struct      1724   1850     76   37   37    1 :   1850    7772    37    0    0 :  252  126 :  53909    121  53030     48
fs_cache             588    588     44    7    7    1 :    588     588     7    0    0 :  252  126 :   1347     13   1320      0
files_cache          171    171    424   19   19    1 :    171     171    19    0    0 :  124   62 :   1335     37   1320      0
signal_act           162    162   1312   54   54    1 :    162     162    54    0    0 :   60   30 :   1305    103   1317      0
pae_pgd              791    791     32    7    7    1 :    791     791     7    0    0 :  252  126 :   1346     14   1317      0
size-131072(DMA)       0      0 131072    0    0   32 :      0       0     0    0    0 :    0    0 :      0      0      0      0
size-131072            0      0 131072    0    0   32 :      0       0     0    0    0 :    0    0 :      0      0      0      0
size-65536(DMA)        0      0  65536    0    0   16 :      0       0     0    0    0 :    0    0 :      0      0      0      0
size-65536             0      0  65536    0    0   16 :      0       0     0    0    0 :    0    0 :      0      0      0      0
size-32768(DMA)        0      0  32768    0    0    8 :      0       0     0    0    0 :    0    0 :      0      0      0      0
size-32768             1      1  32768    1    1    8 :      1       2     1    0    0 :    0    0 :      0      0      0      0
size-16384(DMA)        1      1  16384    1    1    4 :      1       1     1    0    0 :    0    0 :      0      0      0      0
size-16384             1      1  16384    1    1    4 :      1       1     1    0    0 :    0    0 :      0      0      0      0
size-8192(DMA)         0      0   8192    0    0    2 :      0       0     0    0    0 :    0    0 :      0      0      0      0
size-8192              2      3   8192    2    3    2 :      3      41     3    0    0 :    0    0 :      0      0      0      0
size-4096(DMA)         0      0   4096    0    0    1 :      0       0     0    0    0 :   60   30 :      0      0      0      0
size-4096            207    237   4096  207  237    1 :    237     597   237    0    0 :   60   30 :    750    486    946     13
size-2048(DMA)         0      0   2048    0    0    1 :      0       0     0    0    0 :   60   30 :      0      0      0      0
size-2048            368    428   2048  194  214    1 :    428    2198   214    0    0 :   60   30 :   3169    487   3326     61
size-1024(DMA)         0      0   1024    0    0    1 :      0       0     0    0    0 :  124   62 :      0      0      0      0
size-1024            448    448   1024  112  112    1 :    448     448   112    0    0 :  124   62 :   1012    157    890      0
size-512(DMA)          0      0    512    0    0    1 :      0       0     0    0    0 :  124   62 :      0      0      0      0
size-512             520    520    512   65   65    1 :    520     520    65    0    0 :  124   62 :    460    115    116      0
size-256(DMA)          0      0    264    0    0    1 :      0       0     0    0    0 :  124   62 :      0      0      0      0
size-256             630    630    264   42   42    1 :    630     940    42    0    0 :  124   62 :   5914     82   5810      5
size-128(DMA)         28     28    136    1    1    1 :     28      28     1    0    0 :  252  126 :      0      2      0      0
size-128             868    868    136   31   31    1 :    868     869    31    0    0 :  252  126 :   2589     45   2374      0
size-64(DMA)           0      0     72    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
size-64              583    583     72   11   11    1 :    583     583    11    0    0 :  252  126 :   1200     19    975      0
size-32(DMA)          92     92     40    1    1    1 :     92      92     1    0    0 :  252  126 :     16      2      0      0
size-32             1384   2392     40   22   26    1 :   2392    2526    26    0    0 :  252  126 : 2569647     52 2568729      9

Regards,
Harald Holzer


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2001-12-29 18:18 i686 SMP systems with more then 12 GB ram with 2.4.x kernel ? Harald Holzer
@ 2001-12-29 18:45 ` Alan Cox
  2001-12-29 21:24   ` M. Edward (Ed) Borasky
  2002-01-01 18:15   ` M. Edward Borasky
  2002-01-02 17:30 ` Timothy D. Witham
  1 sibling, 2 replies; 37+ messages in thread
From: Alan Cox @ 2001-12-29 18:45 UTC (permalink / raw)
  To: Harald Holzer; +Cc: linux-kernel@vger.kernel.org

> Are there some i686 SMP systems with more then 12 GB ram out there ?

Very very few.

> Is there a known problem with 2.4.x kernel and such systems ?

Several 8)

Hardware limits:
	-	36-bit addressing mode on x86 processors is slower
	-	Many device drivers can't handle > 32-bit DMA
	-	The CPU can't efficiently map all that memory at once

Software:
	-	The block I/O layer doesn't cleanly handle large systems
	-	The page struct is too big, which puts undue load on the
		memory that the CPU can map
	-	We don't discard page tables when we can and should
	-	We should probably switch to a larger virtual page size
		on big machines.

The ones that actually bite hard are the block I/O layer and the page
struct size. Making the block layer handle its part well is a 2.5 thing.

> It looks like as the buffer_heads would fill the low memory up,
> whether there is sufficient memory available or not, as long as
> there is sufficient high memory for caching.

That may well be happening. The Red Hat supplied 7.2 and 7.2 errata kernels
were tested up to 8Gb; I don't know about anything larger.

Because much of the memory cannot be used for kernel objects, there is an
imbalance in available resources and it's very hard to balance them sanely.
I'm not sure how many 8Gb+ machines Andrea has handy to tune the VM on
either.

Alan


* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
@ 2001-12-29 19:25 Dieter Nützel
  0 siblings, 0 replies; 37+ messages in thread
From: Dieter Nützel @ 2001-12-29 19:25 UTC (permalink / raw)
  To: Harald Holzer; +Cc: Alan Cox, Andrea Arcangeli, Linux Kernel List

On Saturday, 29 December 2001 18:45, Alan Cox wrote:
> > Are there some i686 SMP systems with more then 12 GB ram out there ?
>
> Very very few.
>
> > Is there a known problem with 2.4.x kernel and such systems ?
>
> Several 8)
>
> Hardware limits:
>         -       36bit addressing mode on x86 processors is slower
>         -       Many device drivers cant handle > 32bit DMA
>         -       The CPU can't efficiently map all that memory at once
>
> Software:
>         -       The block I/O layer doesn't cleanly handle large systems
>         -       The page struct is too big which puts undo loads on the
>                 memory that the CPU can map
>         -       We don't discard page tables when we can and should
>         -       We should probably switch to a larger virtual page size
>                 on big machines.
>
> The ones that actually bite hard are the block I/O layer and the page
> struct size. Making the block layer handle its part well is a 2.5 thing.
>
> > It looks like as the buffer_heads would fill the low memory up,
> > whether there is sufficient memory available or not, as long as
> > there is sufficient high memory for caching.
>
> That may well be happening. The Red Hat supplied 7.2 and 7.2 errata kernels
> were tested on 8Gb, I don't know what else larger.
>
> Because much of the memory cannot be used for kernel objects there is an
> imbalance in available resources and its very hard to balance them sanely.
> I'm not sure how many 8Gb+ machines Andrea has handy to tune the VM on
> either.

I think Andrea has access to some. Maybe SAP?

Have you tried with 2.4.17rc2aa2?
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.17rc2aa2.bz2

The 10_vm-21 part applies cleanly to 2.4.17 (final), too.
I have it running without a glitch, but sadly on a much smaller
system... ;-)

Regards,
	Dieter

-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel@hamburg.de


* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2001-12-29 18:45 ` Alan Cox
@ 2001-12-29 21:24   ` M. Edward (Ed) Borasky
  2001-12-30  0:25     ` M. Edward (Ed) Borasky
  2002-01-01 18:15   ` M. Edward Borasky
  1 sibling, 1 reply; 37+ messages in thread
From: M. Edward (Ed) Borasky @ 2001-12-29 21:24 UTC (permalink / raw)
  To: Alan Cox; +Cc: Harald Holzer, linux-kernel@vger.kernel.org

On Sat, 29 Dec 2001, Alan Cox wrote:

> Because much of the memory cannot be used for kernel objects there is
> an imbalance in available resources and its very hard to balance them
> sanely.  I'm not sure how many 8Gb+ machines Andrea has handy to tune
> the VM on either.

Along those lines -- I have in front of me the source to
"/linux/mm/page_alloc.c" (2.4.17 kernel) which reads (partially)

lines 29-32:
static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };
static int zone_balance_ratio[MAX_NR_ZONES] __initdata = { 128, 128, 128, };
static int zone_balance_min[MAX_NR_ZONES] __initdata = { 20 , 20, 20, };
static int zone_balance_max[MAX_NR_ZONES] __initdata = { 255 , 255, 255, };

lines 718-725:
		mask = (realsize / zone_balance_ratio[j]);
		if (mask < zone_balance_min[j])
			mask = zone_balance_min[j];
		else if (mask > zone_balance_max[j])
			mask = zone_balance_max[j];
		zone->pages_min = mask;
		zone->pages_low = mask*2;
		zone->pages_high = mask*3;

What it *looks* like the programmer (Andrea?) intended was to make the
watermarks proportional to the amount of memory in each zone: for the
DMA, normal and highmem zones, 1/128th of the zone's memory as "min",
1/64th as "low" and 3/128th as "high". Leaving aside any debate over
whether these are appropriate values, and whether or not "free memory
is wasted memory", what in fact happens is that the "else if" clause
clamps "min" to 255 pages (about a megabyte on i386), and "low" and
"high" to 510 and 765 pages (about 2 and 3 megabytes) respectively.

Could someone with a big box and a benchmark that drives it out of free
memory please try commenting out the "else if" clause and see if it
makes a difference? I tried this on my puny 512 MB Athlon and verified
that the right values were there with "sysrq", but I don't have anything
bigger to try it on and I don't have a benchmark to test it with either.

-- 
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net



* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2001-12-29 21:24   ` M. Edward (Ed) Borasky
@ 2001-12-30  0:25     ` M. Edward (Ed) Borasky
  2001-12-30  2:14       ` Harald Holzer
  0 siblings, 1 reply; 37+ messages in thread
From: M. Edward (Ed) Borasky @ 2001-12-30  0:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: andrea, Harald Holzer, linux-kernel@vger.kernel.org

On Sat, 29 Dec 2001, M. Edward (Ed) Borasky wrote:

> On Sat, 29 Dec 2001, Alan Cox wrote:
>
> > Because much of the memory cannot be used for kernel objects there
> > is an imbalance in available resources and its very hard to balance
> > them sanely.  I'm not sure how many 8Gb+ machines Andrea has handy
> > to tune the VM on either.
>
> Along those lines -- I have in front of me the source to
> "/linux/mm/page_alloc.c" (2.4.17 kernel) which reads (partially)

[snip]


> Could someone with a big box and a benchmark that drives it out of
> free memory please try commenting out the "else if" clause and see if
> it makes a difference? I tried this on my puny 512 MB Athlon and
> verified that the right values were there with "sysrq", but I don't
> have anything bigger to try it on and I don't have a benchmark to test
> it with either.

And here it is as a patch against 2.4.17:

diff -ur linux/mm/page_alloc.c linux-2.4.17znmeb/mm/page_alloc.c
--- linux/mm/page_alloc.c	Mon Nov 19 16:35:40 2001
+++ linux-2.4.17znmeb/mm/page_alloc.c	Sat Dec 29 16:04:25 2001
@@ -718,8 +718,13 @@
 		mask = (realsize / zone_balance_ratio[j]);
 		if (mask < zone_balance_min[j])
 			mask = zone_balance_min[j];
+		/* else if clause commented out for testing
+		 * M. Edward Borasky, Borasky Research
+		 * 2001-12-29
+		 *
 		else if (mask > zone_balance_max[j])
 			mask = zone_balance_max[j];
+		 */
 		zone->pages_min = mask;
 		zone->pages_low = mask*2;
 		zone->pages_high = mask*3;


Apologies if pine with vim as the editor messes this puppy up :-).
-- 
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net

I brought my inner child to "Take Your Child To Work Day."



* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2001-12-30  0:25     ` M. Edward (Ed) Borasky
@ 2001-12-30  2:14       ` Harald Holzer
  2001-12-30  2:33         ` M. Edward Borasky
  0 siblings, 1 reply; 37+ messages in thread
From: Harald Holzer @ 2001-12-30  2:14 UTC (permalink / raw)
  To: M. Edward (Ed) Borasky, linux-kernel@vger.kernel.org

I tested your suggestion with a 2.4.17rc2aa2 kernel,
but it didn't help.
The system dies after copying more than 6-8 GB of data.
Here are the last lines from /var/log/messages:

Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/0)
Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/0)
Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/0)
Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/0)
Dec 30 01:47:32 localhost last message repeated 3 times
Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)

I started searching in linux/fs/buffer.c, and found the following
interesting lines (around line 1450):

void create_empty_buffers(struct page *page, kdev_t dev, unsigned long blocksize)
{
	struct buffer_head *bh, *head, *tail;

	/* FIXME: create_buffers should fail if there's no enough memory */
	head = create_buffers(page, blocksize, 1);
	if (page->buffers)
		BUG();

	bh = head;

Could the create_buffers function cause this problem ?

Harald Holzer

On Sun, 2001-12-30 at 01:25, M. Edward (Ed) Borasky wrote:
> On Sat, 29 Dec 2001, M. Edward (Ed) Borasky wrote:
> 
> > On Sat, 29 Dec 2001, Alan Cox wrote:
> >
> > > Because much of the memory cannot be used for kernel objects there
> > > is an imbalance in available resources and its very hard to balance
> > > them sanely.  I'm not sure how many 8Gb+ machines Andrea has handy
> > > to tune the VM on either.
> >
> > Along those lines -- I have in front of me the source to
> > "/linux/mm/page_alloc.c" (2.4.17 kernel) which reads (partially)
> 
> [snip]
> 
> 
> > Could someone with a big box and a benchmark that drives it out of
> > free memory please try commenting out the "else if" clause and see if
> > it makes a difference? I tried this on my puny 512 MB Athlon and
> > verified that the right values were there with "sysrq", but I don't
> > have anything bigger to try it on and I don't have a benchmark to test
> > it with either.
> 
> And here it is as a patch against 2.4.17:
> 
> diff -ur linux/mm/page_alloc.c linux-2.4.17znmeb/mm/page_alloc.c
> --- linux/mm/page_alloc.c	Mon Nov 19 16:35:40 2001
> +++ linux-2.4.17znmeb/mm/page_alloc.c	Sat Dec 29 16:04:25 2001
> @@ -718,8 +718,13 @@
>  		mask = (realsize / zone_balance_ratio[j]);
>  		if (mask < zone_balance_min[j])
>  			mask = zone_balance_min[j];
> +		/* else if clause commented out for testing
> +		 * M. Edward Borasky, Borasky Research
> +		 * 2001-12-29
> +		 *
>  		else if (mask > zone_balance_max[j])
>  			mask = zone_balance_max[j];
> +		 */
>  		zone->pages_min = mask;
>  		zone->pages_low = mask*2;
>  		zone->pages_high = mask*3;
> 
> 
> Apologies if pine with vim as the editor messes this puppy up :-).
> -- 
> M. Edward Borasky
> 
> znmeb@borasky-research.net
> http://www.borasky-research.net
> 
> I brought my inner child to "Take Your Child To Work Day."
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



* RE: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2001-12-30  2:14       ` Harald Holzer
@ 2001-12-30  2:33         ` M. Edward Borasky
  0 siblings, 0 replies; 37+ messages in thread
From: M. Edward Borasky @ 2001-12-30  2:33 UTC (permalink / raw)
  To: Harald Holzer, linux-kernel

Hmmm ... 0-order allocation failures ... is that new *after* my patch,
or were you getting those before? Maybe we've *moved* the problem??

--
Take Your Trading to the Next Level!
M. Edward Borasky, Meta-Trading Coach

znmeb@borasky-research.net
http://www.meta-trading-coach.com
http://groups.yahoo.com/group/meta-trading-coach

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Harald Holzer
> Sent: Saturday, December 29, 2001 6:15 PM
> To: M. Edward (Ed) Borasky; linux-kernel@vger.kernel.org
> Subject: Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel
> ?
>
>
> I tested your suggestion with an 2.4.17rc2aa2 kernel,
> but it didnt help.
> The system dies after copying more then 6-8GB on data.
> Here are the last lines from /var/log/messages:
>
> Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order
> allocation failed (gfp=0x70/0)
> Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order
> allocation failed (gfp=0x70/0)
> Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order
> allocation failed (gfp=0x1f0/0)
> Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order
> allocation failed (gfp=0x70/0)
> Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order
> allocation failed (gfp=0xf0/0)
> Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order
> allocation failed (gfp=0x70/0)
> Dec 30 01:47:32 localhost last message repeated 3 times
> Dec 30 01:47:32 localhost kernel: __alloc_pages: 0-order
> allocation failed (gfp=0x1f0/0)
>
> I started to begin searching in the linux/fs/buffer.c, and found the
> following interesting lines (line: 1450+):
>
> void create_empty_buffers(struct page *page, kdev_t dev, unsigned
> long blocksize)
> {
> 	struct buffer_head *bh, *head, *tail;
>
> 	/* FIXME: create_buffers should fail if there's no enough memory */
> 	head = create_buffers(page, blocksize, 1);
> 	if (page->buffers)
> 		BUG();
>
> 	bh = head;
>
> Could the create_buffer function cause this problem ?
>
> Harald Holzer
>
> On Sun, 2001-12-30 at 01:25, M. Edward (Ed) Borasky wrote:
> > On Sat, 29 Dec 2001, M. Edward (Ed) Borasky wrote:
> >
> > > On Sat, 29 Dec 2001, Alan Cox wrote:
> > >
> > > > Because much of the memory cannot be used for kernel objects there
> > > > is an imbalance in available resources and its very hard to balance
> > > > them sanely.  I'm not sure how many 8Gb+ machines Andrea has handy
> > > > to tune the VM on either.
> > >
> > > Along those lines -- I have in front of me the source to
> > > "/linux/mm/page_alloc.c" (2.4.17 kernel) which reads (partially)
> >
> > [snip]
> >
> >
> > > Could someone with a big box and a benchmark that drives it out of
> > > free memory please try commenting out the "else if" clause and see if
> > > it makes a difference? I tried this on my puny 512 MB Athlon and
> > > verified that the right values were there with "sysrq", but I don't
> > > have anything bigger to try it on and I don't have a benchmark to test
> > > it with either.
> >
> > And here it is as a patch against 2.4.17:
> >
> > diff -ur linux/mm/page_alloc.c linux-2.4.17znmeb/mm/page_alloc.c
> > --- linux/mm/page_alloc.c	Mon Nov 19 16:35:40 2001
> > +++ linux-2.4.17znmeb/mm/page_alloc.c	Sat Dec 29 16:04:25 2001
> > @@ -718,8 +718,13 @@
> >  		mask = (realsize / zone_balance_ratio[j]);
> >  		if (mask < zone_balance_min[j])
> >  			mask = zone_balance_min[j];
> > +		/* else if clause commented out for testing
> > +		 * M. Edward Borasky, Borasky Research
> > +		 * 2001-12-29
> > +		 *
> >  		else if (mask > zone_balance_max[j])
> >  			mask = zone_balance_max[j];
> > +		 */
> >  		zone->pages_min = mask;
> >  		zone->pages_low = mask*2;
> >  		zone->pages_high = mask*3;
> >
> >
> > Apologies if pine with vim as the editor messes this puppy up :-).
> > --
> > M. Edward Borasky
> >
> > znmeb@borasky-research.net
> > http://www.borasky-research.net
> >
> > I brought my inner child to "Take Your Child To Work Day."
> >



* RE: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2001-12-29 18:45 ` Alan Cox
  2001-12-29 21:24   ` M. Edward (Ed) Borasky
@ 2002-01-01 18:15   ` M. Edward Borasky
  2002-01-01 18:46     ` Alan Cox
  1 sibling, 1 reply; 37+ messages in thread
From: M. Edward Borasky @ 2002-01-01 18:15 UTC (permalink / raw)
  To: Alan Cox, Harald Holzer; +Cc: linux-kernel

> Because much of the memory cannot be used for kernel objects there is an
> imbalance in available resources and its very hard to balance them sanely.

As I understand it, in a Linux / i686 system, there are three zones: DMA
(0 - 2^24-1), low (2^24 - 2^30-1) and high (2^30 and up). And the hardware
(PAE) apparently distinguishes memory addresses above 2^32-1 as well.
Questions:

1. Shouldn't there be *four* zones: (DMA, low, high and PAE)?

2. Isn't the boundary at 2^30 really irrelevant and the three "correct"
zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 - 2^36-1)?

3. On a system without ISA DMA devices, can DMA and low be merged into a
single zone?

4. It's pretty obvious exactly which functions require memory under 2^24 --
ISA DMA. But exactly which functions require memory under 2^30 and which
functions require memory under 2^32? It seems relatively easy to write a
Perl script to truck through the kernel source and figure this out; has
anyone done it? It would seem to me a valuable piece of information -- what
the demands are for the relatively precious areas of memory under 1 GB and
under 4 GB.
--
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net



* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-01 18:15   ` M. Edward Borasky
@ 2002-01-01 18:46     ` Alan Cox
  2002-01-01 19:02       ` M. Edward Borasky
                         ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Alan Cox @ 2002-01-01 18:46 UTC (permalink / raw)
  To: M. Edward Borasky; +Cc: Alan Cox, Harald Holzer, linux-kernel

> 1. Shouldn't there be *four* zones: (DMA, low, high and PAE)?

Probably not. PAE isn't special. With PAE you pay the page table penalties
for all RAM.

> 2. Isn't the boundary at 2^30 really irrelevant and the three "correct"
> zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 - 2^36-1)?

Nope. The limit for directly mapped memory is 2^30.

> 3. On a system without ISA DMA devices, can DMA and low be merged into a
> single zone?

Rarely. PCI vendors are not exactly angels when it comes to implementing
all 32 bits of a DMA transfer.




* RE: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-01 18:46     ` Alan Cox
@ 2002-01-01 19:02       ` M. Edward Borasky
  2002-01-02  1:16       ` H. Peter Anvin
  2002-01-02 21:17       ` Gerrit Huizenga
  2 siblings, 0 replies; 37+ messages in thread
From: M. Edward Borasky @ 2002-01-01 19:02 UTC (permalink / raw)
  To: Alan Cox; +Cc: Harald Holzer, linux-kernel

> > 2. Isn't the boundary at 2^30 really irrelevant and the three "correct"
> > zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 - 2^36-1)?
>
> Nope. The limit for directly mapped memory is 2^30.

Ouch! That makes low memory *extremely* precious. Intuitively, the demand
for directly mapped memory will be proportional to the demand for all
memory, with a proportionality constant depending on the purpose for the
system and the efficiency of the application set. We've seen (apparently --
I haven't looked at any data, just messages on the list) cases where we can
force this to happen with benchmarks designed to embarrass the VM :)) but
have we seen it in real applications?

Thanks for taking the time to answer these questions. I'm struggling to
understand where the performance walls are in large i686 systems, in both
Linux and Windows. In the end, though, relentless application of Moore's Law
to the IA64 must be the correct answer :)).
--
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net



* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-01 18:46     ` Alan Cox
  2002-01-01 19:02       ` M. Edward Borasky
@ 2002-01-02  1:16       ` H. Peter Anvin
  2002-01-02 21:17       ` Gerrit Huizenga
  2 siblings, 0 replies; 37+ messages in thread
From: H. Peter Anvin @ 2002-01-02  1:16 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <E16LTvs-00016I-00@the-village.bc.nu>
By author:    Alan Cox <alan@lxorguk.ukuu.org.uk>
In newsgroup: linux.dev.kernel
> 
> > 2. Isn't the boundary at 2^30 really irrelevant and the three "correct"
> > zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 - 2^36-1)?
> 
> Nope. The limit for directly mapped memory is 2^30.
> 

2^30-2^27 to be exact (assuming a 3:1 split and 128MB vmalloc zone.)
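A quick back-of-envelope check of that number (a sketch, assuming the usual i386 3:1 user/kernel split and a 128 MB vmalloc reserve, as stated above):

```python
# Directly mapped low memory on i386 with a 3:1 split: the kernel gets
# a 1 GB window, minus the 128 MB vmalloc zone hpa mentions.
direct_map = 2**30            # 1 GB kernel address window
vmalloc_reserve = 2**27       # 128 MB vmalloc zone
lowmem = direct_map - vmalloc_reserve
print(lowmem // 2**20)        # -> 896 (MB of directly mapped memory)
```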

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2001-12-29 18:18 i686 SMP systems with more then 12 GB ram with 2.4.x kernel ? Harald Holzer
  2001-12-29 18:45 ` Alan Cox
@ 2002-01-02 17:30 ` Timothy D. Witham
       [not found]   ` <1009994687.12942.14.camel@hh2.hhhome.at>
  1 sibling, 1 reply; 37+ messages in thread
From: Timothy D. Witham @ 2002-01-02 17:30 UTC (permalink / raw)
  To: Harald Holzer; +Cc: linux-kernel@vger.kernel.org

  The OSDL can get you access to this size of machine.

Tim

On Sat, 2001-12-29 at 10:18, Harald Holzer wrote:
> Are there some i686 SMP systems with more then 12 GB ram out there ?
> 
> Is there a known problem with 2.4.x kernel and such systems ?
> 
> I have the following problem:
> 
> A Intel SRPM8 Server with 32 GB ram and RH 7.2 and kernel 2.4.17 on it.
> After doing a lot of disc access the system slows down and the oom
> killer begins his work. (only after running some cp processes.)
> Because the system is running out of low memory.
> 
> Disable the oom killer has no affect, the low memory is going to 0
> and the system dies.
> 
> It looks like as the buffer_heads would fill the low memory up,
> whether there is sufficient memory available or not, as long as
> there is sufficient high memory for caching.
> 
> Any ideas ?
> 
> Here some information from /proc/meminfo and /proc/slabinfo some
> seconds before dying:
> 
>         total:    used:    free:  shared: buffers:  cached:
> Mem:  33781227520 10787348480 22993879040        0 21807104 10447962112
> Swap:        0        0        0
> MemTotal:     32989480 kB
> MemFree:      22454960 kB
> MemShared:           0 kB
> Buffers:         21296 kB
> Cached:       10203088 kB
> SwapCached:          0 kB
> Active:          27248 kB
> Inact_dirty:  10206312 kB
> Inact_clean:         0 kB
> Inact_target:  2046712 kB
> HighTotal:    32636928 kB
> HighFree:     22424128 kB
> LowTotal:       352552 kB
> LowFree:         30832 kB
> SwapTotal:           0 kB
> SwapFree:            0 kB
> 
> slabinfo - version: 1.1 (statistics) (SMP)
> kmem_cache           112    112    284    8    8    1 :    112     112     8    0    0 :  124   62 :     12      9      0      0
> ip_fib_hash          145    145     24    1    1    1 :    145     145     1    0    0 :  252  126 :      8      3      0      0
> urb_priv               0      0     56    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> journal_head         755  11591     56   49  173    1 :  11591 1255589   173    0    0 :  252  126 : 1271927  10219 1272001   9959
> revoke_table         169    169     20    1    1    1 :    169     169     1    0    0 :  252  126 :      0      3      0      0
> revoke_record        126    145     24    1    1    1 :    126     126     1    0    0 :  252  126 :      2      2      3      0
> clip_arp_cache         0      0    124    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> ip_mrt_cache           0      0     84    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> tcp_tw_bucket          0      0    128    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> tcp_bind_bucket      561    580     24    4    4    1 :    561     561     4    0    0 :  252  126 :      4     11      0      0
> tcp_open_request     252    252     92    6    6    1 :    252     252     6    0    0 :  252  126 :      0     12      6      0
> inet_peer_cache        0      0     48    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> ip_dst_cache         176    176    176    8    8    1 :    176     176     8    0    0 :  252  126 :     40     16     22      0
> arp_cache             64     64    120    2    2    1 :     64      64     2    0    0 :  252  126 :      0      4      1      0
> blkdev_requests     1056   1056     88   24   24    1 :   1056    1056    24    0    0 :  252  126 :   1128     48    256      0
> dnotify cache          0      0     28    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> file lock cache      312    312    100    8    8    1 :    312     312     8    0    0 :  252  126 :  19870     16  19876      0
> fasync cache           0      0     24    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> uid_cache            339    339     32    3    3    1 :    339     339     3    0    0 :  252  126 :      1      6      1      0
> skbuff_head_cache   2438   2438    168  106  106    1 :   2438    3950   106    0    0 :  252  126 :   3943    224   3035     12
> sock                 129    129   1280   43   43    1 :    129     189    43    0    0 :   60   30 :    435     87    443      2
> sigqueue             252    252    140    9    9    1 :    252     252     9    0    0 :  252  126 :   1111     18   1118      0
> cdev_cache           702    702     48    9    9    1 :    702     702     9    0    0 :  252  126 :    145     18      5      0
> bdev_cache           354    354     64    6    6    1 :    354     354     6    0    0 :  252  126 :     20     12     23      0
> mnt_cache            224    224     68    4    4    1 :    224     224     4    0    0 :  252  126 :      7      7      2      0
> inode_cache         2224   2224    488  278  278    1 :   3872    4638   486    0    0 :  124   62 :  73933    984  72421     23
> dentry_cache        2732   3828    116  116  116    1 :   5181    7701   157    0    0 :  252  126 :  76421    333  74349     21
> dquot                  0      0    112    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> filp                 578    578    112   17   17    1 :    578     578    17    0    0 :  252  126 :    397     34      0      0
> names_cache           18     18   4096   18   18    1 :     18      18    18    0    0 :   60   30 :  66979     36  66997      0
> buffer_head       2557740 2559882    104 69151 69186    1 : 2559882 3919926 69186    0    0 :  252  126 : 5029666 149166 2542612  10811
> mm_struct            216    216    144    8    8    1 :    216     216     8    0    0 :  252  126 :   1345     16   1317      0
> vm_area_struct      1724   1850     76   37   37    1 :   1850    7772    37    0    0 :  252  126 :  53909    121  53030     48
> fs_cache             588    588     44    7    7    1 :    588     588     7    0    0 :  252  126 :   1347     13   1320      0
> files_cache          171    171    424   19   19    1 :    171     171    19    0    0 :  124   62 :   1335     37   1320      0
> signal_act           162    162   1312   54   54    1 :    162     162    54    0    0 :   60   30 :   1305    103   1317      0
> pae_pgd              791    791     32    7    7    1 :    791     791     7    0    0 :  252  126 :   1346     14   1317      0
> size-131072(DMA)       0      0 131072    0    0   32 :      0       0     0    0    0 :    0    0 :      0      0      0      0
> size-131072            0      0 131072    0    0   32 :      0       0     0    0    0 :    0    0 :      0      0      0      0
> size-65536(DMA)        0      0  65536    0    0   16 :      0       0     0    0    0 :    0    0 :      0      0      0      0
> size-65536             0      0  65536    0    0   16 :      0       0     0    0    0 :    0    0 :      0      0      0      0
> size-32768(DMA)        0      0  32768    0    0    8 :      0       0     0    0    0 :    0    0 :      0      0      0      0
> size-32768             1      1  32768    1    1    8 :      1       2     1    0    0 :    0    0 :      0      0      0      0
> size-16384(DMA)        1      1  16384    1    1    4 :      1       1     1    0    0 :    0    0 :      0      0      0      0
> size-16384             1      1  16384    1    1    4 :      1       1     1    0    0 :    0    0 :      0      0      0      0
> size-8192(DMA)         0      0   8192    0    0    2 :      0       0     0    0    0 :    0    0 :      0      0      0      0
> size-8192              2      3   8192    2    3    2 :      3      41     3    0    0 :    0    0 :      0      0      0      0
> size-4096(DMA)         0      0   4096    0    0    1 :      0       0     0    0    0 :   60   30 :      0      0      0      0
> size-4096            207    237   4096  207  237    1 :    237     597   237    0    0 :   60   30 :    750    486    946     13
> size-2048(DMA)         0      0   2048    0    0    1 :      0       0     0    0    0 :   60   30 :      0      0      0      0
> size-2048            368    428   2048  194  214    1 :    428    2198   214    0    0 :   60   30 :   3169    487   3326     61
> size-1024(DMA)         0      0   1024    0    0    1 :      0       0     0    0    0 :  124   62 :      0      0      0      0
> size-1024            448    448   1024  112  112    1 :    448     448   112    0    0 :  124   62 :   1012    157    890      0
> size-512(DMA)          0      0    512    0    0    1 :      0       0     0    0    0 :  124   62 :      0      0      0      0
> size-512             520    520    512   65   65    1 :    520     520    65    0    0 :  124   62 :    460    115    116      0
> size-256(DMA)          0      0    264    0    0    1 :      0       0     0    0    0 :  124   62 :      0      0      0      0
> size-256             630    630    264   42   42    1 :    630     940    42    0    0 :  124   62 :   5914     82   5810      5
> size-128(DMA)         28     28    136    1    1    1 :     28      28     1    0    0 :  252  126 :      0      2      0      0
> size-128             868    868    136   31   31    1 :    868     869    31    0    0 :  252  126 :   2589     45   2374      0
> size-64(DMA)           0      0     72    0    0    1 :      0       0     0    0    0 :  252  126 :      0      0      0      0
> size-64              583    583     72   11   11    1 :    583     583    11    0    0 :  252  126 :   1200     19    975      0
> size-32(DMA)          92     92     40    1    1    1 :     92      92     1    0    0 :  252  126 :     16      2      0      0
> size-32             1384   2392     40   22   26    1 :   2392    2526    26    0    0 :  252  126 : 2569647     52 2568729      9
> 
> Regards,
> Harald Holzer
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-01 18:46     ` Alan Cox
  2002-01-01 19:02       ` M. Edward Borasky
  2002-01-02  1:16       ` H. Peter Anvin
@ 2002-01-02 21:17       ` Gerrit Huizenga
  2002-01-06  8:20         ` Benjamin LaHaise
  2 siblings, 1 reply; 37+ messages in thread
From: Gerrit Huizenga @ 2002-01-02 21:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: M. Edward Borasky, Harald Holzer, linux-kernel


In message <E16LTvs-00016I-00@the-village.bc.nu>, Alan Cox writes:

> > 2. Isn't the boundary at 2^30 really irrelevant and the three "correct"
> > zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 - 2^36-1)?
> 
> Nope. The limit for directly mapped memory is 2^30.
 
The limit *per L1 Page Table Base Pointer*, that is.  You could
in theory have a different L1 Page Table base pointer for each
task (including each proc 0 in linux).  You can also pull a few
tricks such as instantiating a 4 GB kernel virtual address space
while in kernel mode (using a virtual windowing mechanism as is used
for high mem today to map in user space for copying in data from
user space if/when needed).  The latter takes some tricky code to
get the mapping correct, but it wasn't a lot of code in PTX.  Just needed
a lot of careful thought, review, testing, etc.

I don't know if there are real examples of large memory systems
exhausting the ~1 GB of kernel virtual address space on machines
with > 12-32 GB of physical memory (we had this problem in PTX which
created the need for a larger kernel virtual address space in some
contexts).

> > 3. On a system without ISA DMA devices, can DMA and low be merged into a
> > single zone?
> 
> Rarely. PCI vendors are not exactly angels when it comes to implementing
> all 32bits of a DMA transfer

Would be nice to have a config option like "CONFIG_PCI_36" to imply
that all devices on a PAE system were able to access all of memory,
globally removing the need for bounce buffering and allowing a native
PCI setup for mapping memory addresses...

gerrit

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
       [not found]     ` <1009995669.1253.17.camel@wookie-laptop.pdx.osdl.net>
@ 2002-01-02 23:50       ` Harald Holzer
  2002-01-03  0:16         ` Alan Cox
                           ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Harald Holzer @ 2002-01-02 23:50 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: Timothy D. Witham

Today I checked some memory configurations and noticed that low
memory decreases when I add more memory to the system, while
the size of reserved memory increases:

At 1GB ram, 16,936kB low mem are reserved.
4GB ram, 72,824kB reserved
8GB ram, 142,332kB reserved
16GB ram, 269,424kB reserved
32GB ram, 532,080kB reserved, usable low mem: 352 MB
64GB ram ?? 

Which function does the reserved memory fulfill?
Is it all for paging?
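One plausible model for those numbers (my assumption, consistent with the replies in this thread: the reserved region is dominated by the mem_map[] array, roughly one 64-byte struct page per 4 kB physical page):

```python
# Compare the measured "reserved" sizes against a linear mem_map[] model:
# pages * ~64 bytes of struct page each.  All figures in kB.
PAGE = 4096
STRUCT_PAGE = 64  # bytes per struct page on this 2.4 kernel (see below)
measurements = [(1, 16936), (4, 72824), (8, 142332),
                (16, 269424), (32, 532080)]
for gb, reserved_kb in measurements:
    pages = gb * 2**30 // PAGE
    est_kb = pages * STRUCT_PAGE // 1024
    print(gb, reserved_kb, est_kb)   # estimate tracks the measurement
# Extrapolated, 64 GB would need about 1 GB of low memory for mem_map[]
# alone -- more than the ~896 MB of directly mapped space available.
print(64 * 2**30 // PAGE * STRUCT_PAGE // 2**20)  # -> 1024 (MB)
```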


Harald Holzer



Memory related startup messages at 32GB:

Jan  1 15:56:05 localhost kernel: Linux version 2.4.17-64g (root@bigbox) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Tue Jan 1 14:19:36 CET 2002
Jan  1 15:56:05 localhost kernel: BIOS-provided physical RAM map:
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 0000000000100000 - 0000000003ff8000 (usable)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 0000000003ff8000 - 0000000003fffc00 (ACPI data)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 0000000003fffc00 - 0000000004000000 (ACPI NVS)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 0000000004000000 - 00000000f0000000 (usable)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Jan  1 15:56:05 localhost kernel:  BIOS-e820: 0000000100000000 - 0000000810000000 (usable)
Jan  1 15:56:05 localhost kernel: 32128MB HIGHMEM available.
Jan  1 15:56:05 localhost kernel: found SMP MP-table at 000f65d0
Jan  1 15:56:05 localhost kernel: hm, page 000f6000 reserved twice.
Jan  1 15:56:05 localhost kernel: hm, page 000f7000 reserved twice.
Jan  1 15:56:05 localhost kernel: hm, page 0009d000 reserved twice.
Jan  1 15:56:05 localhost kernel: hm, page 0009e000 reserved twice.
Jan  1 15:56:05 localhost kernel: On node 0 totalpages: 8454144
Jan  1 15:56:05 localhost kernel: zone(0): 4096 pages.
Jan  1 15:56:05 localhost kernel: zone(1): 225280 pages.
Jan  1 15:56:05 localhost kernel: zone(2): 8224768 pages.
Jan  1 15:56:05 localhost kernel: Intel MultiProcessor Specification v1.4
Jan  1 15:56:05 localhost kernel:     Virtual Wire compatibility mode.
Jan  1 15:56:05 localhost kernel: OEM ID: INTEL    Product ID: SPM8         APIC at: 0xFEE00000
Jan  1 15:56:05 localhost kernel: Processor #7 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: Processor #0 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: Processor #1 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: Processor #2 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: Processor #3 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: Processor #4 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: Processor #5 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: Processor #6 Pentium(tm) Pro APIC version 17
Jan  1 15:56:05 localhost kernel: I/O APIC #8 Version 19 at 0xFEC00000.
Jan  1 15:56:05 localhost kernel: Processors: 8
Jan  1 15:56:05 localhost kernel: Kernel command line: BOOT_IMAGE=linux-17-64g ro root=802 BOOT_FILE=/boot/vmlinuz-2.4.17-64g console=ttyS0,38400
Jan  1 15:56:05 localhost kernel: Initializing CPU#0
Jan  1 15:56:05 localhost kernel: Detected 700.082 MHz processor.
Jan  1 15:56:05 localhost kernel: Console: colour VGA+ 80x25
Jan  1 15:56:05 localhost kernel: Calibrating delay loop... 1395.91 BogoMIPS

-->Jan  1 15:56:05 localhost kernel: Memory: 33021924k/33816576k available (1081k kernel code, 532080k reserved, 290k data, 248k init, 32636928k highmem)

Jan  1 15:56:05 localhost kernel: Dentry-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Jan  1 15:56:05 localhost kernel: Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Jan  1 15:56:05 localhost kernel: Mount-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Jan  1 15:56:05 localhost kernel: Buffer-cache hash table entries: 524288 (order: 9, 2097152 bytes)
Jan  1 15:56:05 localhost kernel: Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-02 23:50       ` Harald Holzer
@ 2002-01-03  0:16         ` Alan Cox
  2002-01-03 13:30           ` Rik van Riel
  2002-01-03  0:17         ` Mark Zealey
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: Alan Cox @ 2002-01-03  0:16 UTC (permalink / raw)
  To: Harald Holzer; +Cc: linux-kernel@vger.kernel.org, Timothy D. Witham

> 16GB ram, 269,424kB reserved
> 32GB ram, 532,080kB reserved, usable low mem: 352 MB
> 64GB ram ?? 

64GB you can basically forget

> Which function does the reserved memory fulfill ?
> Is it all for paging ?

A lot of it is the page structs (64 bytes per page - which really should be
nearer the 32 that some rival Unix OSes achieve on x86)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-02 23:50       ` Harald Holzer
  2002-01-03  0:16         ` Alan Cox
@ 2002-01-03  0:17         ` Mark Zealey
  2002-01-03 13:28         ` Rik van Riel
  2002-01-03 15:15         ` Anton Blanchard
  3 siblings, 0 replies; 37+ messages in thread
From: Mark Zealey @ 2002-01-03  0:17 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org

On Thu, Jan 03, 2002 at 12:50:50AM +0100, Harald Holzer wrote:

> Today i checked some memory configurations and noticed that the low
> memory decreases, when i add more memory to the system,
> and the size of reserved memory increases:
> 
> at 1GB ram, are 16,936kB low mem reserved.
> 4GB ram, 72,824kB reserved
> 8GB ram, 142,332kB reserved
> 16GB ram, 269,424kB reserved
> 32GB ram, 532,080kB reserved, usable low mem: 352 MB
> 64GB ram ?? 
> 
> Which function does the reserved memory fulfill ?
> Is it all for paging ?

Yeah, mostly page tables; they have to be kept in low mem. The kernel is also
making struct pages for every single page in the system, and those must be
kept in kernel memory...

I doubt you could get the system to boot with 64 GB.

-- 

Mark Zealey
mark@zealos.org
mark@itsolve.co.uk

UL++++>$ G!>(GCM/GCS/GS/GM) dpu? s:-@ a16! C++++>$ P++++>+++++$ L+++>+++++$
!E---? W+++>$ N- !o? !w--- O? !M? !V? !PS !PE--@ PGP+? r++ !t---?@ !X---?
!R- b+ !tv b+ DI+ D+? G+++ e>+++++ !h++* r!-- y--

(www.geekcode.com)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-02 23:50       ` Harald Holzer
  2002-01-03  0:16         ` Alan Cox
  2002-01-03  0:17         ` Mark Zealey
@ 2002-01-03 13:28         ` Rik van Riel
  2002-01-03 14:33           ` Stephan von Krawczynski
  2002-01-03 15:15         ` Anton Blanchard
  3 siblings, 1 reply; 37+ messages in thread
From: Rik van Riel @ 2002-01-03 13:28 UTC (permalink / raw)
  To: Harald Holzer; +Cc: linux-kernel@vger.kernel.org, Timothy D. Witham

On 3 Jan 2002, Harald Holzer wrote:

> at 1GB ram, are 16,936kB low mem reserved.
> 4GB ram, 72,824kB reserved
> 8GB ram, 142,332kB reserved
> 16GB ram, 269,424kB reserved
> 32GB ram, 532,080kB reserved, usable low mem: 352 MB
> 64GB ram ??
>
> Which function does the reserved memory fulfill ?
> Is it all for paging ?

The kernel stores various data structures there, in particular
the mem_map[] array, which has one data structure for each
page.

In the standard kernel, that is 52 bytes per page, giving you
a space usage of 416 MB for the mem_map[] array.

I'm currently integrating a patch into my VM tree which removes
the wait queue from the page struct, bringing the size down to
36 bytes per page, or 288 MB, giving a space saving of 128 MB.
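The arithmetic behind those figures, for a 32 GB machine with 4 kB pages (a quick check; sizes in binary megabytes):

```python
# mem_map[] overhead on a 32 GB i686 box: one struct page per 4 kB page.
pages = 32 * 2**30 // 4096                 # 8,388,608 struct pages
print(pages * 52 // 2**20)                 # -> 416 (MB at 52 bytes each)
print(pages * 36 // 2**20)                 # -> 288 (MB after dropping the
                                           #    wait queue, 36 bytes each)
print(pages * (52 - 36) // 2**20)          # -> 128 (MB saved)
```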

Another item to look into is removing the page cache hash table
and replacing it by a radix tree or hash trie, in the hopes of
improving scalability while at the same time saving some space.

As for page table overhead, on machines like yours we really
should be using 4 MB pages for the larger data segments, which
will cut down the page table size by a factor of 512 ;)
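A sketch of where a factor like that comes from (assuming PAE's 8-byte page table entries, as used on these >4 GB machines; without PAE, 4-byte PTEs give 1024 entries per table and 4 MB large pages):

```python
# With PAE, one page-table page holds 4096/8 = 512 PTEs, each mapping
# 4 kB.  A single large-page entry in the page directory replaces that
# entire table, hence the factor of 512.
entries_per_table = 4096 // 8
print(entries_per_table)                   # -> 512
large_page = entries_per_table * 4096
print(large_page // 2**20)                 # -> 2 (MB per PAE large page)
```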

regards,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-03  0:16         ` Alan Cox
@ 2002-01-03 13:30           ` Rik van Riel
  2002-01-04 12:09             ` Hugh Dickins
  0 siblings, 1 reply; 37+ messages in thread
From: Rik van Riel @ 2002-01-03 13:30 UTC (permalink / raw)
  To: Alan Cox; +Cc: Harald Holzer, linux-kernel@vger.kernel.org, Timothy D. Witham

On Thu, 3 Jan 2002, Alan Cox wrote:

> > Which function does the reserved memory fulfill ?
> > Is it all for paging ?
>
> A lot of it is the page structs (64bytes per page - which really
> should be nearer the 32 some rival Unix OS's achieve on x86)

The 2.4 kernel has the page struct at 52 bytes in size,
William Lee Irwin and I have brought this down to 36.

Expect to see this integrated into the rmap VM soon ;)

regards,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-03 13:28         ` Rik van Riel
@ 2002-01-03 14:33           ` Stephan von Krawczynski
  2002-01-03 16:38             ` Rik van Riel
  0 siblings, 1 reply; 37+ messages in thread
From: Stephan von Krawczynski @ 2002-01-03 14:33 UTC (permalink / raw)
  To: Rik van Riel; +Cc: harald.holzer, linux-kernel, wookie, velco

On Thu, 3 Jan 2002 11:28:45 -0200 (BRST)
Rik van Riel <riel@conectiva.com.br> wrote:

> Another item to look into is removing the page cache hash table
> and replacing it by a radix tree or hash trie, in the hopes of
> improving scalability while at the same time saving some space.

Ah, didn't we see such a patch lately on LKML? If I remember correctly I saw
some comparison charts too, and some people testing it were happy with it.
Just searched through the list: 24 Dec :-) by Momchil Velikov. Can someone
with big mem have a look at the savings? How about 18-pre?

Regards,
Stephan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-02 23:50       ` Harald Holzer
                           ` (2 preceding siblings ...)
  2002-01-03 13:28         ` Rik van Riel
@ 2002-01-03 15:15         ` Anton Blanchard
  3 siblings, 0 replies; 37+ messages in thread
From: Anton Blanchard @ 2002-01-03 15:15 UTC (permalink / raw)
  To: Harald Holzer; +Cc: linux-kernel@vger.kernel.org, Timothy D. Witham


> Today i checked some memory configurations and noticed that the low
> memory decreases, when i add more memory to the system,
> and the size of reserved memory increases:
> 
> at 1GB ram, are 16,936kB low mem reserved.
> 4GB ram, 72,824kB reserved
> 8GB ram, 142,332kB reserved
> 16GB ram, 269,424kB reserved
> 32GB ram, 532,080kB reserved, usable low mem: 352 MB

> 64GB ram ?? 

If you need 64G of RAM and decent performance you don't want an x86.
Use a sparc64, alpha or ppc64 Linux machine.

Anton

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-03 14:33           ` Stephan von Krawczynski
@ 2002-01-03 16:38             ` Rik van Riel
  0 siblings, 0 replies; 37+ messages in thread
From: Rik van Riel @ 2002-01-03 16:38 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: harald.holzer, linux-kernel, wookie, velco

On Thu, 3 Jan 2002, Stephan von Krawczynski wrote:
> On Thu, 3 Jan 2002 11:28:45 -0200 (BRST)
> Rik van Riel <riel@conectiva.com.br> wrote:
>
> > Another item to look into is removing the page cache hash table
> > and replacing it by a radix tree or hash trie, in the hopes of
> > improving scalability while at the same time saving some space.
>
> Ah, didn't we see such a patch lately in LKML? If I remember correct I
> saw some comparison charts too and some people testing it were happy
> with it. Just searched through the list: 24. dec :-) by Momchil
> Velikov Can someone with big mem have a look at the saving? How about
> 18-pre?

From what velco told me on IRC, he is still tuning his work
and looking at further improvements.

One thing to keep in mind is that most pages are in the
page cache; we wouldn't want to reduce space in one data
structure just to use more space elsewhere, this is
something to look at very carefully...

regards,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-03 13:30           ` Rik van Riel
@ 2002-01-04 12:09             ` Hugh Dickins
  2002-01-04 12:15               ` Rik van Riel
  0 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2002-01-04 12:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Alan Cox, Harald Holzer, linux-kernel@vger.kernel.org,
	Timothy D. Witham

On Thu, 3 Jan 2002, Rik van Riel wrote:
> On Thu, 3 Jan 2002, Alan Cox wrote:
> > A lot of it is the page structs (64bytes per page - which really
> > should be nearer the 32 some rival Unix OS's achieve on x86)
> 
> The 2.4 kernel has the page struct at 52 bytes in size,
> William Lee Irwin and I have brought this down to 36.

Please restate those numbers, Rik: I share Alan's belief that the
current standard 2.4 kernel has page struct at 64 bytes in size.

Hugh


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-04 12:09             ` Hugh Dickins
@ 2002-01-04 12:15               ` Rik van Riel
  0 siblings, 0 replies; 37+ messages in thread
From: Rik van Riel @ 2002-01-04 12:15 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Harald Holzer, linux-kernel@vger.kernel.org,
	Timothy D. Witham

On Fri, 4 Jan 2002, Hugh Dickins wrote:
> On Thu, 3 Jan 2002, Rik van Riel wrote:
> > On Thu, 3 Jan 2002, Alan Cox wrote:
> > > A lot of it is the page structs (64bytes per page - which really
> > > should be nearer the 32 some rival Unix OS's achieve on x86)
> >
> > The 2.4 kernel has the page struct at 52 bytes in size,
> > William Lee Irwin and I have brought this down to 36.
>
> Please restate those numbers, Rik: I share Alan's belief that the
> current standard 2.4 kernel has page struct at 64 bytes in size.

Indeed, I counted wrong ... subtracted the waitqueue when counting
the first time, then subtracted it again ;)

The struct page in the current kernel is indeed 64 bytes. In
the rmap VM it's also 64 bytes (60 bytes if highmem is disabled).

After removal of the waitqueue, that'll be 52 bytes, or 48 if
highmem is disabled.

kind regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-02 21:17       ` Gerrit Huizenga
@ 2002-01-06  8:20         ` Benjamin LaHaise
  2002-01-06 16:16           ` Alan Cox
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin LaHaise @ 2002-01-06  8:20 UTC (permalink / raw)
  To: Gerrit Huizenga; +Cc: Alan Cox, M. Edward Borasky, Harald Holzer, linux-kernel

On Wed, Jan 02, 2002 at 01:17:59PM -0800, Gerrit Huizenga wrote:
> I don't know if there are real examples of large memory systems
> exhausting the ~1 GB of kernel virtual address space on machines
> with > 12-32 GB of physical memory (we had this problem in PTX which
> created the need for a larger kernel virtual address space in some
> contexts).

The ~800MB or so of kernel address space is exhausted by struct page
entries at around 48GB of physical memory.
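A rough check of that figure, assuming 64-byte struct page entries and 4 kB pages:

```python
# struct page overhead at 48 GB of RAM: close to the ~800 MB of kernel
# address space left over after the direct-map/vmalloc split.
pages = 48 * 2**30 // 4096                 # 12,582,912 pages
print(pages * 64 // 2**20)                 # -> 768 (MB of mem_map[])
```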

SGI's original highmem patch switched page tables on entry to kernel 
space, so there is code already tested that we can borrow.  But I'm 
not sure if it's worth it as the overhead it adds makes life really 
suck: we would lose the ability to use global pages, as well as always 
encounter tlb misses on the kernel<->userspace transition.  PAE shows 
up as a 5% performance loss on normal loads, and this would make it 
worse.  We're probably better off implementing PSE.  Of course, making 
these kinds of choices is hard without actual statistics of the 
usage patterns we're targeting.

> Would be nice to have a config option like "CONFIG_PCI_36" to imply
> that all devices on a PAE system were able to access all of memory,
> globally removing the need for bounce buffering and allowing a native
> PCI setup for mapping memory addresses...

That would be neat.

		-ben
-- 
Fish.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06  8:20         ` Benjamin LaHaise
@ 2002-01-06 16:16           ` Alan Cox
  2002-01-06 20:23             ` Benjamin LaHaise
                               ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Alan Cox @ 2002-01-06 16:16 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Gerrit Huizenga, Alan Cox, M. Edward Borasky, Harald Holzer,
	linux-kernel

> up as a 5% performance loss on normal loads, and this would make it 
> worse.  We're probably better off implementing PSE.  Of course, making 
> these kinds of choices is hard without actual statistics of the 
> usage patterns we're targeting.

You don't necessarily need PSE. Migrating to an option to support > 4K
_virtual_ page size is more flexible for x86, although it would need 
glibc getpagesize() fixing I think, and might mean a few apps wouldn't
run in that configuration.

Alan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
@ 2002-01-06 18:39 Daniel Freedman
  2002-01-06 18:59 ` Marvin Justice
  0 siblings, 1 reply; 37+ messages in thread
From: Daniel Freedman @ 2002-01-06 18:39 UTC (permalink / raw)
  To: linux-kernel


On Jan 01 2002, H. Peter Anvin (hpa@zytor.com) wrote:
> By author: Alan Cox <alan@lxorguk.ukuu.org.uk>
> >
> > > 2. Isn't the boundary at 2^30 really irrelevant and the three "correct"
> > > zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 - 2^36-1)?
> >
> > Nope. The limit for directly mapped memory is 2^30.
> >
> 
> 2^30-2^27 to be exact (assuming a 3:1 split and 128MB vmalloc zone.)
> 
>         -hpa

For my better understanding, where's the 128MB vmalloc zone assumption
defined, please?

I'm pretty sure I understand that the 3:1 split you refer to is
defined by PAGE_OFFSET in asm-i386/page.h

But when I tried to find the answer in the source for the vmalloc
zone, I looked in linux/mm.h, linux/mmzone.h, linux/vmalloc.h, and
mm/vmalloc.c, but couldn't find anything there or in O'Reilly's kernel
book that I could follow/understand.

Thanks for any pointers.

Take care,

Daniel

-- 
Daniel A. Freedman
Laboratory for Atomic and Solid State Physics
Department of Physics
Cornell University

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 18:39 Daniel Freedman
@ 2002-01-06 18:59 ` Marvin Justice
  2002-01-06 19:45   ` Daniel Freedman
  0 siblings, 1 reply; 37+ messages in thread
From: Marvin Justice @ 2002-01-06 18:59 UTC (permalink / raw)
  To: Daniel Freedman, linux-kernel

Is this what you're looking for? Just below the definition of PAGE_OFFSET in 
page.h:

/*
 * This much address space is reserved for vmalloc() and iomap()
 * as well as fixmap mappings.
 */
#define __VMALLOC_RESERVE	(128 << 20)
 

On Sunday 06 January 2002 12:39 pm, Daniel Freedman wrote:
> On Jan 01 2002, H. Peter Anvin (hpa@zytor.com) wrote:
> > By author: Alan Cox <alan@lxorguk.ukuu.org.uk>
> >
> > > > 2. Isn't the boundary at 2^30 really irrelevant and the three
> > > > "correct" zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 -
> > > > 2^36-1)?
> > >
> > > Nope. The limit for directly mapped memory is 2^30.
> >
> > 2^30-2^27 to be exact (assuming a 3:1 split and 128MB vmalloc zone.)
> >
> >         -hpa
>
> For my better understanding, where's the 128MB vmalloc zone assumption
> defined, please?
>
> I'm pretty sure I understand that the 3:1 split you refer to is
> defined by PAGE_OFFSET in asm-i386/page.h
>
> But when I tried to find the answer in the source for the vmalloc
> zone, I looked in linux/mm.h, linux/mmzone.h, linux/vmalloc.h, and
> mm/vmalloc.c, but couldn't find anything there or in O'Reilly's kernel
> book that I could follow/understand.
>
> Thanks for any pointers.
>
> Take care,
>
> Daniel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 18:59 ` Marvin Justice
@ 2002-01-06 19:45   ` Daniel Freedman
  2002-01-06 20:15     ` Marvin Justice
  0 siblings, 1 reply; 37+ messages in thread
From: Daniel Freedman @ 2002-01-06 19:45 UTC (permalink / raw)
  To: Marvin Justice; +Cc: linux-kernel


Hi Marvin,

Thanks for the quick reply.


On Sun, Jan 06, 2002, Marvin Justice wrote:
> Is this what you're looking for? Just below the definition of PAGE_OFFSET in 
> page.h:
> 
> /*
>  * This much address space is reserved for vmalloc() and iomap()
>  * as well as fixmap mappings.
>  */
> #define __VMALLOC_RESERVE	(128 << 20)

However, while it does seem to be exactly the definition for 128MB
vmalloc offset that I was looking for, I don't seem to have this
definition in my source tree (2.4.16):

  freedman@planck:/usr/src/linux$ grep -r __VMALLOC_RESERVE *
  freedman@planck:/usr/src/linux$ 

Any idea why this is so?

Thanks again,

Daniel

> On Sunday 06 January 2002 12:39 pm, Daniel Freedman wrote:
> > On Jan 01 2002, H. Peter Anvin (hpa@zytor.com) wrote:
> > > By author: Alan Cox <alan@lxorguk.ukuu.org.uk>
> > >
> > > > > 2. Isn't the boundary at 2^30 really irrelevant and the three
> > > > > "correct" zones are (0 - 2^24-1), (2^24 - 2^32-1) and (2^32 -
> > > > > 2^36-1)?
> > > >
> > > > Nope. The limit for directly mapped memory is 2^30.
> > >
> > > 2^30-2^27 to be exact (assuming a 3:1 split and 128MB vmalloc zone.)
> > >
> > >         -hpa
> >
> > For my better understanding, where's the 128MB vmalloc zone assumption
> > defined, please?
> >
> > I'm pretty sure I understand that the 3:1 split you refer to is
> > defined by PAGE_OFFSET in asm-i386/page.h
> >
> > But when I tried to find the answer in the source for the vmalloc
> > zone, I looked in linux/mm.h, linux/mmzone.h, linux/vmalloc.h, and
> > mm/vmalloc.c, but couldn't find anything there or in O'Reilly's kernel
> > book that I could follow/understand.
> >
> > Thanks for any pointers.
> >
> > Take care,
> >
> > Daniel

-- 
Daniel A. Freedman
Laboratory for Atomic and Solid State Physics
Department of Physics
Cornell University

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 19:45   ` Daniel Freedman
@ 2002-01-06 20:15     ` Marvin Justice
  2002-01-07  5:17       ` Daniel Freedman
  0 siblings, 1 reply; 37+ messages in thread
From: Marvin Justice @ 2002-01-06 20:15 UTC (permalink / raw)
  To: Daniel Freedman; +Cc: linux-kernel

On Sunday 06 January 2002 01:45 pm, Daniel Freedman wrote:
> Hi Marvin,
>
> Thanks for the quick reply.
>
> On Sun, Jan 06, 2002, Marvin Justice wrote:
> > Is this what you're looking for? Just below the definition of PAGE_OFFSET
> > in page.h:
> >
> > /*
> >  * This much address space is reserved for vmalloc() and iomap()
> >  * as well as fixmap mappings.
> >  */
> > #define __VMALLOC_RESERVE	(128 << 20)
>
> However, while it does seem to be exactly the definition for 128MB
> vmalloc offset that I was looking for, I don't seem to have this
> definition in my source tree (2.4.16):
>
>   freedman@planck:/usr/src/linux$ grep -r __VMALLOC_RESERVE *
>   freedman@planck:/usr/src/linux$
>
> Any idea why this is so?
>
> Thanks again,
>
> Daniel
>

Hmmm. Looks like it was moved sometime between 2.4.16 and 2.4.18pre1. In my 
2.4.16 tree it's located in arch/i386/kernel/setup.c and without the leading 
underscores.

-M

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 16:16           ` Alan Cox
@ 2002-01-06 20:23             ` Benjamin LaHaise
  2002-01-06 23:37             ` Chris Wedgwood
  2002-01-07 16:20             ` Hugh Dickins
  2 siblings, 0 replies; 37+ messages in thread
From: Benjamin LaHaise @ 2002-01-06 20:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: Gerrit Huizenga, M. Edward Borasky, Harald Holzer, linux-kernel

On Sun, Jan 06, 2002 at 04:16:07PM +0000, Alan Cox wrote:
> You don't necessarily need PSE. Migrating to an option to support > 4K
> _virtual_ page size is more flexible for x86, although it would need 
> glibc getpagesize() fixing I think, and might mean a few apps wouldn't
> run in that configuration.

Perhaps, but if the majority of people using 64GB of ram are served well 
by PSE, then it's worth getting that 5% of performance back.

		-ben
-- 
Fish.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 16:16           ` Alan Cox
  2002-01-06 20:23             ` Benjamin LaHaise
@ 2002-01-06 23:37             ` Chris Wedgwood
  2002-01-07  0:29               ` The COUGAR Project M. Edward Borasky
  2002-01-07  2:18               ` i686 SMP systems with more then 12 GB ram with 2.4.x kernel ? Marvin Justice
  2002-01-07 16:20             ` Hugh Dickins
  2 siblings, 2 replies; 37+ messages in thread
From: Chris Wedgwood @ 2002-01-06 23:37 UTC (permalink / raw)
  To: Alan Cox
  Cc: Benjamin LaHaise, Gerrit Huizenga, M. Edward Borasky,
	Harald Holzer, linux-kernel


On Sun, Jan 06, 2002 at 04:16:07PM +0000, Alan Cox wrote:

    You don't necessarily need PSE. Migrating to an option to support
    > 4K _virtual_ page size is more flexible for x86, although it
    would need glibc getpagesize() fixing I think, and might mean a
    few apps wouldn't run in that configuration.

If someone has a minute or so, can someone briefly explain the
difference(s) between PSE and PAE?



  --cw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* The COUGAR Project
  2002-01-06 23:37             ` Chris Wedgwood
@ 2002-01-07  0:29               ` M. Edward Borasky
  2002-01-07  2:18               ` i686 SMP systems with more then 12 GB ram with 2.4.x kernel ? Marvin Justice
  1 sibling, 0 replies; 37+ messages in thread
From: M. Edward Borasky @ 2002-01-07  0:29 UTC (permalink / raw)
  To: linux-kernel

The COUGAR project is something I've been thinking about the past few
months. Unfortunately I no longer have *any* free time to devote to it. So
I'm releasing the proposal on the web, in the hopes that someone in the
kernel community will pick it up and make a project out of it. See
http://www.borasky-research.net/Cougar.htm.

--
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 23:37             ` Chris Wedgwood
  2002-01-07  0:29               ` The COUGAR Project M. Edward Borasky
@ 2002-01-07  2:18               ` Marvin Justice
  2002-01-07  2:38                 ` Chris Wedgwood
  1 sibling, 1 reply; 37+ messages in thread
From: Marvin Justice @ 2002-01-07  2:18 UTC (permalink / raw)
  To: Chris Wedgwood, Alan Cox
  Cc: Benjamin LaHaise, Gerrit Huizenga, M. Edward Borasky,
	Harald Holzer, linux-kernel

> If someone has a minute or so, can someone briefly explain the
> difference(s) between PSE and PAE?
>

Here's my (probably simple-minded) understanding. With the PSE bit turned on 
in one of the x86 control registers (cr4), page sizes are 4MB instead of the 
usual 4KB. One advantage of large pages is that there are fewer page tables 
and struct pages to store.

PAE is turned on by setting a different bit. It allows for the possibility of 
up to 64GB of physical ram on i686. Actual addresses are still just 32 bits, 
however, so any given process is limited to 4GB (actually Linux limits it to 
3GB). But by using a three-level paging scheme it's possible to map process 
A's 32 bit address space to a different region of physical ram than process 
B's, which, in turn, is mapped to a different physical region than process 
C's, etc. 

As far as I know, it's possible to set both bits simultaneously.

-Marvin


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-07  2:18               ` i686 SMP systems with more then 12 GB ram with 2.4.x kernel ? Marvin Justice
@ 2002-01-07  2:38                 ` Chris Wedgwood
  2002-01-07  4:40                   ` T. A.
  0 siblings, 1 reply; 37+ messages in thread
From: Chris Wedgwood @ 2002-01-07  2:38 UTC (permalink / raw)
  To: Marvin Justice
  Cc: Alan Cox, Benjamin LaHaise, Gerrit Huizenga, M. Edward Borasky,
	Harald Holzer, linux-kernel

On Sun, Jan 06, 2002 at 08:18:33PM -0600, Marvin Justice wrote:

    Here's my (probably simple-minded) understanding. With the PSE bit
    turned on in one of the x86 control registers (cr4), page sizes
    are 4MB instead of the usual 4KB. One advantage of large pages is
    that there are fewer page tables and struct pages to store.

Ah, I knew 4MB pages were possible... I was under the impression _all_
pages had to be 4MB which would seem to suck badly as they would be
too coarse for many applications (but for certain large sci. apps. I'm
sure this would be perfect, less TLB thrashing too with sparse
data-sets).

On the whole, I'm not sure I can see how 4MB pages _everywhere_ in
user-space would be a win for many people at all...


  --cw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-07  2:38                 ` Chris Wedgwood
@ 2002-01-07  4:40                   ` T. A.
  0 siblings, 0 replies; 37+ messages in thread
From: T. A. @ 2002-01-07  4:40 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Chris Wedgwood

    There are 2MB pages as well, which would probably be a better choice
than 4MB.  With 2MB pages, PAE also uses a two-tier paging mechanism
instead of the three-tier one needed with 4KB pages, which should help
take care of the current slowdown with highmem.

----- Original Message -----
From: "Chris Wedgwood" <cw@f00f.org>
To: "Marvin Justice" <mjustice@austin.rr.com>
Cc: "Alan Cox" <alan@lxorguk.ukuu.org.uk>; "Benjamin LaHaise"
<bcrl@redhat.com>; "Gerrit Huizenga" <gerrit@us.ibm.com>; "M. Edward
Borasky" <znmeb@aracnet.com>; "Harald Holzer" <harald.holzer@eunet.at>;
<linux-kernel@vger.kernel.org>
Sent: Sunday, January 06, 2002 9:38 PM
Subject: Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?


> On Sun, Jan 06, 2002 at 08:18:33PM -0600, Marvin Justice wrote:
>
>     Here's my (probably simple-minded) understanding. With the PSE bit
>     turned on in one of the x86 control registers (cr4), page sizes
>     are 4MB instead of the usual 4KB. One advantage of large pages is
>     that there are fewer page tables and struct pages to store.
>
> Ah, I knew 4MB pages were possible... I was under the impression _all_
> pages had to be 4MB which would seem to suck badly as they would be
> too coarse for many applications (but for certain large sci. apps. I'm
> sure this would be perfect, less TLB thrashing too with sparse
> data-sets).
>
> On the whole, I'm not sure I can see how 4MB pages _everywhere_ in
> user-space would be a win for many people at all...
>
>
>   --cw

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 20:15     ` Marvin Justice
@ 2002-01-07  5:17       ` Daniel Freedman
  0 siblings, 0 replies; 37+ messages in thread
From: Daniel Freedman @ 2002-01-07  5:17 UTC (permalink / raw)
  To: linux-kernel

On Sun, Jan 06, 2002, Marvin Justice wrote:
> On Sunday 06 January 2002 01:45 pm, Daniel Freedman wrote:
> > Hi Marvin,
> >
> > Thanks for the quick reply.
> >
> > On Sun, Jan 06, 2002, Marvin Justice wrote:
> > > Is this what you're looking for? Just below the definition of PAGE_OFFSET
> > > in page.h:
> > >
> > > /*
> > >  * This much address space is reserved for vmalloc() and iomap()
> > >  * as well as fixmap mappings.
> > >  */
> > > #define __VMALLOC_RESERVE	(128 << 20)
> >
> > However, while it does seem to be exactly the definition for 128MB
> > vmalloc offset that I was looking for, I don't seem to have this
> > definition in my source tree (2.4.16):
> >
> >   freedman@planck:/usr/src/linux$ grep -r __VMALLOC_RESERVE *
> >   freedman@planck:/usr/src/linux$
> 
> Hmmm. Looks like it was moved sometime between 2.4.16 and 2.4.18pre1. In my 
> 2.4.16 tree it's located in arch/i386/kernel/setup.c and without the leading 
> underscores.
> 
> -M

<sheepishly buries head in sand>  Oops...  Sorry about missing that.

Thanks for the help and take care,

Daniel

-- 
Daniel A. Freedman
Laboratory for Atomic and Solid State Physics
Department of Physics
Cornell University

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: i686 SMP systems with more then 12 GB ram with 2.4.x kernel ?
  2002-01-06 16:16           ` Alan Cox
  2002-01-06 20:23             ` Benjamin LaHaise
  2002-01-06 23:37             ` Chris Wedgwood
@ 2002-01-07 16:20             ` Hugh Dickins
  2 siblings, 0 replies; 37+ messages in thread
From: Hugh Dickins @ 2002-01-07 16:20 UTC (permalink / raw)
  To: Alan Cox
  Cc: Benjamin LaHaise, Gerrit Huizenga, M. Edward Borasky,
	Harald Holzer, linux-kernel

On Sun, 6 Jan 2002, Alan Cox wrote:
> 
> You don't necessarily need PSE. Migrating to an option to support > 4K
> _virtual_ page size is more flexible for x86, although it would need 
> glibc getpagesize() fixing I think, and might mean a few apps wouldn't
> run in that configuration.

Larger kernel PAGE_SIZE can work, still presenting 4KB page size to user
space for compat.  The interesting part is holding anon pages together,
not fragmenting to use PAGE_SIZE for each MMUPAGE_SIZE of user space.

I have patches against 2.4.6 and 2.4.7 which did that; but didn't keep
them up to date because there's a fair effort going through drivers
deciding which PAGE_s need to be MMUPAGE_s.  I intend to resurrect
that work against 2.5 later on (or sooner if there's interest).

Hugh


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2002-01-07 16:18 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-29 18:18 i686 SMP systems with more then 12 GB ram with 2.4.x kernel ? Harald Holzer
2001-12-29 18:45 ` Alan Cox
2001-12-29 21:24   ` M. Edward (Ed) Borasky
2001-12-30  0:25     ` M. Edward (Ed) Borasky
2001-12-30  2:14       ` Harald Holzer
2001-12-30  2:33         ` M. Edward Borasky
2002-01-01 18:15   ` M. Edward Borasky
2002-01-01 18:46     ` Alan Cox
2002-01-01 19:02       ` M. Edward Borasky
2002-01-02  1:16       ` H. Peter Anvin
2002-01-02 21:17       ` Gerrit Huizenga
2002-01-06  8:20         ` Benjamin LaHaise
2002-01-06 16:16           ` Alan Cox
2002-01-06 20:23             ` Benjamin LaHaise
2002-01-06 23:37             ` Chris Wedgwood
2002-01-07  0:29               ` The COUGAR Project M. Edward Borasky
2002-01-07  2:18               ` i686 SMP systems with more then 12 GB ram with 2.4.x kernel ? Marvin Justice
2002-01-07  2:38                 ` Chris Wedgwood
2002-01-07  4:40                   ` T. A.
2002-01-07 16:20             ` Hugh Dickins
2002-01-02 17:30 ` Timothy D. Witham
     [not found]   ` <1009994687.12942.14.camel@hh2.hhhome.at>
     [not found]     ` <1009995669.1253.17.camel@wookie-laptop.pdx.osdl.net>
2002-01-02 23:50       ` Harald Holzer
2002-01-03  0:16         ` Alan Cox
2002-01-03 13:30           ` Rik van Riel
2002-01-04 12:09             ` Hugh Dickins
2002-01-04 12:15               ` Rik van Riel
2002-01-03  0:17         ` Mark Zealey
2002-01-03 13:28         ` Rik van Riel
2002-01-03 14:33           ` Stephan von Krawczynski
2002-01-03 16:38             ` Rik van Riel
2002-01-03 15:15         ` Anton Blanchard
  -- strict thread matches above, loose matches on Subject: below --
2001-12-29 19:25 Dieter Nützel
2002-01-06 18:39 Daniel Freedman
2002-01-06 18:59 ` Marvin Justice
2002-01-06 19:45   ` Daniel Freedman
2002-01-06 20:15     ` Marvin Justice
2002-01-07  5:17       ` Daniel Freedman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox