* Re: Limit hash table size
@ 2004-01-12 16:50 Manfred Spraul
  0 siblings, 0 replies; 54+ messages in thread
From: Manfred Spraul @ 2004-01-12 16:50 UTC (permalink / raw)
  To: Anton Blanchard, Andrew Morton, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 460 bytes --]

>
>
>Why can't we do something like Andrew's recent min_free_kbytes patch and
>make the rate of change nonlinear? Just slow the increase down as we
>get bigger. I agree a 2GB hashtable is pretty ludicrous, but a 4MB one
>on a 512GB machine (which we sell at the moment) could be too :)
>  
>
What about making the limit configurable with a boot-time parameter? If
someone uses a 512 GB ppc64 machine as an NFS server, he might want a
2 GB inode hash.
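
As a rough user-space illustration of that idea (a sketch only, not the kernel code): a positive boot-time override wins, otherwise the entry count scales with memory and is clamped. The helper name `pick_ihash_entries` and the macro `IHASH_DEFAULT_CAP` are hypothetical; the one-entry-per-16KB scaling and the 1024*1024 cap are taken from the attached patch.

```c
#include <assert.h>

/* Default cap for the inode hash, as in the attached patch (hypothetical macro name). */
#define IHASH_DEFAULT_CAP (1024UL * 1024UL)

/*
 * Hypothetical helper mirroring the patched sizing logic:
 *   - override != 0: the administrator's value is used as-is;
 *   - override == 0: one entry per 16 KB of memory, clamped to the cap.
 * mempages is the number of physical pages, page_shift is log2(page size).
 */
unsigned long pick_ihash_entries(unsigned long override,
				 unsigned long mempages,
				 unsigned int page_shift)
{
	unsigned long entries;

	if (override)
		return override;

	/* The shift below is only meaningful for pages of 16 KB or smaller. */
	assert(page_shift <= 14);
	entries = mempages >> (14 - page_shift);
	if (entries > IHASH_DEFAULT_CAP)
		entries = IHASH_DEFAULT_CAP;
	return entries;
}
```

With 4K pages, a machine with 2^27 pages (512 GB) gets the capped default of 1048576 entries, while passing a nonzero override (what `ihash_entries=` would supply) returns that value unchanged.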

--
    Manfred

[-- Attachment #2: patch-hash-alloc --]
[-- Type: text/plain, Size: 2266 bytes --]

// $Header$
// Kernel Version:
//  VERSION = 2
//  PATCHLEVEL = 6
//  SUBLEVEL = 0
//  EXTRAVERSION = -test11
--- 2.6/fs/inode.c	2003-11-29 09:46:34.000000000 +0100
+++ build-2.6/fs/inode.c	2003-11-29 10:19:21.000000000 +0100
@@ -1327,6 +1327,20 @@
 		wake_up_all(wq);
 }
 
+static __initdata int ihash_entries;
+
+static int __init set_ihash_entries(char *str)
+{
+	get_option(&str, &ihash_entries);
+	if (ihash_entries <= 0) {
+		ihash_entries = 0;
+		return 0;
+	}
+	return 1;
+}
+
+__setup("ihash_entries=", set_ihash_entries);
+
 /*
  * Initialize the waitqueues and inode hash table.
  */
@@ -1340,8 +1354,16 @@
 	for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++)
 		init_waitqueue_head(&i_wait_queue_heads[i].wqh);
 
-	mempages >>= (14 - PAGE_SHIFT);
-	mempages *= sizeof(struct hlist_head);
+	if (!ihash_entries) {
+		ihash_entries = mempages >> (14 - PAGE_SHIFT);
+		/* Limit inode hash size. Override for nfs servers
+		 * that handle lots of files.
+		 */
+		if (ihash_entries > 1024*1024)
+			ihash_entries = 1024*1024;
+	}
+
+	mempages = ihash_entries*sizeof(struct hlist_head);
 	for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++)
 		;
 
--- 2.6/fs/dcache.c	2003-11-29 09:46:34.000000000 +0100
+++ build-2.6/fs/dcache.c	2003-11-29 10:53:15.000000000 +0100
@@ -1546,6 +1546,20 @@
 	return ino;
 }
 
+static __initdata int dhash_entries;
+
+static int __init set_dhash_entries(char *str)
+{
+	get_option(&str, &dhash_entries);
+	if (dhash_entries <= 0) {
+		dhash_entries = 0;
+		return 0;
+	}
+	return 1;
+}
+
+__setup("dhash_entries=", set_dhash_entries);
+
 static void __init dcache_init(unsigned long mempages)
 {
 	struct hlist_head *d;
@@ -1571,10 +1585,18 @@
 	
 	set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
 
+	if (!dhash_entries) {
 #if PAGE_SHIFT < 13
-	mempages >>= (13 - PAGE_SHIFT);
+		mempages >>= (13 - PAGE_SHIFT);
 #endif
-	mempages *= sizeof(struct hlist_head);
+		dhash_entries = mempages;
+		/* 8 mio is enough for general purpose systems.
+		 * For file servers, override with "dhash_entries="
+		 */
+		if (dhash_entries > 8*1024*1024)
+			dhash_entries = 8*1024*1024;
+	}
+	mempages = dhash_entries*sizeof(struct hlist_head);
 	for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++)
 		;
 

* Re: Limit hash table size
@ 2004-02-06  6:32 Manfred Spraul
  0 siblings, 0 replies; 54+ messages in thread
From: Manfred Spraul @ 2004-02-06  6:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Chen, Kenneth W

Andrew wrote:

>Maybe we should leave the sizing of these tables as-is, and add some hook
>which allows the architecture to scale them back.
>  
>
Architecture or administrator?
I think a boot parameter is the better solution: The admin knows if his 
system is a compute node or a file server.

--
    Manfred



[parent not found: <B05667366EE6204181EABE9C1B1C0EB5802441@scsmsx401.sc.intel.com.suse.lists.linux.kernel>]
* Limit hash table size
@ 2004-01-08 23:12 ` Chen, Kenneth W
  0 siblings, 0 replies; 54+ messages in thread
From: Chen, Kenneth W @ 2004-01-08 23:12 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-ia64; +Cc: Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1134 bytes --]

The issue of exceedingly large hash tables was discussed on the
mailing list a while back, but seems to have slipped through the cracks.

What we found is that it's not a problem for x86 (and most other
architectures) because __get_free_pages can't return anything beyond
order MAX_ORDER-1 (10), which means those hash tables are at most 4MB
each (assuming 4K page size).  However, on ia64, in order to support
larger hugeTLB page sizes, MAX_ORDER is bumped up to 18, which means
the page allocator now enforces an upper limit of 2GB (assuming 16K
page size).  PPC64 is another example that bumps up MAX_ORDER.
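
The arithmetic behind those two limits can be checked with a small sketch. `max_alloc_bytes` is a hypothetical helper, not a kernel function; it only assumes that the largest allocation is 2^(MAX_ORDER-1) pages.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest contiguous allocation, in bytes, when the allocator tops out
 * at order (max_order - 1): 2^(max_order - 1) pages of 2^page_shift bytes.
 * Hypothetical helper for illustration only.
 */
uint64_t max_alloc_bytes(unsigned int max_order, unsigned int page_shift)
{
	/* Guard against shifting past the width of a 64-bit value. */
	assert(max_order >= 1 && (max_order - 1) + page_shift < 64);
	return (UINT64_C(1) << (max_order - 1)) << page_shift;
}
```

With MAX_ORDER 11 and 4K pages (page_shift 12) this gives 4MB; with MAX_ORDER 18 and 16K pages (page_shift 14) it gives 2GB.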

Last time I checked, the tcp ehash table was taking a whopping (insane!)
2GB on one of our large machines.  The dentry and inode hash tables also
take a considerable amount of memory.

This patch just forces all the hash tables to have a max order of 10,
which limits them to 16MB each on ia64.  People can clean up other
parts of the table size calculation.  But minimally, this patch doesn't
change any hash sizes already in use on x86.
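
A minimal user-space sketch of the capped order search, for illustration: `capped_order` and `ORDER_CAP` are hypothetical names, and the loop follows the shape used in the attached patch for the dentry and inode tables (where the goal is in bytes).

```c
#include <assert.h>

/* Upper bound on the allocation order, matching the value used in the patch. */
#define ORDER_CAP 10u

/*
 * Smallest order whose allocation (2^order pages) covers `bytes`,
 * but never larger than ORDER_CAP. Hypothetical helper, not kernel code.
 */
unsigned int capped_order(unsigned long long bytes, unsigned int page_shift)
{
	unsigned int order;

	/* Keep the shifts below within 64 bits. */
	assert(page_shift + ORDER_CAP < 64);
	for (order = 0;
	     order < ORDER_CAP && ((1ULL << order) << page_shift) < bytes;
	     order++)
		;
	return order;
}
```

For a very large request on 16K pages the search stops at order 10, i.e. 2^10 pages of 16K each, which is the 16MB figure; small requests are unaffected by the cap.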

Andrew, would you please apply?

- Ken Chen
 <<hashtable.patch>> 

[-- Attachment #2: hashtable.patch --]
[-- Type: application/octet-stream, Size: 2196 bytes --]

diff -Nurp linux-2.6.0.orig/fs/dcache.c linux-2.6.0/fs/dcache.c
--- linux-2.6.0.orig/fs/dcache.c	2003-12-17 18:58:15.000000000 -0800
+++ linux-2.6.0/fs/dcache.c	2004-01-08 14:59:58.000000000 -0800
@@ -1571,11 +1571,9 @@ static void __init dcache_init(unsigned 
 	
 	set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
 
-#if PAGE_SHIFT < 13
-	mempages >>= (13 - PAGE_SHIFT);
-#endif
+	mempages >>= 1;
 	mempages *= sizeof(struct hlist_head);
-	for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++)
+	for (order = 0; (order < 10) && (((1UL << order) << PAGE_SHIFT) < mempages); order++)
 		;
 
 	do {
diff -Nurp linux-2.6.0.orig/fs/inode.c linux-2.6.0/fs/inode.c
--- linux-2.6.0.orig/fs/inode.c	2003-12-17 18:59:55.000000000 -0800
+++ linux-2.6.0/fs/inode.c	2004-01-08 15:00:19.000000000 -0800
@@ -1340,9 +1340,9 @@ void __init inode_init(unsigned long mem
 	for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++)
 		init_waitqueue_head(&i_wait_queue_heads[i].wqh);
 
-	mempages >>= (14 - PAGE_SHIFT);
+	mempages >>= 2;
 	mempages *= sizeof(struct hlist_head);
-	for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++)
+	for (order = 0; (order < 10) && (((1UL << order) << PAGE_SHIFT) < mempages); order++)
 		;
 
 	do {
diff -Nurp linux-2.6.0.orig/net/ipv4/route.c linux-2.6.0/net/ipv4/route.c
--- linux-2.6.0.orig/net/ipv4/route.c	2003-12-17 18:59:55.000000000 -0800
+++ linux-2.6.0/net/ipv4/route.c	2004-01-08 15:01:17.000000000 -0800
@@ -2747,7 +2747,7 @@ int __init ip_rt_init(void)
 
 	goal = num_physpages >> (26 - PAGE_SHIFT);
 
-	for (order = 0; (1UL << order) < goal; order++)
+	for (order = 0; (order < 10) && ((1UL << order) < goal); order++)
 		/* NOTHING */;
 
 	do {
diff -Nurp linux-2.6.0.orig/net/ipv4/tcp.c linux-2.6.0/net/ipv4/tcp.c
--- linux-2.6.0.orig/net/ipv4/tcp.c	2003-12-17 18:58:38.000000000 -0800
+++ linux-2.6.0/net/ipv4/tcp.c	2004-01-08 15:00:42.000000000 -0800
@@ -2610,7 +2610,7 @@ void __init tcp_init(void)
 	else
 		goal = num_physpages >> (23 - PAGE_SHIFT);
 
-	for (order = 0; (1UL << order) < goal; order++)
+	for (order = 0; (order < 10) && ((1UL << order) < goal); order++)
 		;
 	do {
 		tcp_ehash_size = (1UL << order) * PAGE_SIZE /


end of thread, other threads:[~2004-02-19  7:45 UTC | newest]

Thread overview: 54+ messages
2004-01-12 16:50 Limit hash table size Manfred Spraul
  -- strict thread matches above, loose matches on Subject: below --
2004-02-06  6:32 Manfred Spraul
     [not found] <B05667366EE6204181EABE9C1B1C0EB5802441@scsmsx401.sc.intel.com.suse.lists.linux.kernel>
     [not found] ` <20040205155813.726041bd.akpm@osdl.org.suse.lists.linux.kernel>
2004-02-06  1:54   ` Andi Kleen
2004-02-05  2:38     ` Steve Lord
2004-02-06  3:12       ` Andrew Morton
2004-02-05  4:06         ` Steve Lord
2004-02-06  4:39           ` Andi Kleen
2004-02-06  4:59             ` Andrew Morton
2004-02-06  5:34             ` Maneesh Soni
2004-02-06  3:19         ` Andi Kleen
2004-02-06  3:23         ` Nick Piggin
2004-02-06  3:34           ` Andrew Morton
2004-02-06  3:38             ` Nick Piggin
2004-02-18 12:41       ` Pavel Machek
2004-02-06  3:09     ` Andrew Morton
2004-02-06  3:18       ` Andi Kleen
2004-02-06  3:30         ` Andrew Morton
2004-02-06  4:45           ` Martin J. Bligh
2004-02-06  6:22       ` Matt Mackall
2004-02-06 20:20       ` Taneli Vähäkangas
2004-02-06 20:27         ` Andrew Morton
2004-02-06 21:46           ` Taneli Vähäkangas
2004-01-08 23:12 Chen, Kenneth W
2004-01-08 23:12 ` Chen, Kenneth W
2004-01-09  9:25 ` Andrew Morton
2004-01-09  9:25   ` Andrew Morton
2004-01-09 14:25 ` Anton Blanchard
2004-01-09 14:25   ` Anton Blanchard
2004-01-09 19:05 ` Chen, Kenneth W
2004-01-09 19:05   ` Chen, Kenneth W
2004-01-12 13:32   ` Anton Blanchard
2004-01-12 13:32     ` Anton Blanchard
2004-01-14 22:29 ` Chen, Kenneth W
2004-01-14 22:29   ` Chen, Kenneth W
2004-01-14 22:31 ` Chen, Kenneth W
2004-01-14 22:31   ` Chen, Kenneth W
2004-01-18 14:25   ` Anton Blanchard
2004-01-18 14:25     ` Anton Blanchard
2004-02-05 23:58 ` Andrew Morton
2004-02-05 23:58   ` Andrew Morton
2004-02-06  0:10 ` Chen, Kenneth W
2004-02-06  0:10   ` Chen, Kenneth W
2004-02-06  0:23   ` Andrew Morton
2004-02-06  0:23     ` Andrew Morton
2004-02-09 23:12     ` Jes Sorensen
2004-02-09 23:12       ` Jes Sorensen
2004-02-17 22:24 ` Chen, Kenneth W
2004-02-17 22:24   ` Chen, Kenneth W
2004-02-17 23:24   ` Andrew Morton
2004-02-17 23:24     ` Andrew Morton
2004-02-18  0:16 ` Chen, Kenneth W
2004-02-18  0:16   ` Chen, Kenneth W
2004-02-18  0:45 ` Chen, Kenneth W
2004-02-18  0:45   ` Chen, Kenneth W
