[PATCH] Sanely size hash tables when using large base pages.

From: Paul Mundt <lethal@linux-sh.org>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Subject: [PATCH] Sanely size hash tables when using large base pages.
Date: Tue, 26 Dec 2006 15:16:52 +0900
Message-ID: <20061226061652.GA598@linux-sh.org>

At the moment, both the pidhash and the inode/dentry cache hash tables
(the latter two sized by way of alloc_large_system_hash()) are incorrectly
sized by their respective detection logic when we attempt to use large
base pages on systems with little memory.

This results in odd behaviour when using a 64kB PAGE_SIZE, such as:

PID hash table entries: 512 (order: 9, 2048 bytes)
...
Dentry cache hash table entries: 8192 (order: -1, 32768 bytes)
Inode-cache hash table entries: 4096 (order: -2, 16384 bytes)
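
The negative orders follow from simple arithmetic. Below is a minimal
sketch, assuming the reported order is log2 of the table size in bytes
minus PAGE_SHIFT, and assuming 4-byte buckets (which matches the byte
counts in the log above). The ex_* names and EX_PAGE_SHIFT are
illustrative only and do not exist in the kernel.

```c
#include <assert.h>

/* Assumption for illustration: 64kB base pages, i.e. PAGE_SHIFT == 16. */
#define EX_PAGE_SHIFT 16

/* Floor of log2(v) for v > 0; a stand-in for the kernel's log2 helper. */
static int ex_log2(unsigned long v)
{
	int r = -1;

	assert(v > 0);
	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Order as it would be reported: log2(table size in bytes) - PAGE_SHIFT. */
static int ex_table_order(unsigned long entries, unsigned long bucketsize)
{
	return ex_log2(entries * bucketsize) - EX_PAGE_SHIFT;
}
```

With these assumptions, ex_table_order(8192, 4) gives -1 and
ex_table_order(4096, 4) gives -2: both tables are smaller than a single
64kB page, so no valid page order describes them.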

The mount cache hash table is seemingly the only one that gets this right,
by directly taking PAGE_SIZE into account.

The following patch attempts to catch the bogus values and round them up
to at least 0-order (or down, in the PID hash case).
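
For the alloc_large_system_hash() case, the rounding can be sketched in
isolation as follows. This is a simplified standalone model, not the
kernel function itself: EX_PAGE_SIZE, ex_roundup_pow_of_two() and
ex_clamp_entries() are hypothetical names, and the 64kB page size is
assumed from the example above.

```c
#include <assert.h>

/* Assumption for illustration: 64kB base pages. */
#define EX_PAGE_SIZE (1UL << 16)

/* Smallest power of two >= n, for n >= 1. */
static unsigned long ex_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * If the table would occupy less than one page, grow the entry count
 * so that it fills exactly one page (a 0-order allocation), then round
 * up to a power of two as the surrounding code already does.
 */
static unsigned long ex_clamp_entries(unsigned long numentries,
				      unsigned long bucketsize)
{
	assert(bucketsize > 0 && bucketsize <= EX_PAGE_SIZE);

	if (numentries * bucketsize < EX_PAGE_SIZE)
		numentries = EX_PAGE_SIZE / bucketsize;

	return ex_roundup_pow_of_two(numentries);
}
```

Under these assumptions, feeding in the entry counts from the log
(8192 or 4096, with 4-byte buckets) yields 16384 entries, which is
exactly one 64kB page. Counts that already fill a page or more pass
through unchanged.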

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

--

 kernel/pid.c    |   17 +++++++++++++----
 mm/page_alloc.c |    4 ++++

 2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index 2efe9d8..198c6a9 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -383,20 +383,29 @@ void free_pid_ns(struct kref *kref)
 /*
  * The pid hash table is scaled according to the amount of memory in the
  * machine.  From a minimum of 16 slots up to 4096 slots at one gigabyte or
- * more.
+ * more.  As a safety net for large base page on small memory systems
+ * the hash table is scaled to a 0-order allocation in the event that the
+ * initial detection logic sizes the table incorrectly (which can result
+ * in a very large number of slots).
  */
 void __init pidhash_init(void)
 {
-	int i, pidhash_size;
+	int i, pidhash_size, size;
 	unsigned long megabytes = nr_kernel_pages >> (20 - PAGE_SHIFT);
 
 	pidhash_shift = max(4, fls(megabytes * 4));
 	pidhash_shift = min(12, pidhash_shift);
 	pidhash_size = 1 << pidhash_shift;
 
+	size = pidhash_size * sizeof(struct hlist_head);
+	if (unlikely(size < PAGE_SIZE)) {
+		size = PAGE_SIZE;
+		pidhash_size = size / sizeof(struct hlist_head);
+		pidhash_shift = 0;
+	}
+
 	printk("PID hash table entries: %d (order: %d, %Zd bytes)\n",
-		pidhash_size, pidhash_shift,
-		pidhash_size * sizeof(struct hlist_head));
+		pidhash_size, pidhash_shift, size);
 
 	pid_hash = alloc_bootmem(pidhash_size *	sizeof(*(pid_hash)));
 	if (!pid_hash)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8c1a116..4a9a83f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3321,6 +3321,10 @@ void *__init alloc_large_system_hash(con
 			numentries >>= (scale - PAGE_SHIFT);
 		else
 			numentries <<= (PAGE_SHIFT - scale);
+
+		/* Make sure we've got at least a 0-order allocation.. */
+		if (unlikely((numentries * bucketsize) < PAGE_SIZE))
+			numentries = PAGE_SIZE / bucketsize;
 	}
 	numentries = roundup_pow_of_two(numentries);
