All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Andi Kleen <ak@suse.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [NUMA , x86_64] Why memnode_shift is chosen with the lowest possible value ?
Date: Fri, 30 Sep 2005 11:09:16 +0200	[thread overview]
Message-ID: <433D00BC.2070001@cosmosbay.com> (raw)
In-Reply-To: <433C1D6F.1030605@cosmosbay.com>

[-- Attachment #1: Type: text/plain, Size: 1081 bytes --]

Eric Dumazet a écrit :
> Andi Kleen a écrit :
> 
>>> Using memnode_shift=33 would access only 2 bytes from this 
>>> memnodemap[], touching fewer cache lines (well , one cache line). 
>>> kfree() and friends would be slightly faster, at least cache friendly.
>>
>>
>>
>> Agreed. Please send a patch.
> 

My previous patch was not correct. Couly you please review this second version.

[PATCH] NUMA,x86_64 :
     Compute the highest possible value for memnode_shift,
     in order to reduce footprint of memnodemap[] to the minimum, thus
     making all users (phys_to_nid(), kfree()), more cache friendly.

Before the patch :

Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 23 for the hash shift. Max adder is 3ffffffff

After the patch :

Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 33 for the hash shift.

In this case, only 2 bytes of memnodemap[] are used, instead of 2048

Thank you

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


[-- Attachment #2: compute_hash_shift --]
[-- Type: text/plain, Size: 2241 bytes --]

--- linux-2.6.14-rc2/arch/x86_64/mm/numa.c	2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2-ed/arch/x86_64/mm/numa.c	2005-09-30 11:04:18.000000000 +0200
@@ -38,38 +38,57 @@
 
 int numa_off __initdata;
 
-int __init compute_hash_shift(struct node *nodes, int numnodes)
+
+/*
+ * Given a shift value, try to populate memnodemap[]
+ * Returns :
+ * 1 if OK
+ * 0 if memnodmap[] too small (of shift too small)
+ * -1 if node overlap or lost ram (shift too big)
+ */
+static int __init populate_memnodemap(
+	const struct node *nodes, int numnodes, int shift)
 {
 	int i; 
-	int shift = 20;
-	unsigned long addr,maxend=0;
-	
-	for (i = 0; i < numnodes; i++)
-		if ((nodes[i].start != nodes[i].end) && (nodes[i].end > maxend))
-				maxend = nodes[i].end;
+	int res = -1;
+	unsigned long addr, end;
 
-	while ((1UL << shift) <  (maxend / NODEMAPSIZE))
-		shift++;
-
-	printk (KERN_DEBUG"Using %d for the hash shift. Max adder is %lx \n",
-			shift,maxend);
-	memset(memnodemap,0xff,sizeof(*memnodemap) * NODEMAPSIZE);
+	memset(memnodemap, 0xff, sizeof(memnodemap));
 	for (i = 0; i < numnodes; i++) {
-		if (nodes[i].start == nodes[i].end)
+		addr = nodes[i].start;
+		end = nodes[i].end;
+		if (addr >= end)
 			continue;
-		for (addr = nodes[i].start;
-		     addr < nodes[i].end;
-		     addr += (1UL << shift)) {
-			if (memnodemap[addr >> shift] != 0xff) {
-				printk(KERN_INFO
-	"Your memory is not aligned you need to rebuild your kernel "
-	"with a bigger NODEMAPSIZE shift=%d adder=%lu\n",
-					shift,addr);
+		if ((end >> shift) >= NODEMAPSIZE)
+			return 0;
+		do {
+			if (memnodemap[addr >> shift] != 0xff)
 				return -1;
-			} 
 			memnodemap[addr >> shift] = i;
-		} 
+			addr += (1 << shift);
+		} while (addr < end);
+		res = 1;
 	} 
+	return res;
+}
+
+int __init compute_hash_shift(struct node *nodes, int numnodes)
+{
+	int shift = 20;
+
+	while (populate_memnodemap(nodes, numnodes, shift + 1) >= 0)
+		shift++;
+
+	printk(KERN_DEBUG "Using %d for the hash shift.\n",
+		shift);
+
+	if (populate_memnodemap(nodes, numnodes, shift) != 1) {
+		printk(KERN_INFO
+	"Your memory is not aligned you need to rebuild your kernel "
+	"with a bigger NODEMAPSIZE shift=%d\n",
+			shift);
+		return -1;
+	}
 	return shift;
 }
 

  reply	other threads:[~2005-09-30  9:09 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-28 20:25 [PATCH 0/3] Demand faulting for huge pages Adam Litke
2005-09-28 20:25 ` Adam Litke
2005-09-28 20:31 ` [PATCH 1/3 htlb-get_user_pages] " Adam Litke
2005-09-28 20:31   ` Adam Litke
2005-09-28 20:32 ` [PATCH 2/3 htlb-fault] " Adam Litke
2005-09-28 20:32   ` Adam Litke
2005-09-29  6:09   ` Andrew Morton
2005-09-29  6:09     ` Andrew Morton
2005-09-29  6:10   ` Andrew Morton
2005-09-29  6:10     ` Andrew Morton
2005-09-28 20:33 ` [PATCH 3/3 htlb-acct] " Adam Litke
2005-09-28 20:33   ` Adam Litke
2005-09-29  6:20   ` Andrew Morton
2005-09-29  6:20     ` Andrew Morton
2005-09-29  9:45     ` Andi Kleen
2005-09-29  9:45       ` Andi Kleen
2005-09-29 13:40       ` [NUMA , x86_64] Why memnode_shift is chosen with the lowest possible value ? Eric Dumazet
2005-09-29 13:43         ` Andi Kleen
2005-09-29 16:59           ` Eric Dumazet
2005-09-30  9:09             ` Eric Dumazet [this message]
2005-10-04 17:13               ` Andi Kleen
2005-10-04 21:12                 ` Eric Dumazet
2005-09-29 13:32 ` [PATCH 0/3] Demand faulting for huge pages Hugh Dickins
2005-09-29 13:32   ` Hugh Dickins
2005-10-06 15:22   ` Adam Litke
2005-10-06 15:22     ` Adam Litke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=433D00BC.2070001@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=ak@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.