linuxppc-dev.lists.ozlabs.org archive mirror
From: Mel Gorman <mel@csn.ul.ie>
To: Olaf Hering <olaf@aepfle.de>
Cc: lee.schermerhorn@hp.com, Linux MM <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org, linuxppc-dev@ozlabs.org,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	akpm@linux-foundation.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <clameter@sgi.com>
Subject: Re: crash in kmem_cache_init
Date: Tue, 22 Jan 2008 19:54:49 +0000
Message-ID: <20080122195448.GA15567@csn.ul.ie>
In-Reply-To: <20080118225713.GA31128@aepfle.de>

On (18/01/08 23:57), Olaf Hering didst pronounce:
> On Fri, Jan 18, Christoph Lameter wrote:
> 
> > Could you try this patch?
> 
> Does not help, same crash.
> 

Hi Olaf,

It was suggested this problem was the same as another slab-related boot problem
that was fixed for 2.6.24 by reverting a change. The fix can be found at
http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
Can you please check on your machine whether it fixes your problem?

I am 99.9999% sure it will *not* fix your problem because there were two bugs,
not one as previously believed. On two test machines here, this kmem_cache_init
problem still happens even with the revert, which fixed a third machine. I
was delayed in testing because these boxen were unavailable from Friday until
yesterday evening (a stellar display of timing). It was missed on TKO because
it was SLAB-specific and those machines were testing SLUB. I found that the
patch below was necessary to fix the problem.

Olaf, please confirm whether you need the patch below as well as the
revert to make your machine boot.

Christoph/Pekka, this patch is papering over the problem and something
more fundamental may be going wrong. The crash occurs because l3 is NULL
and the cache is kmem_cache, so this is early in the boot process. It is
selecting l3 based on node 2, which is correct in terms of available memory,
but it initialises the lists on node 0 because that is the node the CPUs are
located on. Hence it later uses an uninitialised nodelists entry and BLAM. The
relevant parts of the log showing the memoryless nodes in relation to CPUs are:

early_node_map[1] active PFN ranges
    2:        0 ->  1048576
Processor 1 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 2 CPUs:
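
To make the failure mode concrete, here is a rough user-space model of the
topology in the log above: only node 0 (where the CPUs are) has its lists
initialised, while the allocation asks for node 2 (where the memory is). This
is only a sketch under stated assumptions, not kernel code. struct list3,
struct cache, local_node_id() and pick_l3() are hypothetical stand-ins for
kmem_list3, kmem_cache, numa_node_id() and the lookup in cache_grow(), and
assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4	/* arbitrary for this sketch */

/* Hypothetical stand-in for struct kmem_list3. */
struct list3 {
	int node;
};

/* Hypothetical stand-in for struct kmem_cache: one list3 pointer per node. */
struct cache {
	struct list3 *nodelists[MAX_NODES];
};

/* Stand-in for numa_node_id(): in the log above, all CPUs are on node 0. */
static int local_node_id(void)
{
	return 0;
}

/*
 * Look up the per-node lists for *nodeid. If that node has no lists
 * (NULL entry), fall back to the node the current CPU is on and update
 * *nodeid so the caller sees which node was actually used.
 */
static struct list3 *pick_l3(struct cache *cachep, int *nodeid)
{
	struct list3 *l3 = cachep->nodelists[*nodeid];

	if (!l3) {
		*nodeid = local_node_id();
		l3 = cachep->nodelists[*nodeid];
	}
	assert(l3 != NULL);	/* models BUG_ON(!l3) */
	return l3;
}
```

With nodelists[0] populated and nodelists[2] left NULL, calling pick_l3()
with nodeid 2 returns the node 0 lists and rewrites nodeid to 0. Without the
fallback, the same lookup hands back NULL and the caller dereferences it.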

Can you see a better solution than this?

====
Recent changes to how slab operates mean a situation can occur on systems
with memoryless nodes whereby the nodeid used when growing the slab does
not map to an initialised kmem_list3. The following patch checks the
kmem_list3 for the indicated preferred nodeid and, if it is NULL, uses
numa_node_id() instead.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>

--- 
 mm/slab.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.000000000 +0000
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.000000000 +0000
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:


-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
